Contents:

  1. The results of the VIF user requirements study
  2. Previous work on versioning

The results of the VIF user requirements study:

The following is summarised from the VIF survey report.

In the autumn of 2007 the project undertook two surveys, one of over 100 information professionals working in repositories and one of 60 academics (mainly in the UK, with some international responses), which produced some clear results.

The results highlighted some striking realities about version identification: it appears to be an accepted problem, although one that people are not particularly engaged with.

Both academics and information professionals agreed that identifying versions of research is a problem: only 5% of academics and 6.5% of information professionals surveyed found it easy to identify versions of digital objects within institutional repositories. The situation becomes even worse across multiple repositories (1.8% and 1.1% respectively). However, approximately a third of information professionals who work with repositories stated that they either have no system currently in place or 'don't know' how they deal with versioning at present. Moreover, the team found the academic community to be largely disengaged from the versioning issue. Not only was it difficult to find interested parties willing to complete the survey, but a large proportion of those who made an attempt failed to return a finished survey, often pointing out that they were not expert enough or had not considered the issues enough to comment.

The two groups did diverge on the perceived purpose of repositories. The academics we surveyed were very clear that they ultimately wished to make only the finished version of their output available, and free text comments (often even in answers to questions on different subjects) showed that they considered repositories useful for highlighting the latest research, but not necessarily for preserving the body of research. This contrasts directly with the wishes of information professionals, who overwhelmingly wanted to store all available versions.

Information professionals are aware of a trend towards a wider range of object types being created. When asked what types of material they store in their repositories, 95.4% of information professionals said that they currently store, or plan to store, text documents, with many also stating that they store, or plan to store, audio files (73.6%), datasets (77.9%), images (83.3%), learning objects (46.5%) and video files (75.3%). This is especially positive in the context of the results of the academics' survey, which suggested that a large number of researchers either already create or intend to create audio files (47.2%), datasets (68%), images (72.5%), learning objects (74.6%) and video files (57.6%). As expected, the vast majority also intend to continue working with text documents.

In considering proposed solutions to versioning issues (for details see the full report), support was high for all of the suggestions (only date stamping had less than 60% support across the respondents) and no one solution stood out as having more support than any other. However, we encouraged free text responses with each suggested solution, and comments revealed numerous and varied misgivings, with many qualifications and caveats being given.

The most contentious example is that of using a taxonomy to define the version status. The notion is clearly attractive, and has been addressed previously with NISO / ALPSP, RIVER and VERSIONS (see the Background page of the VIF website) all offering a possible vocabulary to describe versions. Many free text comments remarked that whilst the idea is a sound one in principle, implementing such a taxonomy would be virtually impossible without some sort of enforcing body. Also, getting community agreement on the terminology used would be difficult due to the often polarised standpoints of publishers and information professionals. Insulating the vocabulary chosen from the pre-established terminology and bias of certain camps would clearly be a very serious undertaking.

The lack of a silver bullet to solve the version identification problem was ultimately no surprise, considering the different needs across disciplines and across different types of object. This was both a blessing and a curse for developing solutions for the framework, as very little had been eliminated and the options left available are complex and not generically applicable. Therefore, we chose to detail many solutions, with their benefits and problems made explicit, and to allow the audience to choose the solutions that suit their needs best.

Further research on datasets in repositories:

VIF carried out further research into repositories that already contain some datasets, and investigated how these datasets are managed. Because this is currently a limited field, and because repository systems are not primarily configured to deal with such objects, we found that repository staff:

Previous research on versioning:

Most prominent are the recent VERSIONS project (Versions of Eprints - a user Requirements Study and Investigation Of the Need for Standards) and the RIVER (Scoping Study on Repository Version Identification) study, both of which examined aspects of version identification in the light of the open access movement.

VERSIONS, led by LSE (London School of Economics and Political Science), recently found that 59% of researchers produce four or more types of research output from each research project. Types discussed were articles, book chapters, working papers, conference papers, and presentations amongst others. The VERSIONS project focussed on e-prints in Economics, and found that not only do researchers output these different types of object, but also that each one of these will be developed through several draft versions. These draft versions are increasingly likely to be made available as working papers or simply as they are during the development of a piece of research, and VERSIONS found that this was leading to much confusion for end users who were trying to identify the work they were finding online.

Previous to the VERSIONS project, the RIVER study concluded that:
'The issue of version identification is not simply (indeed not primarily) one of unique identification of resources but rather of defining the relationship between resources. While it is important that each of those resources should be uniquely referenceable, from a user standpoint the more significant questions to ask are:

The VIF project moved on from this prior research in two significant ways. Firstly, we have looked at the requirements for a variety of digital objects, not just text documents, and across the whole range of disciplines. Secondly, we describe our approach to versions as agnostic. The earlier research took a quite different approach, with much discussion already having taken place on defining and using labels to describe work in the context of the publication process.

VIF therefore defines a version as follows:

A 'version' is a digital object (in whatever format) that exists in time and place and has a context that can be described by the relationship it has to one or more other objects.

A ‘version relationship’ is an understanding or expression of how two or more objects relate to each other.

This definition accommodates what are ultimately user-defined opinions about what constitutes a version, and underpins the project team's desire to remain neutral and produce a clear and transparent framework. Our core aim with the framework is to make available to end users and repository managers the important information that allows clear version identification to take place.
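
As an illustration only, the two definitions above could be modelled as a pair of simple records. This is a minimal sketch and not part of the framework itself: the class names, the field names (identifier, media_format, created, location, members, description), the helper related_to and the example.org addresses are all hypothetical, chosen to mirror the wording of the definitions. The relationship is described in free text rather than drawn from a fixed vocabulary, which is one way of staying agnostic about what counts as a version.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Version:
    """A digital object, in whatever format, that exists in time and place."""
    identifier: str    # hypothetical: any locally unique key
    media_format: str  # the definition is format-neutral, so any value is allowed
    created: date      # "exists in time"
    location: str      # "exists in place", e.g. a repository address


@dataclass(frozen=True)
class VersionRelationship:
    """An expression of how two or more objects relate to each other."""
    members: Tuple[str, ...]  # identifiers of the related objects
    description: str          # free text, deliberately not a fixed taxonomy

    def __post_init__(self) -> None:
        # The definition speaks of "two or more objects".
        if len(set(self.members)) < 2:
            raise ValueError("a version relationship needs two or more objects")


def related_to(identifier: str, relationships: Iterable[VersionRelationship]) -> Set[str]:
    """Return identifiers of every object linked to `identifier`."""
    linked: Set[str] = set()
    for rel in relationships:
        if identifier in rel.members:
            linked.update(m for m in rel.members if m != identifier)
    return linked


# Hypothetical example data: a working paper and a later dataset derived from it.
paper = Version("obj-1", "application/pdf", date(2007, 10, 1), "https://example.org/obj-1")
data = Version("obj-2", "text/csv", date(2008, 1, 15), "https://example.org/obj-2")
link = VersionRelationship((paper.identifier, data.identifier),
                           "obj-2 is the dataset underlying obj-1")
```

With these records, related_to("obj-1", [link]) gives the set containing "obj-2", so an end user starting from either object can discover the other and read how the two relate.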

Follow-up:

Go straight to the framework.
 

Last updated 08/5/08 | Copyright © 2008 LSE