Why versioning matters:
Versioning is so inherent to the research process that it is easy to give the matter little thought. However, a small amount of time spent now could save you a lot of effort later, make repositories better and easier to use, and save researchers significant time and confusion when searching for the right digital object.
Work is very seldom a linear process with clear, easily identifiable outputs that all look alike and share the same format. One simple idea can spiral into a huge body of work, expressed in a range of different digital objects, involving several authors and held in a multitude of places.
Research projects can have many potential outputs, each with multiple drafts or iterations. Each of these may have variations, shared authorship, different file formats, different stages of editing and so on.

What VIF means by 'version':
A body of work is often highly dynamic and complex, spawning a vast number of separate entities, all of which may relate to each other in a loose way, but may not intuitively be understood as 'versions' of the same thing.
There is a difference between understanding what version an object is and understanding what the version relationship between two or more objects is. For example, upon finding a single object in a repository an end-user might ask:
- I want to cite the correct version, is this one it?
- Is this a draft version of something?
- Is this a complete version?
- Is this the presentation which was delivered in London or Cardiff?
These sorts of questions are important for repository managers and apply every time someone looks at an item in a repository. Sometimes, however, there are more complex questions to ask about the version status of an object, because it can be important to know what version relationships an object has with other objects:
- What order were these draft versions created in?
- Is this working paper related to the article of the same name?
- I have found two maps of Hertfordshire in a repository, but how are they related? Are they from different times (modern day as opposed to C18th), were they drawn up by different cartographers, and so on?
- Are conference papers and posters stored in the same place as the article that they are written about or relate to?
For example, 'Draft one' and 'Draft two' of a paper should be easy to identify as two versions of the same thing, but research is not always that linear. Would an C18th map of Hertfordshire and the most current OS map of Hertfordshire be versions of each other? Do they share enough attributes to be that closely related?
VIF has defined a version as follows:
A 'version' is a digital object (in whatever format) that exists in time and place and has a context that can be described by the relationship it has to one or more other objects.
A ‘version relationship’ is an understanding or expression of how two or more objects relate to each other.
This definition encapsulates the notion that all objects associated with a piece of work (perhaps a concept, a research project and so on) have a relationship with each other. These relationships vary, but many people would consider the objects to be versions of each other. The VIF project has avoided defining what is and what isn't a version, and has left it to the repository manager and the end user to decide what they would call versions. VIF has instead focussed on making sure that all the important information a user would need to understand which version of an object they have found is made available to them.
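As an illustration only, the two definitions above could be modelled as a pair of simple records. This is a minimal sketch, not part of the framework itself: the class names, field names, identifiers and sample values are all hypothetical, and the relationship is kept as free text because VIF deliberately does not prescribe what counts as a version.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DigitalObject:
    """A deposited item that exists in time and place (illustrative fields)."""
    identifier: str
    title: str
    file_format: str
    created: date      # "time"
    location: str      # "place", e.g. the repository holding the object


@dataclass(frozen=True)
class VersionRelationship:
    """An expression of how two objects relate to each other."""
    source: str        # identifier of one object
    target: str        # identifier of the object it relates to
    description: str   # free text chosen by the repository, not a fixed vocabulary


def relationships_for(identifier, relationships):
    """Return every relationship in which the given object takes part."""
    return [r for r in relationships if identifier in (r.source, r.target)]


# Hypothetical sample data: two drafts of the same paper.
draft_one = DigitalObject("obj-1", "Example paper", "application/pdf",
                          date(2007, 3, 1), "Repository A")
draft_two = DigitalObject("obj-2", "Example paper", "application/pdf",
                          date(2007, 5, 12), "Repository A")
links = [VersionRelationship("obj-2", "obj-1", "later draft of")]

for link in relationships_for("obj-1", links):
    print(f"{link.source} is a {link.description} {link.target}")
```

Keeping the relationship separate from the object reflects the distinction drawn earlier: knowing what version an object is differs from knowing how two or more objects relate.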
Assumptions about versions made by the VIF project:
- A 'version' is a digital object (in whatever format) that exists in time and place and has a context within a body of work.
- A version is identifiable; the change between versions is describable and can be understood by human and/or machine.
- The understanding of what a version is relates to either its content (e.g. a digital variant) or its format (e.g. a digital copy).
- Some versions can be perceived as more relevant or authorised and/or authentic than others by either author or reader.
- Some versions are more appropriate to an end user than others.
What is required by the framework is more transparency; clarity about versions should help a user understand which is the 'best version' for their purposes.
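The assumption that a version is understood through either its content (a digital variant) or its format (a digital copy) can be sketched in code. The example below is a rough illustration under stated assumptions: both files are already known to belong to the same body of work, their text has already been extracted, and a whitespace-insensitive hash is an acceptable stand-in for "same content". The names `StoredFile`, `content_fingerprint` and `classify` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    name: str
    file_format: str
    text: str  # extracted textual content, assumed to be available


def content_fingerprint(text):
    """Hash the text after collapsing whitespace and case, so layout changes are ignored."""
    normalised = " ".join(text.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def classify(a, b):
    """Label two related files as identical, a digital copy or a digital variant."""
    same_content = content_fingerprint(a.text) == content_fingerprint(b.text)
    if same_content and a.file_format == b.file_format:
        return "identical"
    if same_content:
        return "digital copy"      # same content, different format
    return "digital variant"       # content differs


# Hypothetical sample files.
pdf = StoredFile("paper.pdf", "pdf", "Results are  shown below.")
docx = StoredFile("paper.docx", "docx", "results are shown below.")
revised = StoredFile("paper-v2.pdf", "pdf", "Results are shown in section 3.")

print(classify(pdf, docx))     # digital copy
print(classify(pdf, revised))  # digital variant
```

A label like this does not decide which version is 'best'; it only surfaces information that helps a user make that choice for their own purposes.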
Common versioning problems:
These are just a few of the common versioning problems:
- Confusion over whether an article is the published version, a copy that is identical in content to this but unformatted, a draft version, an edited version and so on.
- Repository searches yielding many results which ostensibly appear to refer to the same item, but actually vary in terms of content, formatting or proprietary file type.
- Research work with multiple authors being deposited in different places at different stages of development without guidance as to which is authoritative or most recent.
- Multimedia items being handled poorly by repositories that treat them as text, and their relationship to other objects that form part of the research project being undefined by the repository.
- Vastly inconsistent approaches to handling versions across different repository software packages and implementations.
A further list of issues associated with versioning can be found on the second page of the Versioning Issues - Discussion Document.
Follow-up:
Read more about research and work in this area.
Go straight to the Version Identification Framework.