Version Control

Introduction

It is important to identify and distinguish versions of your research data files consistently. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed. When establishing a strategy for version control you can use the following tips:

  • Use the date in the filename;
  • Use ordinal numbers (1, 2, 3, etc.) for major version changes and decimals for minor ones, e.g. v1.1, v2.6;
  • At set times, delete obsolete versions, including versions with only minor changes that you have already identified as less important;
  • Avoid ambiguous labels such as revision, final, final2 or definitive_copy, as these tend to accumulate during your research and stop telling you which version is current;
  • Keep your files in one place only and stick to it. Treat copies stored elsewhere as ‘not current’;
  • Use an auto-backup facility (if available) rather than saving or archiving multiple versions;
  • Use version control facilities within software (e.g. MS Office);
  • Turn on versioning or tracking in collaborative documents or storage utilities such as wikis, Teams, OneDrive, Google Docs, etc.
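
The first two tips (a date in the filename, and major/minor version numbers) can be sketched in a few lines of Python. This is only an illustration: the function names, the `stem_date_vMAJOR.MINOR.ext` layout and the choice to reset the minor number to 0 on a major change are assumptions made for this example, not a prescribed convention.

```python
import re
from datetime import date

# Hypothetical naming layout for this sketch: <stem>_<YYYY-MM-DD>_v<major>.<minor>.<ext>
VERSION_RE = re.compile(r"_v(\d+)\.(\d+)(?=\.[^.]+$)")


def versioned_name(stem, ext, major, minor, day):
    """Build a filename that carries both an ISO date and a version number."""
    return f"{stem}_{day.isoformat()}_v{major}.{minor}.{ext}"


def parse_version(filename):
    """Read the (major, minor) version back out of a filename built as above."""
    match = VERSION_RE.search(filename)
    if match is None:
        raise ValueError(f"no version label found in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def bump(major, minor, major_change):
    """Return the next version: a major change starts a new ordinal number
    (minor reset to 0, an assumption of this sketch); a minor change only
    increases the decimal part."""
    if major_change:
        return major + 1, 0
    return major, minor + 1


name = versioned_name("survey", "csv", 1, 2, date(2024, 3, 15))
print(name)                          # survey_2024-03-15_v1.2.csv
print(bump(*parse_version(name), major_change=False))  # (1, 3)
```

Writing the date in ISO order (year, month, day) has the side effect that files sort chronologically in a file browser.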

Version control for your Research Data, Software and Scripts

This part mainly draws on Jiménez (2017).

Using version control throughout your research is a sound idea. Indispensable as version control platforms are, however, you may not want to use them for your raw research data: multi-gigabyte files are likely too large for these platforms anyway. Your processed research data is a different matter and is usually small enough to be managed this way.
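
To see which files in a project folder are likely too large for a version control platform, you could list them by size before adding anything. The sketch below is a minimal example under stated assumptions: the function name `split_by_size` is hypothetical, and the size limit is a value you choose yourself after checking the limits documented by your platform.

```python
from pathlib import Path


def split_by_size(root, limit_bytes):
    """Walk the folder `root` and return two lists of relative file paths:
    files at or below `limit_bytes`, and files above it."""
    small, large = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories
        target = large if path.stat().st_size > limit_bytes else small
        target.append(path.relative_to(root))
    return small, large


if __name__ == "__main__":
    # Assumed limit for illustration only; check your own platform's documentation.
    LIMIT = 100 * 1024 * 1024
    small, large = split_by_size(".", LIMIT)
    print(f"{len(small)} files fit under the limit")
    for path in large:
        print(f"too large for version control: {path}")
```

Files in the second list are candidates for separate storage, with only their documentation kept under version control.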

Version control needn’t be restricted to your research data. If you also develop custom scripts or software in your research, make sure you clearly document your code and your project, and put them on a version control platform just as you would your research data. Note that organising your code on these platforms does not necessarily mean putting it on display for the entire world to see: you control the access rights, which means you decide who gets to see your code and who doesn’t.

Putting your code on a version control platform makes a lot of sense, and not only for the obvious benefit of having several versions to roll back to if need be. According to Fogel (2005), the longer a project is run in a closed manner, the harder it is to open it later. Still, we understand that many people may not be inclined to disclose their source code from day one; it is, after all, a work in progress. At some stage, however, you may feel more comfortable releasing your code. This, too, is simple if you have your files on, say, GitHub: change the access settings of your repository, and you’re done. Open source in a matter of seconds.

Opening code and exposing the software development life cycle publicly:

  • Promotes trust in the software and broader project
  • Facilitates the discovery of existing software development projects
  • Provides a historical public record of contributions from the start of the project and helps to track recognition
  • Encourages contributions from the community
  • Increases opportunities for collaboration and reuse
  • Exposes work for community evaluation, suggestions and validation
  • Increases transparency through community scrutiny
  • Encourages developers to think about and showcase good coding practices
  • Facilitates reproducibility of scientific results generated by all prior versions of the software
  • Encourages developers to provide documentation, including a detailed user manual and clear in-code comments

Tools

Use version control software such as Git or Subversion (for example through a client such as TortoiseSVN), together with a hosting platform such as GitHub.

If you want to learn more about Git and GitHub, have a look at our introduction to Git & GitHub.