Version Control

Introduction

It is important to identify and distinguish versions of your research data files consistently. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed. When establishing a strategy for version control you can use the following tips:

Use the date in the filename, preferably in a sortable form such as YYYY-MM-DD;
Use ordinal numbers (1, 2, 3, etc.) for major version changes and decimals for minor ones, e.g. v1.1, v2.6;
At set times, delete the versions with minor changes (which you have already identified as less important) and any other obsolete versions;
Avoid confusing labels. Labels such as revision, final, final2 and definitive_copy are ambiguous and tend to accumulate during your research;
Keep your files in one place only and stick to it. Treat copies in other places as ‘not current’;
Use an auto-backup facility (if available) rather than saving or archiving multiple versions;
Use version control facilities within software (e.g. MS Office);
Turn on versioning or tracking in collaborative documents or storage utilities such as wikis, Teams, OneDrive, Google Docs, etc.
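
The naming tips above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern stem_YYYY-MM-DD_vMAJOR.MINOR.ext, the function names, and the rule "keep only the latest minor version of each major version" are hypothetical choices, not a prescribed standard. The second function only lists candidates for deletion; it does not delete anything.

```python
import re
from datetime import date

# Hypothetical naming scheme: <stem>_<YYYY-MM-DD>_v<major>.<minor><suffix>
NAME_PATTERN = re.compile(
    r"^(?P<stem>.+)_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_v(?P<major>\d+)\.(?P<minor>\d+)(?P<suffix>\.[^.]+)$"
)


def versioned_name(stem, major, minor, suffix, on=None):
    """Build a filename carrying an ISO date and a major.minor version."""
    on = on or date.today()
    return f"{stem}_{on.isoformat()}_v{major}.{minor}{suffix}"


def superseded_minor_versions(filenames):
    """List files for which a later minor version of the same major version exists."""
    latest = {}   # (stem, suffix, major) -> (minor, filename)
    parsed = []   # filenames that follow the naming scheme
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # leave files outside the scheme alone
        key = (match["stem"], match["suffix"], int(match["major"]))
        minor = int(match["minor"])
        parsed.append(name)
        if key not in latest or minor > latest[key][0]:
            latest[key] = (minor, name)
    keep = {name for _, name in latest.values()}
    return sorted(name for name in parsed if name not in keep)
```

For example, versioned_name("survey", 1, 2, ".csv", on=date(2024, 3, 5)) produces a name that sorts chronologically alongside its siblings, and superseded_minor_versions can be run at the "set times" you have chosen to review what may be removed.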

Version control for your Research Data, Software and Scripts

This part mainly draws on Jiménez (2017).

Keeping everything under version control throughout your research is a good idea. Indispensable as version control platforms are, however, you may not want to use them for your raw research data: multi-gigabyte files are likely too large for these platforms anyway. Your processed research data is a different matter.
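
To see which files in a project folder are likely too large for a version control platform, a small check like the one below may help. It is a minimal sketch: the function name is hypothetical and the 100 MiB threshold is an assumption, so check the actual limit of the platform you use.

```python
from pathlib import Path

# Assumed threshold (100 MiB); replace it with your platform's actual limit.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(root, limit=SIZE_LIMIT_BYTES):
    """Return (relative path, size in bytes) for files above the limit, largest first."""
    root = Path(root)
    oversized = [
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size > limit
    ]
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Files reported by this check are candidates for storage elsewhere, while the smaller processed data and scripts go under version control.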

Version control needn’t be restricted to your research data. If you also develop custom scripts or software in your research, make sure you clearly document your code and your project and put them on version control platforms, just as you would with your research data. Note that organising your code on these platforms does not necessarily mean putting it on display for the entire world to see: you control the access rights, so you decide who gets to see your code and who doesn’t.

Putting your code on a version control platform makes a lot of sense, and not only for the obvious benefit of having several versions to roll back to if need be. According to Fogel (2005), the longer a project is run in a closed manner, the harder it is to open it later. Still, we understand that many people may not be inclined to disclose their source code from day one; it is, after all, a work in progress. At some stage, however, you may feel more comfortable releasing your code. This is simple if you have your files on, say, GitHub: change the visibility setting of your repository and you’re done. Open source in a matter of seconds.

Opening code and exposing the software development life cycle publicly:

Promotes trust in the software and broader project
Facilitates the discovery of existing software development projects
Provides a historical public record of contributions from the start of the project and helps to track recognition
Encourages contributions from the community
Increases opportunities for collaboration and reuse
Exposes work for community evaluation, suggestions and validation
Increases transparency through community scrutiny
Encourages developers to think about and showcase good coding practices
Facilitates reproducibility of scientific results generated by all prior versions of the software
Encourages developers to provide documentation, including a detailed user manual and clear in-code comments

Tools

Use version control software such as Git or Subversion (for which TortoiseSVN is a Windows client), together with a hosting platform such as GitHub.

If you want to know more about Git and GitHub, have a look at our introduction to Git & GitHub.
