Git versus Subversion
A Comparison of Version Control Systems

2009-08-05

Overview

Here at Fermilab we have been using CVS for version control on projects for nearly 20 years. There is interest in supporting a newer version control system for projects starting up, in addition to continuing to support CVS for existing customers. The two leading candidates are Git and Subversion. This document presents a comparison of these two.

Scope

This document is a comparison of version control packages. Larger issues such as software design and implementation methodology are beyond its scope, except where interfaces to particular tools are required or where scripts must be called as a side effect of version control actions. To first order, the packages under consideration here are equivalent in those respects: both support calling scripts on commits and similar events, so that checks can be made or notifications sent out to help enforce various policies.
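To illustrate the kind of check such a script might make, here is a minimal sketch in Python of a commit-message policy check. It assumes Git's commit-msg hook convention, in which the hook is called with the path of the file holding the proposed message and a nonzero exit status rejects the commit. The policy itself (a non-empty summary line of at most 72 characters) and the names check_message, main, and MAX_SUMMARY_LENGTH are hypothetical, chosen only for illustration.

```python
import sys

MAX_SUMMARY_LENGTH = 72  # hypothetical policy limit, not a Git requirement


def check_message(message):
    """Return a list of policy problems found in a commit message."""
    problems = []
    # Skip comment lines, which Git removes from the message by default.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    summary = lines[0].strip() if lines else ""
    if not summary:
        problems.append("empty summary line")
    elif len(summary) > MAX_SUMMARY_LENGTH:
        problems.append("summary line longer than %d characters"
                        % MAX_SUMMARY_LENGTH)
    return problems


def main(argv):
    """Hook entry point: argv[1] is the path of the commit message file."""
    with open(argv[1]) as handle:
        problems = check_message(handle.read())
    for problem in problems:
        sys.stderr.write("commit rejected: %s\n" % problem)
    # A nonzero return value, used as the exit status, rejects the commit.
    return 1 if problems else 0
```

To act as a real hook, the script would end with sys.exit(main(sys.argv)) and be installed as an executable file named commit-msg in the repository's hooks directory. A Subversion pre-commit hook could apply the same check, but it receives different arguments, so main would need to change.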

Subversion Overview

Subversion is a central, networked repository version control system. A single repository needs to be contacted for nearly all operations, such as viewing history or committing changes. Branching and tagging are really shallow-copy operations in the repository: a branch or tag makes a copy of the tree, which actually lives under a different path in the repository. Subversion's past awkwardness with merging branches is improved in recent versions. Subversion has far better support than CVS for non-plain-text files and for renaming files. Within a repository, each revision is given a unique integer number; however, that number covers the whole repository, so the same revision number viewed on different branches is often different content.

Git Overview

Git is a distributed, networked repository version control system. Each remote user maintains a copy of the repository and can commit changes, get history, etc. locally. Separate operations sync up your local repository with other ones, through various means. This facilitates:

Branching and merging in Git is much smoother and simpler than in CVS. It is logged as branching and merging, not just as copies, and changes merged in from a branch are still attributed to the person who made the change, not the person doing the merge (as is done in CVS and Subversion). In Git, any revision is given a unique 40-character hexadecimal key (a SHA-1 hash) rather than a branch-specific number. The key for a revision is the same across all the repository copies and branches, so it is the Real Name for the revision.
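The reason the key is the same in every copy is that it is computed from the content itself. The sketch below, in Python, shows the calculation for the simplest kind of Git object, a file's contents (a "blob"): Git hashes a short header followed by the raw bytes. A commit is hashed the same way with a "commit" header over its own text, which names its tree, parents, author, and message. The function name git_object_id is an illustrative choice, not part of Git.

```python
import hashlib


def git_object_id(kind, body):
    """Return the 40-character hex ID Git assigns to an object.

    kind is the object type as bytes (b"blob", b"commit", ...);
    body is the object's raw content as bytes.
    """
    # Git hashes "<type> <length>\0" followed by the content.
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


empty_blob_id = git_object_id(b"blob", b"")
print(empty_blob_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because nothing in the calculation depends on the machine or the branch, two repositories holding the same object always give it the same name.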

Commonalities/Non-Issues

There are numerous important features on which the packages do not differ. This section summarizes features or attributes common to both Git and Subversion, which are therefore not issues for deciding between them. They are summarized here lest readers think them overlooked, and also because some of them are recent developments for Git that older reviews would lead one to conclude were missing.

Basic Functionality

Both packages support:

Package support

Both packages have active open-source communities who help users on mailing lists and via IRC chats, and have ongoing code development and maintenance. Both are currently putting out bugfix releases about every 2 months.

Tool support

There exist plug-ins for both Git and Subversion for IDEs and bug trackers, although, of course, the Subversion ones are more stable, being older.

Both packages have a Tortoise wrapper GUI for Windows, and various Unix GUI implementations. Again, the Subversion ones are older and more stable.

Remote connectivity

Both packages support

External Hosting

External sites provide hosting for repositories:

Basis for comparison/recommendation

There are several areas where we could use improvement in our use of version control at Fermilab, and which justify moving to a new version control system at all:
  1. Compatibility -- You can use Git with Subversion repositories (via the git-svn backend), but Subversion does not support Git repositories.
  2. Commit-reluctance -- Many experiment groups do a nightly/weekly build from the head of the repository, which makes developers reluctant to commit changes until they're "done". The tool for avoiding this problem is branching, where features are developed on a branch and merged back into the main repository. So making branching and merging straightforward is a key feature.
  3. Distributed-ownership -- Many experiments use packages developed and maintained by other groups (e.g. GEANT, GENIE, CERNLIB, ROOT), yet they need to keep and track local changes, and feed some of them back to those other groups. So tracking third-party packages and merging changes back and forth is another key feature.
  4. Speed -- Use of remote repositories can be very slow, especially for collaborators in locations that are network-remote relative to Fermilab (Brazil, etc.).
For commit-reluctance, Git has the easiest-to-use, fastest, and smartest branch/merge interface, even given Subversion's recent improvements in that area. Git is also better for distributed ownership, in particular for maintaining information about who made what changes when merging branches between different repositories. It even allows people to email patches back and forth and maintain such information easily. And Git's local repository copy makes most operations fast when you are in a remote location, or even disconnected from the network entirely; you can then push your changes back into a central repository when you regain network connectivity.

Other concerns

There are several other concerns that have been raised:

But CERN uses Subversion!

Folks here can use git-svn to track external svn repositories (e.g. at CERN). And several groups at CERN (at least the CDS support group) are leaning towards Git (see References). CERN made their decision about a year ago, when Git support was not as broad as it is now. They also listed Fermilab's (nonexistent) support for Subversion as a reason for picking it. So it is quite possible that a future review at CERN will pick Git as the next version control package.

But Sourceforge uses CVS and Subversion...

...and also Git

Are changes backed up?

While we would hope that users would push changes to a central repository at about the same rate that they currently commit code with CVS, we might want to make suitable recommendations to users on keeping backups of local changes, such as pushing changes from local repositories back into central repositories on branches, or making repository copies on thumb drives.

Drawbacks of Git

There are drawbacks to Git. It should be noted that all of these drawbacks involve start-up effort to switch to using and supporting Git, rather than ongoing effort.

Recommendation

In short, I recommend moving to Git. We have already discussed how Git addresses the most important issues of commit-reluctance, distributed-ownership, and speed, through superior branch handling, and use of distributed repositories. Git also has other features, like "git bisect" (which lets you binary-search for a revision which caused a problem) which are just plain useful, and which Subversion does not have. The drawbacks of Git versus Subversion involve small conversion costs at startup, versus the ongoing benefits of better branch management and speed for end-users. On this basis, Git seems the superior choice.
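The binary search behind "git bisect" can be sketched in a few lines of Python. This is a simplified illustration, not Git's implementation: it assumes a linear history ordered oldest to newest, whereas Git searches a commit graph, and the names first_bad_revision and is_bad are hypothetical. The is_bad callback stands in for whatever test (a build, a unit test, a manual check) tells you whether a revision shows the problem.

```python
def first_bad_revision(revisions, is_bad):
    """Return (first bad revision, number of revisions tested).

    revisions is ordered oldest to newest. The oldest is assumed good,
    the newest bad, and every revision after the first bad one is
    assumed bad as well.
    """
    good, bad = 0, len(revisions) - 1
    tests = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        tests += 1
        if is_bad(revisions[middle]):
            bad = middle   # the problem was introduced at or before middle
        else:
            good = middle  # the problem was introduced after middle
    return revisions[bad], tests


# Hypothetical history of 100 revisions where revision 37 broke something.
history = list(range(1, 101))
culprit, tests = first_bad_revision(history, lambda rev: rev >= 37)
print(culprit, tests)  # 37 7
```

Each test halves the remaining range, so 100 revisions need only about seven checks rather than up to 98.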

References

Videos:
Other comparisons:
Manuals:
Specific Git documentation:
Tortoise (Windows) front ends:
Plug-ins for tools: