DRAFT
Git vs Subversion
A comparison of version control systems
Overview
Here at Fermilab we have been using CVS for version control on projects for nearly
20 years. There is interest in supporting a newer version control system for
projects starting up, in addition to continuing to support CVS for existing
customers. The two leading candidates are Git and Subversion.
This document presents a comparison of these two.
Scope
This document is a comparison of version control packages. Larger issues such
as software design and implementation methodology are beyond the scope of this
document, except where interfaces to particular tools are required, or the ability
to call scripts as a side effect of version control actions is needed. The packages
under consideration here are equivalent in those respects to first order: both support
calling scripts on commits and similar events, so that checks can be made or
notifications sent out to help enforce policy.
Subversion Overview
Subversion is a centralized, networked version control system. A single
repository needs to be contacted for nearly all operations, such as viewing history
or committing changes. Branching and tagging are cheap-copy operations within the
repository: a branch or tag is a copy of a directory tree, stored under a
different path in the same repository.
Subversion's past awkwardness with merging branches has been reduced in recent
versions of Subversion.
Subversion has far better support for non-plain-text files than CVS, and for renaming files.
Subversion gives every commit a unique integer revision number. The number applies to the
repository as a whole, so the same revision number viewed on different branches generally shows different content.
Git Overview
Git is a distributed networked repository version control system. Each remote user
maintains a copy of the repository, and can commit changes, get history, etc. locally.
Separate operations sync up your local repository with other ones, through
various means. This facilitates:
- working offline from the network;
- putting changes into the central repository only after they are tested,
while maintaining finer-grained version control locally; and
- contributions from folks who do not have write permission on the central repository.
They can make and track changes using their local repository,
and then easily submit them as patches via email, etc.
Those changes can be rolled into another repository by someone who has
write access, preserving authorship information.
Branching and merging in Git are much smoother and simpler than in CVS, and are logged
as branching and merging, not just as copies. Changes merged in from a branch
are still attributed to the person who made the change, not to the person doing the merge (as happens in CVS and Subversion).
In Git, every revision is identified by a unique 40-character hexadecimal key (a SHA-1 hash) rather than a sequential revision number. The key for a revision is the same across all repository copies and branches, so it is the Real Name for the revision.
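To make the revision key concrete, here is a minimal sketch of how Git derives an object ID, assuming the SHA-1 object format. It is shown for file contents ("blobs"); commit objects are hashed the same way over the commit's own contents. The function name git_blob_id is made up for this illustration and is not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git would assign to a file ("blob") with this content.

    Illustrative helper only; assumes Git's SHA-1 object format.
    """
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    key = git_blob_id(b"hello world\n")
    # The key is 40 hexadecimal characters (20 bytes of hash).
    print(key, len(key))
```

Because the key depends only on the content being hashed, two repository copies that hold the same object compute the same key independently, which is why the key works as a name across all copies.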
We would need to make suitable recommendations on pushing changes from local repositories back into central ones as branches, or to copies on thumb drives, etc., to remind users to keep suitable backups of local changes.
Commonalities/Non-Issues
This section summarizes important features and attributes common to both Git and Subversion.
They are therefore not issues for deciding between the two, but are listed here
lest readers think them overlooked, and also because some of them are
recent developments for Git, and older reviews would lead one to
conclude they were not there.
Basic Functionality
Both packages support:
- status listings of changes, etc.
- diff of current versus repository, other versions/branches, etc.
- atomic commits
- whole-package versions
- adding, deleting, and renaming tracked files
- calling scripts ("hooks" or "info" scripts) at various points in the
version control process (e.g. commitinfo checks in CVS). These scripts
are allowed to veto commits, etc. based on available information such as
the user doing the commit, the log message, the code being committed, etc.
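As an illustration of the hook scripts mentioned in the list above, here is a minimal sketch of a check that vetoes a commit based on its log message. It assumes Git's commit-msg hook convention, in which the script receives the path of the message file as its only argument and a non-zero exit status aborts the commit. The policy itself (minimum length, 72-character first line) and the names MIN_LENGTH and check_message are hypothetical. Subversion's pre-commit hook plays a similar role on the server side.

```python
import sys

MIN_LENGTH = 10  # hypothetical policy: require a non-trivial log message


def check_message(message: str) -> list:
    """Return a list of policy violations; an empty list means the commit may proceed."""
    problems = []
    # Skip comment lines, which Git adds to the message template.
    lines = [ln for ln in message.splitlines() if not ln.startswith("#")]
    text = "\n".join(lines).strip()
    if len(text) < MIN_LENGTH:
        problems.append(f"log message must be at least {MIN_LENGTH} characters")
    if lines and len(lines[0]) > 72:
        problems.append("first line should be at most 72 characters")
    return problems


def main(path: str) -> int:
    with open(path, encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit rejected: {problem}", file=sys.stderr)
    # A non-zero exit status tells the version control system to abort the commit.
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

With Git, a script like this would be saved as an executable file named commit-msg in the repository's .git/hooks directory.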
Tool support
There exist plug-ins for both Git and Subversion for IDEs, build tools, and bug
trackers such as:
- Eclipse
- Maven
- IntelliJ
- Trac
although, of course, the Subversion ones are more stable, being older.
Both packages have a Tortoise wrapper GUI for Windows, and various Unix
GUI implementations. Again, the Subversion ones are older and more stable.
Remote connectivity
Both packages support
- HTTP for read-only access
- WebDAV for read/write access (although Subversion needs an
extra Apache plug-in)
- ssh-gatewayed access (including via Kerberos with suitable ssh)
- direct connect to daemon (with weaker security, thus we would
probably not support it, except possibly read-only)
External Hosting
External sites provide hosting for repositories:
- SourceForge -- CVS, Subversion, Git
- GitHub -- Git
- Savannah -- CVS, Subversion, Git
Basis for comparison/recommendation
There are several areas where we could use improvement in our use of version control
at Fermilab, and which justify moving to a new version control system at all:
- Commit-reluctance -- Many experiment groups do a nightly/weekly build
from the head of the repository, which makes developers reluctant to commit
changes until they're "done". The tool for avoiding this problem is branching,
where features are developed on a branch and merged back into the main repository.
So making branching and merging straightforward is a key feature.
- Distributed-ownership -- Many experiments use packages developed and maintained
by other groups (e.g. GEANT, GENIE, CERNLIB, ROOT), yet they need to keep and track local changes, and feed some of
them back to those other groups. So tracking third party packages and merging
changes back and forth is another key feature.
- Speed -- use of remote repositories can be very slow, especially for collaborators
in network-remote locations relative to Fermilab (Brazil, etc.)
For commit-reluctance, Git has the easiest-to-use, fastest, and smartest branch/merge interface,
even given Subversion's recent improvements in that area.
Git is also better for distributed ownership, in particular maintaining information about who
made what changes when merging branches between different repositories. It even allows people to
email patches back and forth while preserving that information.
And Git's local repository copy makes most operations fast when you are in a remote location, or
even disconnected from the network entirely, and you can then push your changes back into a
central repository when you can get network connectivity.
Other concerns
There are several other concerns that have been raised:
But CERN uses Subversion!
Folks here can use git-svn to track external svn repositories (e.g. at CERN).
Also, several groups at CERN (at least the CDS support group) are leaning towards Git (see References).
CERN made their decision about a year ago, when Git support was not as broad as it is now.
They also listed Fermilab's (nonexistent) support for Subversion as a reason for picking it.
So it is quite possible that a future review at CERN will pick Git as the next version control
package.
But SourceForge uses CVS and Subversion...
...and also Git.
Drawbacks of Git
There are drawbacks to Git.
It should be noted that all of these drawbacks involve start-up effort to switch
to using and supporting Git, rather than ongoing effort.
- Users used to CVS will have an easier time learning Subversion
than learning Git, as Subversion and CVS share a common
single-repository model.
This is mitigated by Git documentation targeted to folks familiar with CVS
(see References), as well as by a CVS server emulation that
supports CVS clients, for people who Really Don't Want To Learn a New
System, or for existing code that cannot easily be converted.
- If Git is our supported central repository mechanism, we should move
Git RPMs into the Scientific Linux distributions, to keep up on patches.
(Subversion is already there due to the Upstream Vendor.) However, this
makes sense anyway, since Git is used for Linux kernel development,
etc.
- Because Git does not support checking out sub-trees, the central service would need to be
set up with multiple per-package repositories for each project/experiment, rather than a project-wide
repository.
Recommendation
In short, I recommend moving to Git. We have already discussed how
Git addresses the most important issues of commit-reluctance, distributed-ownership, and speed, through
superior branch handling, and use of distributed repositories.
Git also has other features, like "git bisect" (which lets you binary-search for the revision that caused a problem), which are just plain useful, and which
Subversion does not have.
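The binary search behind a bisect can be sketched as follows. This is an illustration of the idea only, not Git's implementation: it assumes a linear history, a known-good first revision, a known-bad last revision, and that every revision after the first bad one is also bad. The names first_bad and is_bad are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision in a linear history.

    Assumes revisions[0] is good, revisions[-1] is bad, and that once a
    revision is bad all later ones are bad too.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # Each test halves the range that can still contain the culprit.
        if is_bad(revisions[mid]):
            bad = mid
        else:
            good = mid
    return revisions[bad]


if __name__ == "__main__":
    history = [f"r{i}" for i in range(1, 101)]
    tested = []

    def broken(rev: str) -> bool:
        tested.append(rev)
        return int(rev[1:]) >= 73  # pretend the problem appeared in r73

    print(first_bad(history, broken), "found after", len(tested), "tests")
```

The number of revisions that must be built and tested grows with the logarithm of the history length, which is what makes the approach practical on long histories.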
The drawbacks of Git relative to Subversion are small conversion costs at startup, which are outweighed by
the ongoing benefits of better branch management and speed for end-users.
On this basis, Git seems the superior choice.
References
Videos:
Other comparisons:
Manuals:
Specific Git documentation:
Tortoise (Windows) front ends:
Plug-ins for tools: