Converting a 5-year-old repository from Subversion to Git


We’ve been using Subversion for DoneDone’s source code hosting since 2009, and it’s served us well over the years. Lately, though, our coworkers at We Are Mammoth and Kin have been using Git (and GitHub) with great results. We started to envy how quickly they were able to create and switch branches, submit pull requests, and manage their merges. We wanted to see if switching to a Git-based workflow would help us streamline our development process, so we decided to try converting DoneDone’s SVN repo to Git.

The plan

DoneDone’s codebase is over 5 years old, and its SVN repo includes over 4,500 commits, 120 tagged releases, and a handful of development branches. Our goal for the conversion was to preserve as much as possible – ideally, we wanted to convert every SVN commit, branch, and tag to its Git equivalent. This would allow us to seamlessly transition from the old repository to the new one.

Conversion options

GitHub’s article on Importing from Subversion offers two approaches: using their web-based importer tool, or using a Ruby utility to perform the conversion locally. Unfortunately, the web tool was not an option for DoneDone, as our repository is private (and quite large). So, we took a closer look at the recommended Ruby utility: svn2git.

Installing svn2git

svn2git is a Ruby program that uses Git’s built-in git svn command to import data from a Subversion repository. On its own, git svn imports SVN branches and tags as remote-tracking branches rather than as local branches and real Git tags. svn2git goes a step further: it combs through every commit in your trunk, branches, and tags, and recreates your repository locally using Git’s own master/branches/tags structure. Since this is exactly what we wanted to accomplish, svn2git was a perfect fit.
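To make the difference concrete, here is a rough sketch of the kind of clean-up svn2git automates, written as a POSIX shell function. It assumes git svn has already fetched the repository with its refs under refs/remotes/origin/ (git svn’s default prefix since Git 2.0; pass another prefix as the first argument otherwise), and it creates lightweight tags for simplicity. The name promote_svn_refs is hypothetical, and svn2git’s real implementation differs in the details.

```shell
# Hypothetical helper: promote git-svn remote-tracking refs to local refs.
# Assumes refs/remotes/<prefix>/trunk, refs/remotes/<prefix>/<branch>
# and refs/remotes/<prefix>/tags/<tag>, as laid out by `git svn`.
promote_svn_refs() (
  prefix=${1:-origin}
  git for-each-ref --format='%(refname)' "refs/remotes/$prefix" |
    while read -r ref; do
      name=${ref#"refs/remotes/$prefix/"}
      case $name in
        trunk) ;;                                  # left alone; master is built from trunk
        tags/*) git tag "${name#tags/}" "$ref" ;;  # SVN tag -> lightweight Git tag
        *) git branch "$name" "$ref" ;;            # SVN branch -> local branch
      esac
    done
)

# Usage, inside a repository fetched with `git svn`:
#   promote_svn_refs          # refs under refs/remotes/origin/
#   promote_svn_refs svn      # refs under refs/remotes/svn/
```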

To use svn2git, we first needed to set up our local machine with the tools it depends on: Git (including its git-svn component), Ruby, and RubyGems.

Once our environment was set up, the next step was to install the svn2git gem:

gem install svn2git
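Before converting, it can help to confirm that the pieces svn2git leans on are actually on your PATH. The sketch below (the helper name missing_tools is hypothetical) prints any command it can’t find. Running `git svn --version` is a separate check that Git’s Subversion support is installed, since some Git packages ship git-svn separately.

```shell
# Hypothetical helper: print each named command that is not on PATH.
# No output means everything was found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   missing_tools git ruby gem svn2git
#   git svn --version    # fails if git-svn itself is not installed
```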

Starting the conversion

Once installed, we created a local directory for our new Git repository, then ran the svn2git command within it, including our SVN repository’s URL and a login username:

mkdir donedone-git-repo
cd donedone-git-repo
svn2git https://svn.example.com/path/to/repo --username jeremy

It’s important to note that the /path/to/repo portion of the SVN URL must point to the root of your repository: the directory that contains trunk, one level above it.
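If you copy the URL from a working copy, it often ends in /trunk. A small helper like the hypothetical svn_root_url below strips a trailing /trunk, /branches/<name>, or /tags/<name> so that svn2git receives the repository root. It assumes the standard trunk/branches/tags layout; svn2git also accepts options such as --trunk, --branches, and --tags for repositories that deviate from it.

```shell
# Hypothetical helper: reduce a trunk/branch/tag URL to the repository root.
# Removes trailing slashes first, then one /trunk, /branches/x or /tags/x.
svn_root_url() {
  printf '%s\n' "$1" | sed -E 's#/+$##; s#/(trunk|branches/[^/]+|tags/[^/]+)$##'
}

# Usage:
#   svn2git "$(svn_root_url https://svn.example.com/path/to/repo/trunk)" --username jeremy
```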

At this point the utility began rolling right along, dutifully requesting data from the Subversion repository. But then it hit a snag…

Mapping author names and emails

SVN authors have only one attribute: a username. Git authors, in contrast, have both a name and an email address. svn2git handles this conversion with a simple text file that maps each SVN username to a name and email:

jeremy = Jeremy Kratz <jeremy@example.com>
kawai = Ka Wai Cheung <kawai@example.com>
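Before handing the file to svn2git, a quick format check can catch typos such as a missing email address. This sketch (the helper name check_authors_file is hypothetical) flags any line that doesn’t look like `svnuser = Full Name <email>`. Skipping blank lines and # comments is a choice made by this sketch, not a documented svn2git rule.

```shell
# Hypothetical helper: print malformed lines of an authors file with their
# line numbers; exit status is non-zero if any line is malformed.
# Blank lines and lines starting with # are skipped (an assumption).
check_authors_file() (
  bad=0
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case $line in ''|'#'*) continue ;; esac
    if ! printf '%s\n' "$line" |
        grep -Eq '^[^=[:space:]]+ *= *[^<]+ <[^<>@ ]+@[^<> ]+>$'; then
      printf 'line %d: %s\n' "$n" "$line"
      bad=1
    fi
  done < "$1"
  [ "$bad" -eq 0 ]
)

# Usage: check_authors_file authors.txt && echo "authors.txt looks well-formed"
```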

Since several developers have contributed to DoneDone over the years, we needed to compile a list of every author in the SVN repo. The svn2git documentation includes a nice command to do just that using the svn log command:

svn log --quiet https://svn.example.com/path/to/repo | grep -E "r[0-9]+ \| .+ \|" | cut -d'|' -f2 | sed 's/^ //' | sort | uniq

Note: If you’re using an SVN tool like TortoiseSVN, you’ll need to make sure that SVN’s command-line tools are also installed locally. And since commands like grep and sort aren’t available by default on Windows, Windows users will need to install Cygwin and run the above command in a Cygwin window.
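The same pipeline can be wrapped as a reusable filter. In this sketch (the helper names svn_authors and authors_template are hypothetical), svn_authors reads `svn log --quiet` output on standard input and, unlike the one-liner, also trims the trailing space that cut leaves after each name. authors_template then turns the list into placeholder mappings with example.com addresses, which you edit by hand.

```shell
# Filter `svn log --quiet` output down to a sorted, de-duplicated list of
# SVN usernames (same grep/cut steps as the one-liner, trimming both ends).
svn_authors() {
  grep -E 'r[0-9]+ \| .+ \|' | cut -d'|' -f2 | sed 's/^ *//; s/ *$//' | sort -u
}

# Turn one username per line into a placeholder "user = user <user@example.com>"
# mapping. The name and the example.com address must be replaced by hand.
authors_template() {
  sed 's/.*/& = & <&@example.com>/'
}

# Usage:
#   svn log --quiet https://svn.example.com/path/to/repo | svn_authors |
#     authors_template > authors.txt
```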

This command outputs a list of every author in the SVN repo. Copy it into a text file, add each author’s name and email in the format shown above, and re-run the svn2git command with the --authors option:

svn2git https://svn.example.com/path/to/repo --username jeremy --authors authors.txt
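When the authors file lacks an entry, git svn stops with an error naming the unknown author, so it can be worth diffing the username list against the file before a long run. A sketch, assuming users.txt holds the output of the svn log pipeline above (one username per line) and authors.txt is the mapping file; the helper name missing_authors is hypothetical.

```shell
# Hypothetical helper: print usernames listed in $1 (one per line) that have
# no "user = Name <email>" entry in the authors file $2.
missing_authors() {
  awk -F'=' '
    NR == FNR {                       # first file: usernames we need
      gsub(/^[ \t]+|[ \t]+$/, "")
      if ($0 != "") want[$0] = 1
      next
    }
    {                                 # second file: usernames already mapped
      user = $1
      gsub(/^[ \t]+|[ \t]+$/, "", user)
      have[user] = 1
    }
    END { for (u in want) if (!(u in have)) print u }
  ' "$1" "$2" | sort
}

# Usage: missing_authors users.txt authors.txt   # no output means all mapped
```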

Now we were all set, and svn2git happily began running without any problems.

Our shiny new repository

After about 7 hours, svn2git finished working its magic and we had a new Git repository that was essentially identical to the SVN original. Our final step was to push the local repository to GitHub, and verify that our branches, tags, and commits were preserved.
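Pushing and spot-checking can be scripted too. A sketch, assuming an empty GitHub repository at a placeholder URL (git@github.com:example/donedone.git is hypothetical): `git push --all` pushes every branch, and tags need a separate `git push --tags`. The hypothetical missing_tags helper then compares `svn ls` on the tags directory (which prints each tag with a trailing slash) against `git tag`. Tag names Git can’t store verbatim, such as ones containing spaces, would show up as false positives.

```shell
# Push every local branch, then every tag (git push rejects --all with --tags):
#   git remote add origin git@github.com:example/donedone.git
#   git push origin --all
#   git push origin --tags

# Print SVN tags (file $1, from `svn ls <repo-root>/tags`) that have no
# matching Git tag (file $2, from `git tag`).
missing_tags() {
  awk '
    NR == FNR { sub(/\/$/, ""); if ($0 != "") svn[$0] = 1; next }
    { git[$0] = 1 }
    END { for (t in svn) if (!(t in git)) print t }
  ' "$1" "$2" | sort
}

# Usage:
#   svn ls https://svn.example.com/path/to/repo/tags > svn-tags.txt
#   git tag > git-tags.txt
#   missing_tags svn-tags.txt git-tags.txt   # no output means every tag made it
```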

GitHub graphs

Success! DoneDone’s complete commit history was preserved, along with all branches and tags.

The next day, we were able to simply clone the new Git repository to our development machines, and immediately begin working where we had left off when using Subversion.

So what differences have we noticed from using Git over the past few weeks, compared with our experiences with SVN?

  • Branching is faster: Since you have the entire Git repository on your local machine, creating branches is much faster than the Subversion process of creating a new branch on the server, then checking it out locally. Where we were previously creating large branches with multiple features in Subversion, we’re now creating tiny branches for each update, since the process is so trivial.
  • Branching is better integrated with DoneDone: Because branching is now faster and easier, we’re better able to follow the best practice of creating a new branch for each feature/bug. Since we eat our own dog food, we use DoneDone to track our issues, and we’ve found the following naming conventions help us keep our issues in sync with our repo:
    • bug_dd_### – Branch contains a bug fix for a specific DoneDone issue (### is the issue number)
    • feature_dd_### – Branch contains a new feature described by a DoneDone issue (### is the issue number)
  • Pull requests help us keep better track of merges: Merging branches in SVN could be a pain, as a developer would need to merge changes locally, resolve any conflicts, review the code, and then commit. GitHub’s pull request feature lets you quickly review the modified files, merge automatically when there are no conflicts, and then delete the branch. This, combined with easier branching, has made our branch/merge process a lot more streamlined.
  • We can commit code locally: When your SVN server goes down, or when you don’t have network access, work can easily grind to a halt: you can’t commit, and you don’t want to pile everything into one huge commit once the repository is back online. With Git we simply commit locally, then push all our changes when the remote host is available.
  • We can analyze our repository: GitHub includes some great tools for analyzing your codebase. We can now easily see graphs of commits, additions, deletions, branch/merge activity, and more. It’s nice to have these types of visualizations, instead of text-based log entries.
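The bug_dd_### / feature_dd_### convention is easy to wrap in a couple of shell helpers. In this sketch, dd_branch (hypothetical) builds a branch name and rejects anything other than bug or feature with a numeric issue number, while dd_issue (also hypothetical) recovers the DoneDone issue number from a branch name, for example to mention it in a commit message or pull request. Issue 1234 in the usage lines is a made-up example.

```shell
# Hypothetical helper: build a branch name following bug_dd_### / feature_dd_###.
dd_branch() {
  case $1 in
    bug|feature) ;;
    *) echo "type must be bug or feature" >&2; return 1 ;;
  esac
  case $2 in
    ''|*[!0-9]*) echo "issue number must be numeric" >&2; return 1 ;;
  esac
  printf '%s_dd_%s\n' "$1" "$2"
}

# Hypothetical helper: print the issue number from a conforming branch name,
# or fail for any other name (e.g. master).
dd_issue() (
  case $1 in
    bug_dd_*|feature_dd_*) n=${1#*_dd_} ;;
    *) return 1 ;;
  esac
  case $n in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$n"
)

# Usage:
#   git checkout -b "$(dd_branch bug 1234)"        # creates bug_dd_1234
#   dd_issue "$(git rev-parse --abbrev-ref HEAD)"   # prints 1234 on that branch
```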


Jeremy Kratz is a developer at DoneDone. Follow him on Twitter via @jwkratz.