Surprises while migrating from svn to git

At Spiria, we are currently migrating our repositories from svn to git. In the process, we have encountered a surprisingly large number of small issues, which I would like to share with you today. If you are planning to undergo a similar migration yourself, reading through this list of minor differences should help you plan your migration more thoroughly than we did.


The git-svn command ships with git and will do most of the migration work for you by converting each svn commit into a corresponding git commit. I have had a pleasant experience using git-svn to interact with subversion servers, but that experience did not prepare me for the many quirks which surfaced during the migration.


First, there is the issue of usernames. When using git-svn to interact with a remote subversion server, you are using two identities: your subversion username and your git username, and git-svn knows that those two identities refer to the same person. When performing a migration, however, you also need to consider the usernames of everybody else who has ever touched the svn repository. That may include employees who left long before you arrived and whose names you have never heard, as well as system processes which don't have the kind of proper email address that git expects. git-svn maps svn usernames to git identities through an authors file (its --authors-file option) with one "svnuser = Full Name <email>" line per user, and it aborts as soon as it meets a username that is missing from that file. Expect to spend a while maintaining this file and restarting your migration scripts because of missing or incorrectly formatted entries. If you plan to have a transition period during which early adopters push to git while the majority of your team continues to commit to svn, expect a few extra difficulties. Sure, git-svn can convert the new svn commits into git commits and vice versa, but two complications arise which do not occur when you are the only person using your git-svn repository. The first is that you probably don't have permission to impersonate other svn users, so every git commit you convert back to svn will carry your name. The second is that when git-svn converts those git commits into svn commits, it rewrites the local history so that each commit is annotated with a "git-svn-id" line. If you decide to force-push this rewritten history to your shared git repository, remember that your early adopters are new to git and probably have no idea how to deal with a rewritten public history.
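Since git-svn stops at the first username missing from the authors file, it can save a lot of restarts to list every svn author up front and compare that list against the file. The sketch below is not the script we used; it assumes you have saved the history with "svn log --xml --quiet <repository URL> > log.xml", and it checks each line against the "svnuser = Full Name <email>" format using a regular expression of my own, which may be stricter or looser than what git-svn actually accepts. The function names are made up for this example.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# My approximation of the authors-file format documented for git-svn:
#   svnuser = Full Name <email@example.com>
AUTHOR_LINE = re.compile(r"^(.+?)\s*=\s*(.+?)\s*<(.*)>\s*$")


def svn_authors(log_xml: str | bytes) -> set[str]:
    """Collect every distinct author name from "svn log --xml" output."""
    root = ET.fromstring(log_xml)
    # Revisions without an <author> element (e.g. r0) are skipped here.
    return {a.text for a in root.iter("author") if a.text}


def parse_authors_file(text: str) -> tuple[dict[str, str], list[str]]:
    """Split an authors file into a {svnuser: "Name <email>"} map and bad lines."""
    mapping: dict[str, str] = {}
    malformed: list[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = AUTHOR_LINE.match(line.strip())
        if match:
            user, name, email = match.groups()
            mapping[user] = f"{name} <{email}>"
        else:
            malformed.append(line)
    return mapping, malformed


def missing_authors(log_xml: str | bytes, authors_text: str) -> list[str]:
    """Return the svn users that still need an entry, in sorted order."""
    mapping, _ = parse_authors_file(authors_text)
    return sorted(svn_authors(log_xml) - set(mapping))
```

Reading log.xml in binary mode and passing the bytes to missing_authors lets the XML parser honour the encoding declaration that svn writes at the top of the file. Fixing every reported name, along with every malformed line, before running "git svn clone --authors-file=..." should spare you most of the restarts caused by the authors file.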


Second, there is the issue of deleted branches. When you delete a branch in svn, there is an explicit commit in which the directory corresponding to the branch is removed, and you can always get back to the deleted branch by checking out a revision which predates the deletion. Not so in git. In git, branches are typically merged into master before being deleted; if a branch was not merged, then once it is deleted its commits become unreachable and, eventually, eligible for garbage collection. To avoid losing data this way, git-svn does not delete the branches which were deleted in svn, and as a result, you will have to decide what to do with the various branches which got resurrected by the migration. For example, if a branch was renamed, you will get a branch with the old name and a branch with the new name, and you can safely delete the old one because its history is included in the new branch. Among those resurrected branches (which, again, you might never have heard of if they were deleted before you came in), there might also be some strange branches of the form "branchname@2000", where 2000 stands for an svn revision number. I read that those branches contain the folders from which your svn branches were copied when the corresponding branch folder was first created. I have found that the tip of each of those spurious branches usually appears in the history of one of your other branches, so it's probably safe to delete them once you have checked that it does. The final hurdle related to svn branches is that git-svn creates branches named "tags/tagname" instead of proper git tags. Also, all the branches which git-svn creates are remote-tracking branches, which you will have to convert to local branches before pushing them to your git server. It's easy to write a script which converts each of those branches, and each of the tag branches into a tag, but it's also easy to forget this step and lose every branch except trunk.
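The conversion script mentioned above can be quite short. Here is a sketch, not our production script, that only plans the work: given the ref names printed by "git for-each-ref --format=%(refname) refs/remotes/", it returns "git update-ref" commands which create a local branch for each svn branch, map trunk to master, and turn each "tags/..." branch into a lightweight tag. It assumes the "origin/" prefix that git svn clone uses by default since git 2.0; older versions put the refs directly under refs/remotes/, so adjust the hypothetical prefix parameter if needed. The "@" branches are set aside for manual review rather than deleted.

```python
from __future__ import annotations

TAG_DIR = "tags/"


def plan_ref_conversion(
    refs: list[str], prefix: str = "refs/remotes/origin/"
) -> tuple[list[list[str]], list[str]]:
    """Plan local branches and tags for git-svn's remote-tracking refs.

    Returns (commands, needs_review). Each command is an argv list for
    "git update-ref", which moves a ref without touching the working tree.
    """
    commands: list[list[str]] = []
    needs_review: list[str] = []
    for ref in refs:
        if not ref.startswith(prefix):
            continue  # not created by git-svn under this prefix
        name = ref[len(prefix):]
        if "@" in name:
            # e.g. "feature@2000": copy source at an svn revision; inspect by hand
            needs_review.append(name)
        elif name.startswith(TAG_DIR):
            # svn tag -> lightweight git tag pointing at the same commit
            tag = name[len(TAG_DIR):]
            commands.append(["git", "update-ref", "refs/tags/" + tag, ref])
        elif name == "trunk":
            commands.append(["git", "update-ref", "refs/heads/master", ref])
        else:
            commands.append(["git", "update-ref", "refs/heads/" + name, ref])
    return commands, needs_review
```

Each planned command can be run with subprocess.run(cmd, check=True), after which "git push --all" and "git push --tags" send the new branches and tags to your git server. Before deleting one of the review candidates, "git merge-base --is-ancestor <candidate> <branch>" exits with status 0 if the candidate's tip is already part of that branch's history.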


While git-svn will automatically migrate your files, commits, and branches, it will silently drop all of your svn properties. Especially relevant are svn:ignore, svn:eol-style and svn:externals, in increasing order of difficulty. I won't go into the details, but the corresponding git mechanisms are, respectively, .gitignore, .gitattributes, and submodules. Keep in mind, however, that svn:externals can be used for many different purposes, including as a crutch to emulate cross-platform symlinks, while submodules are only for including a particular version of an external repository into yours. If your needs are different, you might need to investigate other solutions. In our case, we decided to give more responsibility to our build system.
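Of the three, svn:eol-style translates most mechanically. The sketch below, an illustration rather than what we ran, turns the output of "svn propget -R svn:eol-style ." into .gitattributes lines. It assumes svn prints one "path - value" line per file, which is what I would expect for single-line property values, and it uses my own mapping: "native" becomes "text" (stored with LF, checked out with the platform's line ending), while "LF" and "CRLF" pin the checkout line ending with "eol=lf" and "eol=crlf". git has no equivalent for "CR", so those paths are reported instead of converted. The function name is hypothetical.

```python
from __future__ import annotations

# Assumed mapping from svn:eol-style values to .gitattributes attributes.
EOL_TO_ATTRIBUTES = {
    "native": "text",  # LF in the repository, platform line ending on checkout
    "LF": "text eol=lf",
    "CRLF": "text eol=crlf",
}


def eol_styles_to_gitattributes(propget_output: str) -> tuple[list[str], list[str]]:
    """Convert "path - value" lines into .gitattributes lines.

    Returns (attribute lines, paths whose style has no git equivalent).
    """
    attributes: list[str] = []
    unsupported: list[str] = []
    for raw in propget_output.splitlines():
        path, sep, style = raw.rpartition(" - ")
        if not sep:
            continue  # not a "path - value" line
        path = path.replace("\\", "/")  # Windows clients may print backslashes
        if path.startswith("./"):
            path = path[2:]
        attribute = EOL_TO_ATTRIBUTES.get(style.strip())
        if attribute is None:
            unsupported.append(path)  # e.g. "CR"
            continue
        # A leading "/" anchors the pattern to this exact path; spaces would
        # split the line, so they are written as a character class instead.
        # Other glob characters in file names are NOT escaped by this sketch.
        pattern = "/" + path.replace(" ", "[[:space:]]")
        attributes.append(f"{pattern} {attribute}")
    return attributes, unsupported
```

Writing the returned lines to a .gitattributes file at the root of the converted repository and committing it should reproduce the svn line-ending behaviour for those files, while the unsupported list is worth reviewing by hand.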


While git-svn makes migrating from svn to git look like a breeze, git and svn have many subtle differences which need to be taken into account during a migration. We had to face all of the above issues and more, but in the end, it was worth it! Just be sure to educate your colleagues about the advantages of git over subversion. If you try to alleviate their fear of change by assuring them that they can use git in basically the same way as they were using svn, they might not understand why you are migrating at all.