I have been working on converting an SVN repo of ~32,000 commits to any DVCS (Git, Bazaar, Mercurial, Plastic SCM). After a week or two I realized the best option is to convert the SVN repo to Git, get a fast-export stream, and import the .fe stream to whatever DVCS, as they all support the git fast export/import method.

I've tried everything I could find on the internet, on both Windows 7 and Ubuntu Linux. Due to the size of the repo, I've had the most success with reposurgeon and git-svn. But again, due to the size, both tools fail to convert the full repo in one go. I also tried SubGit, and although it works, it is extremely slow (~24h to process 1060 commits).

So I figured I could convert each folder within the repo (trunk, branches, tags, custom folders) separately and combine them later in Git. Then I realized this would not be possible, as Git's repository structure is significantly different from SVN's.

My question is, is it possible to use my method above and with some magic, combine the separate conversions into one Git repo?

Essentially I need to get a fast export/import stream for my SVN repo to convert it to another DVCS, and figured a Git middle-step would be easiest. What, if any, other options are available for a successful conversion?

Thanks in advance.

Opender Singh
  • Do you really need all these 32,000 commits to maintain the current version of your software? Cutting off every commit older than, for example, one year might help reduce the amount of data that needs to be handled, without really losing any relevant data. Usually you do not care WHO committed that set of lines two years ago that has a bug - you care about "those lines are really old and buggy, let's fix and refactor". – Sven Jan 07 '14 at 22:18
  • Cheers @Sven. I agree with you; not all 32,000 commits are necessary, but the more the better, as we potentially want to leave SVN altogether. Still, with that in mind, what can I do to actually have a successful conversion? What are the options? – Opender Singh Jan 08 '14 at 02:43

2 Answers

Converting folders separately and combining the git repositories should work in principle, but would be very tricky to get right, so I'd advise against it.

At any rate, 32,000 commits is not that much, and git-svn should be able to handle it, though it might take a day or so. However, if it is too slow, you'll have to experiment a bit.

Things that can slow down git-svn's clone operation

SVN repository speed

First, of course, is the SVN repository speed. Try creating a local mirror of the SVN repository (using svnadmin dump/load or svnsync), and clone that.
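For illustration, the dump/load route might look roughly like this. It is only a sketch, assuming you have filesystem access to the original repository; the function name `mirror_svn_repo` and all paths are placeholders of mine, not anything from the question.

```shell
# Copy an SVN repository to local disk with svnadmin dump/load.
# Both arguments are filesystem paths (not URLs).
mirror_svn_repo() {
    src_repo=$1      # existing repository
    mirror_repo=$2   # new local repository to create
    mkdir -p "$(dirname "$mirror_repo")" &&
    svnadmin create "$mirror_repo" &&
    svnadmin dump --quiet "$src_repo" | svnadmin load --quiet "$mirror_repo"
}

# Example call (hypothetical paths), followed by a clone of the mirror:
# mirror_svn_repo /srv/svn/project "$HOME/svn-mirror/project"
# git svn clone --stdlayout "file://$HOME/svn-mirror/project" project-git
```

If you only have network access to the repository, svnsync is the alternative; it needs a few more setup steps on the mirror, so check its documentation first.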

"Subdirectory" branches/tags

Branches or tags (which git treats identically) can become a problem. Whenever git-svn clone encounters an SVN branch that is not a copy of trunk, but of a subdirectory, it will re-read the whole SVN history of the branched subdirectory since its creation (you can see this in the output of git svn clone, and here is an explanation by the author). This means that the speed of a clone is not only proportional to the number of SVN revisions n, but also to the number of "subdirectory branches" b, i.e. if b = 10, the clone may take up to 10 times longer.

There is no easy solution to this problem. First, you could try cloning without tags. Normally a tag just refers to an SVN revision ID, so having a list of tags should be enough (unless you have tags that contain changes... ugh). If that's not enough, maybe also skip some branches, though you'll have to decide if there are any you can do without.
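A clone without tags could be sketched as follows, assuming the repository uses the conventional trunk/branches/tags layout. The function names are made up for this example; adjust the directory names if your layout differs.

```shell
# Clone trunk and branches only. Because --tags is not given,
# git-svn does not track the tags directory at all.
clone_without_tags() {
    svn_url=$1    # e.g. a file:// URL of a local mirror
    dest_dir=$2   # directory for the new Git repository
    git svn clone --trunk=trunk --branches=branches "$svn_url" "$dest_dir"
}

# Print each tag together with the revision in which it was last changed,
# so you keep a reference list of the tags you skipped.
list_tags() {
    svn list --verbose "$1/tags"
}
```

To skip some branches as well, git-svn has an `ignore-refs` setting, but check the git-svn manual for the exact pattern syntax before relying on it.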

The extreme solution would be to use the option --no-follow-parent. This prevents git svn from re-reading a branch from the beginning. The branches will still be read, but they will not be connected to the rest of the history. You can still see what was done there, but merging them back becomes very difficult.
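As a last resort, the invocation would look something like this. It is a sketch assuming a standard layout, and `clone_unlinked_branches` is just a name I picked.

```shell
# Clone without following branch parents: every branch is imported,
# but it starts its own disconnected history instead of forking from trunk.
clone_unlinked_branches() {
    svn_url=$1    # e.g. a file:// URL of a local mirror
    dest_dir=$2   # directory for the new Git repository
    git svn clone --stdlayout --no-follow-parent "$svn_url" "$dest_dir"
}
```

It is probably worth trying this on a copy first and inspecting the result with `git log --graph --all` before committing to it.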


Finally, note that you can interrupt and resume the clone process. To resume, run git svn fetch. You might need several restarts, but with a bit of patience the clone should go through.
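If the fetch keeps stopping, a small retry loop saves you from babysitting it. This is a rough sketch; `resume_clone`, `MAX_ATTEMPTS` and `RETRY_DELAY` are names and defaults I chose for the example, not git-svn settings.

```shell
# Re-run "git svn fetch" in an existing git-svn clone until it succeeds
# or the attempt limit is reached. Returns 0 on success, 1 otherwise.
resume_clone() {
    repo_dir=$1
    max_attempts=${MAX_ATTEMPTS:-20}   # give up after this many tries
    retry_delay=${RETRY_DELAY:-30}     # seconds to wait between tries
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run in a subshell so the caller's working directory is unchanged.
        if (cd "$repo_dir" && git svn fetch); then
            return 0
        fi
        echo "git svn fetch stopped (attempt $attempt), retrying..." >&2
        attempt=$((attempt + 1))
        sleep "$retry_delay"
    done
    return 1
}

# Example call (hypothetical directory):
# resume_clone project-git
```

The attempt limit is there so that a permanent failure (wrong URL, corrupt revision) does not loop forever.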

sleske
  • BTW: The multiple fetching of revisions is also discussed here: http://stackoverflow.com/questions/1140428/git-svn-fetch-retrieves-the-same-subversion-revision-multiple-times-for-branches – sleske Jan 11 '14 at 01:18
  • Thanks a lot. I am using a local copy of the repo. I quickly realized that it would be silly to convert directly off the network. That also explains why I would see the rev number jump around during conversion. Maybe there are a few sub-directory branches. About 30 devs have worked on this project over the years, so it's hard to tell if the tags are simply tags, or if the branches are only from the trunk, but I will keep that in mind, as well as the --no-follow-parent option as a last resort. – Opender Singh Jan 12 '14 at 19:07
  • Basically fixed speed using a local copy of the svn repo using svnadmin dump/load. – Opender Singh Jan 27 '14 at 21:33
  • About the suggestion to clone from a mirror of the SVN repo - I don't think it would work, because `git svn fetch` saves the repository URL as part of the commits hash, and it would require `git filter-branch` to go over all the commits in order to switch to working with the original after importing from the mirror. – Daniel Hershcovich May 13 '15 at 10:51
  • @DanielHershcovich: The question asks about a "conversion", so I assumed the SVN repository would be retired. If you want to keep using SVN and Git in parallel, then yes, you need to switch the SVN URL to the real one after cloning. There are ways to do this without rewriting all commits... but that is a separate question :-). – sleske May 13 '15 at 11:17

Resurrecting a very old question, but I thought the answer might be useful to someone.

You might want to try svn-all-fast-export / svn2git. A few years ago I was converting an old SVN repo with ~35k commits to Git while also splitting it into several separate Git repositories. I had a local copy of the SVN repo on my laptop and it only took around 15 minutes (which was great as I had to run the conversion many times before I was reasonably happy with the result ;). I also used BFG Repo-Cleaner to post-process the converted Git repositories.

svn-all-fast-export / svn2git is not the most straightforward piece of software; I had to resort to reading the source code a few times to really understand what's going on. You might want to check out my other answer on this topic for some tips: svn-all-fast-export: Match file names
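To give an idea of what the tool expects, here is a minimal rules file written out by a small shell function. Treat it as a sketch: the repository name `project`, the function name and the standard trunk/branches layout are assumptions, and this is not the rule set I actually used.

```shell
# Write a minimal svn-all-fast-export rules file to the given path.
write_example_rules() {
    cat > "$1" <<'EOF'
# One target Git repository (hypothetical name).
create repository project
end repository

# trunk becomes master
match /trunk/
  repository project
  branch master
end match

# each directory under /branches/ becomes a Git branch of the same name
match /branches/([^/]+)/
  repository project
  branch \1
end match

# ignore everything not matched above (tags, custom folders)
match /
end match
EOF
}

# Example run (hypothetical paths; the last argument is the local
# filesystem path of the SVN repository, not a URL):
# write_example_rules project.rules
# svn-all-fast-export --identity-map authors.txt --rules project.rules /path/to/svn/repo
```

Splitting into several Git repositories works the same way: add more `create repository` blocks and point different `match` rules at them.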

piit79