2

I have a huge svn repository with about 2.5 GB of files. It has many branches and tags, and 72,000 revisions. I want to make a local git clone. Doing this through a regular git clone takes about 24 days.

Will this work:

  1. I will use multiple machines to clone parts of the repository in parallel. The first machine will clone revisions 1 to 12000, the next will clone 12001 to 24000, and so on.

  2. Then I have to merge all these local git repositories into one.

How can I do this? Is there any other way?

EDIT: My main requirement is to be able to query the change history locally, so I want a local copy of the entire history. In fact, that is the main reason I want to move to git. Also, I don't have admin access to the svn repo.

anony

5 Answers

1

Cloning in parallel is not going to work, because each revision in git needs to point to the SHA1 hash of its parent revision.

Only machine #1 will (sometime in the future) know the SHA1 hash of revision 12000, so machine #2 cannot create the git revision for revision 12001 ahead of that time: the parent hash simply isn't available to it yet. And even if it were possible to communicate the hashes between the machines during the import, it would still be a serial process rather than a parallel one.
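To illustrate the dependency, here is a minimal Python sketch. It hashes a simplified commit object (real git commits also carry author and committer lines, which are left out here), and the names `commit_id` and `import_range` as well as the fake tree snapshots are made up for the example. It only shows that a chunk imported without its real parent gets different IDs than the same revisions imported serially.

```python
import hashlib
from typing import List, Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """SHA-1 of a simplified commit object: 'commit <size>\\0' header plus body."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append("")
    lines.append(message)
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def import_range(first_rev: int, last_rev: int, parent: Optional[str]) -> List[str]:
    """Import revisions first_rev..last_rev in order, chaining each onto the previous ID."""
    ids = []
    for rev in range(first_rev, last_rev + 1):
        # Stand-in for the real tree hash of this revision's snapshot (an assumption).
        tree = hashlib.sha1(f"snapshot of r{rev}".encode()).hexdigest()
        parent = commit_id(tree, parent, f"r{rev}")
        ids.append(parent)
    return ids


# One machine importing r1..r6 serially.
serial = import_range(1, 6, None)

# Two machines: the second starts at r4 without knowing the ID of r3.
chunk1 = import_range(1, 3, None)
chunk2_isolated = import_range(4, 6, None)

# Only once r3's ID is known can r4..r6 be created with the right parents.
chunk2_chained = import_range(4, 6, chunk1[-1])

print(chunk2_isolated == serial[3:])  # IDs differ from the serial import
print(chunk2_chained == serial[3:])   # IDs match once the parent is known
```

The isolated chunk is still a valid history on its own; it just cannot be the same commits as the serial import, so joining it later means rewriting it.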

Arenhag
  • Thanks for your answer. Now I'm sure that this approach does not work. But will it work if I do it in parallel on the same machine, using different consoles? Even that would be nice. – anony Dec 01 '13 at 11:39
  • Nope, the problem is still the same - console #1 will never be able to know anything about the SHA1 hash of revision 12000 before it has been imported and that hash is still a prerequisite for importing revision 12001 in console #2. – Arenhag Dec 01 '13 at 19:09
1

Are you trying to clone the entire repository or a specific branch? I suggest cloning only the trunk. Other than that, there's no silver bullet here. If it takes a long time, you'll just have to wait. You only need to do this once anyway (hopefully, well, see my last note below).

Btw, I wrote a blog post on working with large Subversion repos with Git, you might find other useful tips in there.

Personally, after a lot of struggle trying to do this, I went back to native Subversion. It was just too much overhead. Sometimes my local Git repo got corrupted beyond repair and I had to clone the large repo again... This caused me a lot of pain. You've been warned.

janos
  • Right, but I don't need the local repo for work. I just want local history. I want to be able to run history queries offline. That is the main reason I'm trying to git clone it. Is there any alternative? – anony Dec 01 '13 at 11:38
  • If you just want to query the history, that should work fine, once you have the initial clone. But to get the initial clone there's no other way than cloning with `git` or getting a fast-export dump. – janos Dec 01 '13 at 11:48
  • if all you want to do is get local history, stop the svn server; copy the remote repo; start it up on your local box. Then you can fire history queries against it using a local svn server (or just go with the file: protocol directly). you can use svnsync to update your local copy from then on with just delta changes in real time. – gbjbaanb Dec 05 '13 at 14:32
0

I'm not sure your approach is the best.

There is an `svn-fe` tool in git's contrib directory that's supposed to feed fast-import from a dump generated by `svnadmin dump`. Consider giving that a shot.

There's also a `git-remote-testsvn` in git that's not very polished (or guaranteed to work). You can give that a shot too.

Also, can you clarify whether you want a one-shot migration, or to keep using the svn repository but with git on your end?

Noufal Ibrahim
  • Well then, you're going to have trouble cloning it. Perhaps you should consider a [shallow clone](http://stackoverflow.com/questions/747075/how-to-git-svn-clone-the-last-n-revisions-from-a-subversion-repository) if you don't need the entire history. – Noufal Ibrahim Nov 29 '13 at 16:44
  • I want the entire history. That is my main requirement – anony Nov 29 '13 at 16:46
  • And all the branches/tags? And you want to continue working on it with regular `git svn rebase` operations? I'm not sure you'll get very far with my suggestions, but try them out nevertheless. – Noufal Ibrahim Nov 29 '13 at 17:39
0

Finally I found a way to do this. It works when there is only one trunk on svn and no branches. If svn has branches, the process has to be repeated for each branch.

Let us say your svn has 6000 revisions. The first 3000 are in git1, the second 3000 are in git2.

  1. First add git2 as a remote to git1 and fetch it: `git remote add git2 <path to git2>`, then `git fetch git2`

  2. Check out git2 as a local tracking branch: `git checkout -b git2 remotes/git2/master`

Now you have two disconnected branches that share no history.

    ... o --- o --- ... o
                        git1
    o ... o
    r'    git2

We know git1 is at revision 3000 and git2 starts at 3001 and ends at 6000.

  3. Join the two branches: `git rebase -p --onto <git1SHA> --root <git2SHA>`

There will be a conflict, but only for the first commit.

The end result is as follows:

               git1
    o --- .... o
                \
                 \
                  o --- .... o
                  r          git2

  4. Resolve the conflicts, which is easy: you just have to delete the conflict markers, stage the files with `git add`, and continue with `git rebase --continue`.

  5. `git checkout master`

  6. `git merge git2` (this is a fast-forward).

Now run `git log --oneline`. You will see all 6000 commits.
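What the rebase does to the second chunk can be sketched with a toy model in Python. This is not git itself: a commit is reduced to an ID derived from its parent ID and message, file contents are ignored, and the names `make_commit`, `build_chain` and `rebase_onto` are invented for the illustration. It shows that after re-parenting, every commit of git2 is kept in order but gets a new ID, while git1 is untouched.

```python
import hashlib
from typing import List, Optional, Tuple

# (commit id, parent id, message); a toy stand-in for a git commit.
Commit = Tuple[str, Optional[str], str]


def make_commit(parent: Optional[str], message: str) -> Commit:
    """Derive the commit ID from the parent ID and the message."""
    payload = f"parent {parent}\n\n{message}".encode()
    return (hashlib.sha1(payload).hexdigest(), parent, message)


def build_chain(messages: List[str], parent: Optional[str] = None) -> List[Commit]:
    """Create a linear history, each commit pointing at the previous one."""
    chain = []
    for message in messages:
        commit = make_commit(parent, message)
        chain.append(commit)
        parent = commit[0]
    return chain


def rebase_onto(new_base: str, chain: List[Commit]) -> List[Commit]:
    """Replay a chain on top of new_base; every replayed commit gets a new ID."""
    return build_chain([message for _, _, message in chain], parent=new_base)


git1 = build_chain([f"r{n}" for n in range(1, 4)])  # r1..r3, rooted
git2 = build_chain([f"r{n}" for n in range(4, 7)])  # r4..r6, separate root

# Graft git2 onto the tip of git1, as the rebase step does.
joined = git1 + rebase_onto(git1[-1][0], git2)

for commit_id, parent, message in joined:
    print(commit_id[:7], message)
```

In the real repository the first replayed commit is also where the conflict shows up, which this model does not capture.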

anony
0

If all you want is local history: stop the svn server, copy the remote repo, and start it up on your local box. Then you can run history queries against it using a local svn server (or just go with the file: protocol directly). You can use svnsync to update your local copy from then on with just the delta changes.

Not having admin access to the svn repo is fine. You can use svnsync to export a backup of the repo without needing anything special; it will just have to be a pull-style svnsync rather than a push (from the repo to the backup).

Svnsync works by replaying commits, so you'll just be replaying every commit ever from the repo to your local 'backup'. It might take a little while to do it this way - you usually seed the backup with the original repo either by copying the file structure or dumping it.

gbjbaanb