
We have two Subversion repositories, each with a single project. So:

svn://server/svn/project_a
svn://server/svn/project_b

They are separate projects in separate repositories, with completely separate commit histories. Project A has r1, r2, ... r100 and Project B has r1, r2, ... r400.

We would ultimately like to merge these two SVN repositories into a single Git repository. Whether the merge can take place in Git, or should take place in a third temporary SVN repository first, we ultimately want to see:

git://server/svn/projects/

Which is a repository with both Project A and Project B. They will be stored in separate folders, like:

git://server/svn/projects/project_a
git://server/svn/projects/project_b

So there won't be any conflicts "merging" the two. We were able to use this answer flawlessly to transfer a single SVN project into a single Git project, with commit history included.

We would like to merge our two SVN projects, A and B, into a single Git repository, but we want the commits to be interleaved by date, i.e.:

8b8dad: Project A, r1 (first commit in Git)
dbdffe: Project B, r1 (child of previous)
0ae7f7: Project B, r2 ...
615b51: Project A, r2 ...
916e59: Project A, r3 ...
85f241: Project B, r3 ...

Is this possible? Should we merge the two SVN repositories into one, then import into Git? Or is it easier to leave them separate, and perform the merge during the Git import?

Craig Otis

3 Answers


So I tried Craig's method, but it left me with a somewhat unsatisfactory history in the combined repository. I found that checking out all the SVN repos into separate Git repos and then merging them together as branches made a nice history where three branches meet.

So first you do the "authors" step to create authors.txt:

someguy = Some Guy <someguy@yourcompany.com>
...
(no author) = no_author <no_author@no_author>
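
If the list of committers is long, a first draft of authors.txt can be generated from the SVN log. The following is only a sketch: the function name authors_from_log and the @example.com addresses are placeholders I made up, and it assumes the usual `svn log -q` line layout (`r123 | user | date`). The real names and addresses still have to be filled in by hand, including the `(no author)` entry.

```shell
# authors_from_log: read `svn log -q` output on stdin and print one
# "user = user <user@example.com>" line per distinct committer.
# The e-mail domain is a placeholder and must be edited afterwards.
authors_from_log() {
    awk -F'|' '/^r[0-9]+ \|/ {
        user = $2
        gsub(/^ +| +$/, "", user)      # trim the padding around the name
        print user " = " user " <" user "@example.com>"
    }' | sort -u
}
```

Typical use would be something like `svn log -q https://svn.mycompany.com/svn/proja | authors_from_log > authors.txt`, followed by a manual pass over the file.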

Now you have to check out all svn repos using git:

mkdir proja projb projc ...

Now repeat the following for every project. Since each repo's files probably sit at its top level rather than inside a single folder, add one extra commit that moves them into a subfolder:

cd proja
git svn init https://svn.mycompany.com/svn/proja --no-metadata
git config svn.authorsfile ../authors.txt
git svn fetch

#here comes the additional part:
mkdir -p proja                  #proja/proja
git mv -k * proja               #move everything in there
git commit -m "subtree proja"

Then I made my new combined repo, in which I used a different branch for each sub-project:

mkdir ../superproj
cd ../superproj
git init
git commit --allow-empty -m "initial empty commit"     #so that we have a master branch
for b in proja projb projc; do git branch "$b"; done   #one branch per sub-project, and so on

The following needs to be repeated for every sub-project:

git checkout proja
git remote add proja_rm ../proja
git pull proja_rm master       #on Git 2.9+ also pass --allow-unrelated-histories
git remote rm proja_rm         #cleanup

Finally, you can combine the whole thing into your master branch:

git checkout master
git merge proja projb projc...  #it all comes together
git push whereeveryouwant
Underdetermined

Here's what we ended up doing:

Step 1: Merge the SVN Repositories into a Temporary SVN Repository

This requires access to the SVN repositories themselves (NOT just working copies).

First, create dump files of each repository you want to merge:

svnadmin dump project_a > dumps/a.dmp
svnadmin dump project_b > dumps/b.dmp
svnadmin dump project_c > dumps/c.dmp

Then, create a new repository that will house the merged repositories:

svnadmin create svn-temp-project

Note that you MUST check out this repository into a working copy and create the project subdirectories, or loading your dumps will not work:

svn co file:///var/svn/svn-temp-project svn-temp-project-wc
cd svn-temp-project-wc
mkdir project_a
mkdir project_b
mkdir project_c
svn add . --force
svn ci -m "Added initial project directories."

Then, you can load each individual dump file into its own specific (!!) project directory:

svnadmin load svn-temp-project --parent-dir project_a < dumps/a.dmp
svnadmin load svn-temp-project --parent-dir project_b < dumps/b.dmp
svnadmin load svn-temp-project --parent-dir project_c < dumps/c.dmp
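
With more than a handful of projects, the dump and load commands above can be wrapped in a loop. This is a rough sketch, not what we actually ran: merge_dumps is a made-up helper name, the dump files are named after the project rather than a.dmp, b.dmp, and so on, and the SVNADMIN variable is an assumption added so the commands can be dry-run with `SVNADMIN=echo`. It assumes you are in the directory that contains the source repositories and that the project directories already exist in the target repository, as described above.

```shell
# SVNADMIN defaults to the real svnadmin binary; overriding it
# (for example SVNADMIN=echo) is only meant for dry runs.
SVNADMIN=${SVNADMIN:-svnadmin}

# merge_dumps TARGET PROJECT...: dump each PROJECT repository and load it
# into TARGET under a parent directory of the same name.
merge_dumps() {
    target=$1; shift
    mkdir -p dumps
    for project in "$@"; do
        "$SVNADMIN" dump "$project" > "dumps/$project.dmp" || return 1
        "$SVNADMIN" load "$target" --parent-dir "$project" \
            < "dumps/$project.dmp" || return 1
    done
}
```

It would be called as `merge_dumps svn-temp-project project_a project_b project_c`.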

You now have a 3-merged SVN repository.

Step 2: Migrate the 3-merged SVN repository into a Git repository

The following steps can be performed on a local machine; they do not need to take place on your server.

First, create an authors.txt file that git-svn can use to determine the author of each commit. I used:

someguy = Some Guy <someguy@yourcompany.com>
...
(no author) = no_author <no_author@no_author>

With this authors file in place, you can then:

cd projects/
mkdir my-git-repository
cd my-git-repository
git svn init https://svn.mycompany.com/svn/svn-temp-project --no-metadata
git config svn.authorsfile ../authors.txt
git svn fetch

Step 3: Cleanup

This method works well for merging commit history, but you end up with SVN-like directories:

repo/project_a/trunk
repo/project_a/branches
repo/project_a/tags
repo/project_b/trunk
repo/project_b/branches
repo/project_b/tags
...

Thus, before pushing, you should migrate any tags/branches to Git. We did not do this. Our tags were unnecessary to keep around, as we had other sources to retrieve them, and we did not have any branches for these projects.

After removing the branches and tags directories, we then dropped the contents of trunk/ down one level, so everything was at the project-specific "root" level.
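
For reference, the flattening described above could look roughly like the sketch below. flatten_trunk is a hypothetical helper name, and the function works on plain files only: it assumes it is run inside the Git working tree, that nothing in trunk/ is itself named trunk, and that you stage and commit the result yourself afterwards.

```shell
# flatten_trunk PROJECT_DIR: delete PROJECT_DIR/branches and PROJECT_DIR/tags,
# then move everything in PROJECT_DIR/trunk (dotfiles included) up one level.
flatten_trunk() {
    project=$1
    rm -rf -- "$project/branches" "$project/tags"
    for entry in "$project"/trunk/* "$project"/trunk/.[!.]* "$project"/trunk/..?*; do
        # Unmatched patterns stay literal, so skip anything that does not exist.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        mv -- "$entry" "$project/" || return 1
    done
    rmdir -- "$project/trunk"
}
```

After running `flatten_trunk project_a` for each project, something like `git add -A` followed by a commit records the new layout.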

Craig Otis

Here's what I'd do in a Linux shell (untested):

  1. convert each to its own git repo
  2. make a third git repo with an empty first commit

    git commit --allow-empty -m'Add empty, initial commit'

  3. in the empty repo, add each repo as a remote

    git remote add repoA 'path/to/git/repoA'
    git remote add repoB 'path/to/git/repoB'

  4. fetch the repos into the empty one (this gets all the objects into one repo)

    git fetch repoA
    git fetch repoB

  5. get a list of commits in each repo prefixed with Unix timestamps (seconds since 1/1/1970)

    git --no-pager log --format='%at %H' repoA/master >repoACommits
    git --no-pager log --format='%at %H' repoB/master >repoBCommits

  6. cat both of them into one, sorted (by timestamp) list, culling the timestamps:

    cat repoACommits repoBCommits | sort -n | cut -d' ' -f2 >orderedCommits

  7. in your new repo, run through the list, cherry-picking each (presumably to master)

    git checkout master
    cat orderedCommits | while read commit; do git cherry-pick "$commit"; done

This is all theoretical, but I think it'll work. I don't know what happens if you have a merge conflict between the two. I'm not sure if the while will stop, or keep trying and failing to continue.
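
To take the guesswork out of what the loop does on failure, the interleaving and picking steps could be written as two small functions. This is an equally untested-in-anger sketch: interleave_commits and pick_in_order are names invented here, the commit lists are assumed to hold `<unix-timestamp> <hash>` lines, and `sort -s` (stable sort) is assumed to be available, as it is in GNU and BSD sort.

```shell
# interleave_commits FILE...: merge "<unix-timestamp> <hash>" lists and print
# the hashes oldest first. -s keeps the input order for equal timestamps, so
# each list should already be oldest-first, for example produced with
# `git log --reverse --format='%at %H'`.
interleave_commits() {
    sort -s -n -k1,1 "$@" | cut -d' ' -f2
}

# pick_in_order FILE: cherry-pick each hash listed in FILE and stop at the
# first failure instead of carrying on. The list is read on descriptor 3 so
# that git cannot consume it through stdin.
pick_in_order() {
    while read -r commit <&3; do
        git cherry-pick --allow-empty "$commit" || {
            echo "cherry-pick stopped at $commit" >&2
            return 1
        }
    done 3< "$1"
}
```

Usage would be along the lines of `interleave_commits repoACommits repoBCommits >orderedCommits` and then `pick_in_order orderedCommits`. If a pick fails, the failing hash is printed and the repository is left in the conflicted state for you to resolve.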

I just noticed you mentioned wanting to keep each repo's work in separate folders in the final folder. You'll need the mysterious and powerful git filter-branch to first run through each repo separately, doing the work of moving added things into a folder, per-commit. That's probably worth a new question, if it's not already answered on SO.

Gary Fixler
  • Gary - thank you for your answer, it looks just about perfect, and we'll try it soon. Per your mention about keeping each repo's work in separate folders in the final folder, ie. `projects/projectA` and `projects/projectB`, would it be possible to set up the Git remote to point (as destination) to a specific subfolder? So that the `fetch` doesn't just dump both repositories into the root? – Craig Otis May 05 '13 at 13:40
  • No, that wouldn't work. Git stores 'trees', which are recursive directory listings (1 text-file per directory). There isn't an easy way to point those somewhere else during this procedure. You need to filter-branch each separate repo to create a folder and move everything into it before you can move on. I just tested this locally, and it worked: `git filter-branch --tree-filter 'mkdir -p newfolder; find -mindepth 1 -maxdepth 1 -not -name newfolder -exec mv {} $fname newfolder \;' master` - change the 3 instances of "newfolder" to whatever name you want for that particular repo's subfolder. – Gary Fixler May 06 '13 at 02:02
  • Once you've done that, you can verify by doing `git whatchanged --oneline` - all of the files listed per commit should have the foldername preceding them. *Then* you can get the logs, cat/sort them, and use them to cherry-pick. A note on cherry-picking - it's possible you'll have empty commits somewhere, which will crash the cherry-pick command I originally mentioned. Add `--allow-empty` to get past this after `cherry-pick`. – Gary Fixler May 06 '13 at 02:04
  • Oh, and if you're worried about messing up your new svn->git repos, just go outside them and `git clone reponame testreponame` and use that instead. Git really shines for allowing such rapid cloning and experimentation like this. – Gary Fixler May 06 '13 at 02:05
  • I should add that you can also filter-branch a branch (obviously), so you could do that step twice on the same repo once you've fetched them both in, but you'd have to set up local branches for each, and I think it would be safer to clone each new git repo, hack on that until you prove they're foldered up across time how you want, then drag them into one repo and interleave them as mentioned. The repos will appear in branches labelled "repo1/master" and "repo2/master", so they won't be merged together. The cherry-picks are what will interleave them on your master branch. – Gary Fixler May 06 '13 at 08:47
  • Gary - we ended up going a different route, but I'll be tossing the bounty your way for all your hard work. Thanks for the advice, you led me in the right direction. – Craig Otis May 06 '13 at 21:44
  • Much appreciated, Craig. I'm glad you found a solution. – Gary Fixler May 07 '13 at 08:31