
I had the following git repositories

  • repoA
  • repoB
  • repoC

which I combined into

  • repoAll where each repo was moved into a subdir

so this looks like

  • repoAll
    • dirA
    • dirB
    • dirC

I followed the instructions at http://jasonkarns.com/blog/merge-two-git-repositories-into-one/ to make this happen, which essentially means running the following for each repo:

git remote add -f repoA /path/to/repoA
git merge -s ours --no-commit repoA/master
git read-tree --prefix=dirA/ -u repoA/master
git commit -m "merging repoA into dirA"
...

However, the history of the files is now disconnected, since

git log --follow dirA/pom.xml

shows nothing.

However,

git log --follow pom.xml

does show the correct (old) history for that file. This is not really good enough, since no tool such as Eclipse or other git clients will be able to show the full history.
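The situation can be reproduced in a throwaway sandbox. This is only a sketch: the file contents, the demo identity and the README in repoAll are made up, and --allow-unrelated-histories is added because git 2.9 and later refuse the merge without it. What the --follow command prints may depend on the git version; the point is that the old commits are still reachable, but only under the old path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and directory so nothing real is touched (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for repoA with two commits on pom.xml.
git init -q repoA
cd repoA
git symbolic-ref HEAD refs/heads/master
echo "<project>1</project>" > pom.xml
git add pom.xml
git commit -q -m "repoA: add pom.xml"
echo "<project>2</project>" > pom.xml
git commit -q -am "repoA: change pom.xml"
cd ..

# Stand-in for repoAll; it needs one commit before anything can be merged.
git init -q repoAll
cd repoAll
git symbolic-ref HEAD refs/heads/master
echo "combined" > README
git add README
git commit -q -m "repoAll: initial commit"

# The read-tree recipe from the question.
git remote add -f repoA ../repoA
git merge -q -s ours --no-commit --allow-unrelated-histories repoA/master
git read-tree --prefix=dirA/ -u repoA/master
git commit -q -m "merging repoA into dirA"

echo "--- log --follow for the new path:"
git log --follow --oneline -- dirA/pom.xml
echo "--- log of the merged branch for the old path:"
git log --oneline repoA/master -- pom.xml
old_path_count=$(git log --oneline repoA/master -- pom.xml | wc -l | tr -d ' ')
```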

To make matters worse, there have been already new commits on the combined repo so doing the merge again is not really an option (I now know that I should have moved repoA/* into repoA/dirA before doing the merge).

I have thought about inserting a commit that does the move before the initial merge on repoAll, but that would require me to rebase all later commits (there are now 100+) and resolve the resulting conflicts.

The questions/solutions "Git log shows very little after doing a read-tree merge" and "How can I rewrite history so that all files, except the ones I already moved, are in a subdirectory?" seem to only work for the whole repository, not for a specific subdir (or at least not if you already have new commits on repoAll).

I think that there should be some way to rewrite the history of a specific subdir (such as dirA) but I cannot seem to figure out how.

RaB

2 Answers


I ended up fixing the problem with a slightly more laborious solution, although in the end it may be the simpler one:

  1. I recorded the SHA1 of the first commit on repoAll that was made by the developers (so the first real commit after the joining of the repositories). Ideally you create a branch to be able to find it again (git branch changes_start_here <SHA1>)
  2. I started out again with an empty repository and cloned the individual repositories (repoA, ...) anew
  3. I went to repoA and added a commit where I moved all contents of repoA into dirA (still on repoA)

    cd repoA
    mkdir dirA
    # move all contents except dirA itself into dirA
    git mv src pom.xml other* dirA
    git commit -m "moved repoA to dirA"
    

    repeat that for each repo

  4. On the new, empty combined repository (repoAllNew) I then added all the local repository copies as remotes

    cd repoAllNew
    git remote add -f origin-repoA ../repoA
    git pull origin-repoA master
    

    repeat for each repo

  5. make sure that the history is ok by doing something like

    git blame dirA/src/main/java/HelloWorld.java
    

    (obviously this has to be an existing file which has some longer history). Check that the blame contains meaningful messages for each source line.

  6. Re-import all changes that were made by developers after the repos were merged. This can be done by adding the old repoAll as a remote:

    git remote add -f origin-repoAllOld ../repoAll
    

    Now we need to merge all new changes that were made after the joining of the repos into the cleaned up repository.

    git branch start <SHA1 of origin-repoAllOld/changes_start_here>
    git branch end <SHA1 of origin-repoAllOld/master>
    # start~1..end is replayed; plain "start" would leave out the first developer commit
    git rebase --onto master start~1 end
    
  7. Now you should have the same state as you had on repoAll, but with correct history.
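Steps 3 to 5 can be tried out in a throwaway sandbox before touching the real repositories. The sketch below makes a few assumptions: repoA is replaced by a tiny made-up repository containing only pom.xml, a demo identity is set through environment variables, and the history check uses git log --follow instead of blame.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and directory so nothing real is touched (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for repoA with two commits touching pom.xml.
git init -q repoA
cd repoA
git symbolic-ref HEAD refs/heads/master
printf '<project>\n  <version>1</version>\n</project>\n' > pom.xml
git add pom.xml
git commit -q -m "repoA: add pom.xml"
printf '<project>\n  <version>2</version>\n</project>\n' > pom.xml
git commit -q -am "repoA: bump version"

# Step 3: move everything into dirA inside repoA itself.
mkdir dirA
git mv pom.xml dirA/
git commit -q -m "moved repoA to dirA"
cd ..

# Step 4: pull the prepared repository into a fresh combined repository.
git init -q repoAllNew
cd repoAllNew
git symbolic-ref HEAD refs/heads/master
git remote add -f origin-repoA ../repoA
git pull -q origin-repoA master

# Step 5: count the commits that --follow finds for the moved file.
follow_count=$(git log --follow --oneline -- dirA/pom.xml | wc -l | tr -d ' ')
echo "commits found by --follow: $follow_count"
```

When a second repository is pulled into the non-empty repoAllNew, recent git versions will most likely also need --allow-unrelated-histories, and possibly --no-rebase, on the pull.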

Management summary

We had to insert a commit that moved the contents of each repository into the corresponding subdirectory before starting the repo migration. That way the history stays correct and things like blame work just fine. IMHO git read-tree --prefix ... should be avoided unless you want to start messing with git filter-branch (which in 99% of all cases you don't want to).
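The rebase in step 6 is the part most worth rehearsing. A minimal sketch, assuming a single sandbox repository in which the branch old plays the role of origin-repoAllOld/master and master plays the role of the cleaned-up repository (all file names are made up). Since git rebase --onto master A B replays the commits in A..B, which excludes A itself, the parent of the first developer commit is used as the lower bound here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and directory so nothing real is touched (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q sandbox
cd sandbox

# "old": one joined commit followed by two developer commits.
git symbolic-ref HEAD refs/heads/old
mkdir dirA
echo "v1" > dirA/pom.xml
git add dirA
git commit -q -m "joined repositories (old, disconnected history)"
echo "a" > dirA/a.txt
git add dirA
git commit -q -m "developer change 1"
git branch start    # first real developer commit (changes_start_here)
echo "b" > dirA/b.txt
git add dirA
git commit -q -m "developer change 2"
git branch end      # tip of the old combined repository

# "master": a separate root commit with the same files as the joined commit.
git checkout -q --orphan master old~2
git commit -q -m "clean history root"

# Replay start~1..end, i.e. both developer commits, on top of master.
git rebase -q --onto master start~1 end

replayed=$(git rev-list --count master..end)
echo "developer commits replayed onto master: $replayed"
```

After the rebase, end points at the rewritten developer commits; master itself is unchanged and can be fast-forwarded to end once the result looks right.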

RaB

Based on r3m0t's idea of rewriting history, the following lines did the whole trick for me. They merge the other git repository as a new branch into my existing one, inside a subdirectory:

(I was working in git-sh, where the leading 'git' can be omitted and co/ci are aliases for checkout/commit; the commands are spelled out in full here.)

git checkout -b my-new-branch
git remote add -f origin-my-old-standalone-project ../my-old-standalone-project/
git pull origin-my-old-standalone-project master
mkdir my-new-subdir
git commit -am "merge 'old' standalone project as new branch 'my-new-branch'"
git filter-branch --index-filter \
        'git ls-files -s | sed "s%\t\"*%&my-new-subdir/%" |
                GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                        git update-index --index-info &&
         mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD

After that I had both: a history for the individual files in the new subdirectory, as if they had been there all the time, and the normal history in the main directory, as if the new files in the subdirectory had always been there. (As you can see, no read-tree or other rarely used commands are necessary; filter-branch does the whole trick.) IDEs are able to work normally with the result, or at least should be; I tested it successfully with PyCharm.

After that you should be able to merge your branches as usual, getting all projects into one.
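The filter-branch step can be tried in isolation first. The sketch below is a variation under stated assumptions: it rewrites a made-up standalone project on its own, before any pull, and it passes a literal tab to sed through an exported variable (TAB is a name chosen here) so that the filter does not rely on sed understanding "\t". FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer git versions print.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity and directory so nothing real is touched (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
# Literal tab, exported so the filter can use it inside the sed expression.
TAB=$(printf '\t')
export TAB

work=$(mktemp -d)
cd "$work"

# Stand-in for the old standalone project with two commits.
git init -q my-old-standalone-project
cd my-old-standalone-project
git symbolic-ref HEAD refs/heads/master
echo "first" > notes.txt
git add notes.txt
git commit -q -m "add notes"
echo "second" >> notes.txt
git commit -q -am "extend notes"

# Rewrite every commit so that all paths live under my-new-subdir/.
git filter-branch --index-filter \
        'git ls-files -s | sed "s%${TAB}\"*%&my-new-subdir/%" |
                GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                        git update-index --index-info &&
         mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD >/dev/null

rewritten_paths=$(git ls-files)
history_count=$(git log --oneline -- my-new-subdir/notes.txt | wc -l | tr -d ' ')
echo "tracked: $rewritten_paths, commits touching it: $history_count"
```

Because every commit is rewritten, the file appears to have lived under my-new-subdir/ from its first commit, so plain git log on the new path already shows the full history.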

tl;dr: after running the 6 commands above to merge the old git project into a new branch and subdirectory of another git project, --follow works as expected, and so does the normal history.

Jaleks
  • Thanks for sharing your solution, it helped me a lot ! :) – Bruno Lavit Mar 08 '16 at 10:50
  • A couple of things which bit me: 1. the "sed" command assumes GNU sed syntax (I'm on OS X so had to replace it with "gsed"). 2. I do this process inside a "virgin" repository first ("mkdir dir; cd dir; git init"), then I can pull the files into the real destination without conflicts. – Capt. Crunch Nov 17 '16 at 03:10
  • @AmosShapira Thanks for the tip! Was wondering why it wasn't working for me. – Bri Bri May 18 '18 at 21:08
  • git pull --allow-unrelated-histories if there is an error of "refusing to merge unrelated histories". – Kevin Yang Jul 01 '19 at 13:00