
I have two git repositories R1 and R2, which contain commits from two periods of a product's development: 1995-1997 and 1999-2013. (I created them by converting existing RCS and CVS repositories into Git.)

R1:
A---B---C---D

R2:
K---L---M---N

How can I combine the two repositories into a single one that contains an accurate view of the project's linear history?

A---B---C---D---K---L---M---N

Note that between R1 and R2 files have been added, deleted, and renamed.

I tried creating an empty repository and then merging their contents onto it.

git remote add R1 /vol/R1.git
git fetch R1

git remote add R2 /vol/R2.git
git fetch R2

git merge --strategy=recursive --strategy-option=theirs R1
git merge --strategy=recursive --strategy-option=theirs R2

However, the result still contains files that were in revision D but not in revision K. I could craft a synthetic commit to remove the extra files between the merges, but this seems inelegant to me. Furthermore, with this approach the end result contains merges that didn't actually occur.

Diomidis Spinellis
  • This sounds like a one-time problem, doesn't it? If so, I think you could just go with synthesizing the commits and forget about how inelegant that feels. (In a sense, all the imported commits are already synthetic, so I don't think it's that bad.) – Tamás Szelei Apr 03 '13 at 09:12
  • There are actually more than two repos in the case I described, and I can see this problem occurring again in the future (I'm trying to re-create the history of diverse projects as a git repo). – Diomidis Spinellis Apr 03 '13 at 09:51
  • Here is the generated repository: https://github.com/dspinellis/unix-history-repo – Diomidis Spinellis Mar 15 '16 at 19:26

4 Answers


Using git filter-branch

Using the trick straight from the git-filter-branch man page:

First, create a new repository with the two original ones as remotes, just as you did before. I am assuming that both use the branch name "master".

git init repo
cd repo
git remote add R1 /vol/R1.git
git fetch R1
git remote add R2 /vol/R2.git
git fetch R2

Next, point "master" (the current branch) to the tip of R2's "master".

git reset --hard R2/master

Now we can graft the history of R1's "master" to the beginning.

git filter-branch --parent-filter 'sed "s_^\$_-p R1/master_"' HEAD

In other words, we are giving K a fake parent, D, so the new history looks like:

A---B---C---D---K---L---M---N

The only change to K through N is that K's parent pointer changes, and thus all of the SHA-1 identifiers change. The commit message, author, timestamp, etc., stay the same.
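
To see the effect without touching real data, here is a minimal sketch that builds two throwaway repositories and grafts them with the same --parent-filter command. The temporary directory, file names, and one-letter commit messages are all made up for the demo; FILTER_BRANCH_SQUELCH_WARNING only suppresses the pause that newer Git versions add before filter-branch starts.

```shell
#!/bin/sh
# Throwaway demo; repository layout, file names, and commit messages are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer Git pauses with a warning otherwise
WORK=$(mktemp -d)
cd "$WORK"

# R1 (old history): A adds old.txt, D adds shared.txt.
git init -q R1
(
  cd R1
  git symbolic-ref HEAD refs/heads/master
  echo old > old.txt;       git add -A; git commit -q -m A
  echo shared > shared.txt; git add -A; git commit -q -m D
)

# R2 (new history): K has shared.txt and new.txt but no old.txt; N edits new.txt.
git init -q R2
(
  cd R2
  git symbolic-ref HEAD refs/heads/master
  echo shared > shared.txt; echo new > new.txt; git add -A; git commit -q -m K
  echo newer > new.txt;     git add -A; git commit -q -m N
)

# Combined repository, set up as in the answer.
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git remote add R1 "$WORK/R1"; git fetch -q R1
git remote add R2 "$WORK/R2"; git fetch -q R2
git reset -q --hard R2/master

# Give the parentless root commit (K) the tip of R1 as its parent.
git filter-branch --parent-filter 'sed "s_^\$_-p R1/master_"' HEAD >/dev/null 2>&1

HISTORY=$(git log --reverse --format=%s | xargs)
FILES=$(git ls-tree -r --name-only HEAD | xargs)
TREE_BEFORE=$(git rev-parse 'R2/master^{tree}')
TREE_AFTER=$(git rev-parse 'HEAD^{tree}')
echo "history: $HISTORY"
echo "files at HEAD: $FILES"
[ "$TREE_BEFORE" = "$TREE_AFTER" ] && echo "tip tree unchanged by the graft"
```

On this toy data the log should list the four commits in one line of history, and old.txt should not reappear at HEAD, because only parent pointers are rewritten and every tree is left alone.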

Merging more than two repositories together with filter-branch

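If you have more than two repositories to do, say R1 (oldest) through R5 (newest), just repeat the git reset and git filter-branch commands in chronological order. Graft each repository onto the rewritten tip from the previous step (not onto the previous remote's original master, which still lacks the older history), and pass -f so that filter-branch overwrites the backup refs left by the previous run.

PARENT=$(git rev-parse R1/master)
for CHILD_REPO in R2 R3 R4 R5; do
    git reset --hard $CHILD_REPO/master
    # Give the root commit of this repository the combined history so far as its parent.
    git filter-branch -f --parent-filter 'sed "s_^\$_-p '$PARENT'_"' HEAD
    PARENT=$(git rev-parse HEAD)
done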
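
One point worth checking when chaining several repositories is that each graft targets the already-combined history, so that the final branch has exactly one root. The following is a minimal sketch on three throwaway repositories; the make_repo helper, the PARENT variable, and the file names are hypothetical and exist only for this demo.

```shell
#!/bin/sh
# Throwaway demo; make_repo, PARENT, and all names below are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer Git pauses with a warning otherwise
WORK=$(mktemp -d)
cd "$WORK"

# make_repo NAME MSG...: create repository NAME with one commit per MSG.
make_repo() {
  name=$1; shift
  git init -q "$name"
  (
    cd "$name"
    git symbolic-ref HEAD refs/heads/master
    for msg in "$@"; do
      echo "$msg" > "$name.txt"
      git add -A
      git commit -q -m "$msg"
    done
  )
}
make_repo R1 A B
make_repo R2 K L
make_repo R3 X Y

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
for r in R1 R2 R3; do
  git remote add "$r" "$WORK/$r"
  git fetch -q "$r"
done

# Graft each repository onto the combined history built so far.
PARENT=$(git rev-parse R1/master)
for CHILD_REPO in R2 R3; do
  git reset -q --hard "$CHILD_REPO/master"
  # -f overwrites the refs/original backup left by the previous pass.
  git filter-branch -f --parent-filter "sed 's/^\$/-p $PARENT/'" HEAD >/dev/null 2>&1
  PARENT=$(git rev-parse HEAD)   # rewritten tip, including all older history
done

HISTORY=$(git log --reverse --format=%s | xargs)
ROOTS=$(git rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')
echo "history: $HISTORY"
echo "root commits: $ROOTS"
```

A root count other than 1 would indicate that some older history was left behind by one of the grafts.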

Using grafts

As an alternative to using the --parent-filter option to filter-branch, you may instead use the grafts mechanism.

Consider the original situation of appending R2/master as a child of (that is, newer than) R1/master. As before, start by pointing the current branch (master) to the tip of R2/master.

git reset --hard R2/master

Now, instead of running the filter-branch command, create a "graft" (fake parent) in .git/info/grafts that links the "root" (oldest) commit of R2/master (K) to the tip (newest) commit in R1/master (D). (If there are multiple roots of R2/master, the following will only link one of them.)

ROOT_OF_R2=$(git rev-list R2/master | tail -n 1)
TIP_OF_R1=$(git rev-parse R1/master)
echo $ROOT_OF_R2 $TIP_OF_R1 >> .git/info/grafts

At this point, you can look at your history (say, through gitk) to see if it looks right. If so, you can make the changes permanent via:

git filter-branch

Finally, you can clean everything up by removing the graft file.

rm .git/info/grafts

Using grafts is likely more work than using --parent-filter, but it does have the advantage of being able to graft together more than two histories with a single filter-branch. (You could do the same with --parent-filter, but the script would become very ugly very fast.) It also has the advantage of allowing you to see your changes before they become permanent; if it looks bad, just delete the graft file to abort.
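
Newer Git versions (2.18 and later, as far as I know) deprecate the .git/info/grafts file in favor of git replace --graft, which offers the same preview-then-commit workflow. The following is a minimal sketch of that variant on throwaway repositories; the paths, file names, and commit messages are made up, and it assumes (per the filter-branch man page) that filter-branch makes replacement refs permanent.

```shell
#!/bin/sh
# Throwaway demo; repository layout, file names, and commit messages are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer Git pauses with a warning otherwise
WORK=$(mktemp -d)
cd "$WORK"

git init -q R1
(
  cd R1
  git symbolic-ref HEAD refs/heads/master
  echo old > old.txt; git add -A; git commit -q -m A
  echo old2 > old.txt; git add -A; git commit -q -m D
)
git init -q R2
(
  cd R2
  git symbolic-ref HEAD refs/heads/master
  echo new > new.txt; git add -A; git commit -q -m K
  echo new2 > new.txt; git add -A; git commit -q -m N
)

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git remote add R1 "$WORK/R1"; git fetch -q R1
git remote add R2 "$WORK/R2"; git fetch -q R2
git reset -q --hard R2/master

# Record a replacement for the root of R2 whose parent is the tip of R1.
ROOT_OF_R2=$(git rev-list --max-parents=0 HEAD)
git replace --graft "$ROOT_OF_R2" R1/master

# Preview: the log already follows the replacement, nothing is rewritten yet.
PREVIEW=$(git log --reverse --format=%s | xargs)
echo "preview: $PREVIEW"

# Make the graft permanent, then drop the now-redundant replacement ref.
git filter-branch HEAD >/dev/null 2>&1
git replace -d "$ROOT_OF_R2" >/dev/null
FINAL=$(git log --reverse --format=%s | xargs)
echo "final:   $FINAL"
```

To abort instead of committing to the graft, deleting the replacement with git replace -d before running filter-branch plays the same role as deleting the graft file.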

Merging more than two repositories together with grafts

To use the graft method with R1 (oldest) through R5 (newest), just add multiple lines to the graft file. (The order in which you run the echo commands does not matter.)

git reset --hard R5/master

PARENT_REPO=R1
for CHILD_REPO in R2 R3 R4 R5; do
    ROOT_OF_CHILD=$(git rev-list $CHILD_REPO/master | tail -n 1)
    TIP_OF_PARENT=$(git rev-parse $PARENT_REPO/master)
    echo "$ROOT_OF_CHILD" "$TIP_OF_PARENT" >> .git/info/grafts
    PARENT_REPO=$CHILD_REPO
done

What about git rebase?

Several others have suggested using git rebase R1/master instead of the git filter-branch command above. This will take the diff between the empty commit and K and then try to apply it to D, resulting in:

A---B---C---D---K'---L'---M'---N'

This will most likely cause a merge conflict, and may even result in spurious files being created in K' if a file was deleted between D and K. The only case in which this will work is if the trees of D and K are identical.

(Another slight difference is that git rebase alters the committer information for K' through N', whereas git filter-branch does not.)
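
The spurious-file effect can be reproduced on a toy case. In this minimal sketch (paths, file names, and commit messages are made up), the old repository contains only old.txt and the new one contains only new.txt, so the rebase applies without conflicts but carries old.txt forward into every rebased commit.

```shell
#!/bin/sh
# Throwaway demo; repository layout, file names, and commit messages are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
WORK=$(mktemp -d)
cd "$WORK"

# R1 ends with a tree that contains only old.txt.
git init -q R1
(
  cd R1
  git symbolic-ref HEAD refs/heads/master
  echo old > old.txt; git add -A; git commit -q -m A
  echo old2 > old.txt; git add -A; git commit -q -m D
)
# R2 starts from a tree where old.txt is gone and new.txt exists.
git init -q R2
(
  cd R2
  git symbolic-ref HEAD refs/heads/master
  echo new > new.txt; git add -A; git commit -q -m K
  echo new2 > new.txt; git add -A; git commit -q -m N
)

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git remote add R1 "$WORK/R1"; git fetch -q R1
git remote add R2 "$WORK/R2"; git fetch -q R2
git reset -q --hard R2/master

# Replay K and N as patches on top of D.
git rebase -q R1/master

FILES_R2=$(git ls-tree -r --name-only R2/master | xargs)
FILES_REBASED=$(git ls-tree -r --name-only HEAD | xargs)
echo "files at original N: $FILES_R2"
echo "files at rebased N': $FILES_REBASED"
```

If the two listings differ, the rebased tip no longer reproduces the tree that was actually recorded in R2, which is the difference from the filter-branch approach.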

Mark Lodato
  • The last step could just be `git rebase R1/master`. – vonbrand Apr 04 '13 at 17:18
  • @vonbrand, I updated my answer to explain why that will not work. – Mark Lodato Apr 05 '13 at 02:36
  • It worked fine, thank you! I had to integrate multiple repos (14000 commits), so I went with the grafts option you mentioned. I am editing your entry, because the proposal assumed I was merging branches rather than repos. – Diomidis Spinellis Apr 05 '13 at 07:30
  • Great - glad it worked! I reworded the answer to phrase it as "repos" and not "branches," moved the "multiple repos with grafts" to its own section, and clarified the "two repos with grafts" section a bit. (Before, my commands made a graft using all root commits; the new commands only graft one of the root commits.) – Mark Lodato Apr 08 '13 at 02:09
  • I'm skeptical of your claim about the rebase solution that "*This will most likely cause a merge conflict, and may even result in spurious files being created in `K`' if a file was deleted between `D` and `K`. The only case in which this will work is if the trees of `D` and `K` are identical.*" There must be a way to prefer or completely replace the working directory tree of `D` with `K` so that `K'` matches `K`...I'll need to go look it up later. If it's not an automatic merge strategy, then low-level plumbing for replacing trees might work. –  Jul 22 '14 at 00:21
  • Okay, I just tested this out, if development really did stop on the older repo, then even if the newer repo has deleted files, the default 3-way recursive merge algorithm that Git uses should be able to cleanly apply patches during the rebase, without having to resort to manual conflict resolution. **There will be no "spurious" files**. So I don't see the rebase solution as a problem. If, for some reason, conflicts do occur, you could even just replace the working directory tree of the older history with a `git rm .` followed by checking out the directory of the first commit of the new repo. –  Jul 22 '14 at 00:47
  • @DiomidisSpinellis Question about your 14000-commit merge: if repo1 has file1 and repo2 has file2, did the final merged repository have both file1 and file2 at the same time? I am asking because filter-branch doesn't rewrite history like rebase does, so would you have a real combined history? – Flavius Feb 12 '18 at 03:30
  • In the end, I created by hand a git-fast-import stream that had files co-existing in a hidden directory while a next version was being added. See https://www2.dmst.aueb.gr/dds/pubs/jrnl/2016-EMPSE-unix-history/html/unix-history.html – Diomidis Spinellis Feb 13 '18 at 07:42

The original poster states:

R1:
A---B---C---D

R2:
K---L---M---N

How can I combine the two repositories into a single one that contains an accurate view of the project's linear history?

A---B---C---D---K---L---M---N

Note that between R1 and R2 files have been added, deleted, and renamed.

If the first commit of the newer repo, K, is identical to or only slightly modified from the last commit of the older repo, D, then you can simply fetch R1's history into R2 and rebase the commit graph of R2 onto the graph from R1:

# From inside R2, with R1 added as a remote
git remote add R1 /vol/R1.git
git fetch R1
git checkout master
git rebase --onto R1/master --root

Non-linear histories (when you have merge commits)

That's assuming that R2's graph is linear. If it has merge commits, you could attempt the same thing while asking rebase to preserve them (newer Git versions deprecate --preserve-merges in favor of --rebase-merges):

git rebase --preserve-merges --onto R1/master --root

However, if you ever had to resolve conflicts in any of the merges that you're rebasing, you'll probably need to resolve them again, which is likely to be a hassle.

Combining two radically different histories?

The original poster said:

Note that between R1 and R2 files have been added, deleted, and renamed.

As I pointed out above, a simple rebase should work if the first commit of the newer repo, K, is the same or only slightly different from the last commit of the older repo, D. I'm not sure if the same rebase will work cleanly if K is in fact significantly different from D. I suppose that in the worst case, you might have to resolve a lot of conflicts during the very first application of K during the rebase.

  • Note to self, add how to prefer conflict resolutions from the newer repo in the case of painful conflicts in the first patch of the rebase. –  Jul 22 '14 at 00:17

This is what I did that worked:

git init
git remote add R1 /vol/R1.git
git fetch R1
git remote add R2 /vol/R2.git
git fetch R2
git checkout -B master R2/master
git rebase R1/master
git push -f
cforbish

All you should need is git rebase, followed by the branch you are rebasing onto.

In a nutshell, rebase rewinds all of the commits of your branch and replays them on top of the branch you are rebasing onto.

Depending on how much the two branches differ, you may run into conflicts. But you would not avoid those conflicts by using any other method.

Good luck!

jakenberg