
My situation is that I have two Git repositories that I need to merge into a single repository (there are actually more repos, but I can start with two).

The two repositories are:

  • The main repository, A.
  • The second repository, B.

The code in repository B depends on the code in repository A (but not vice versa), and the two histories roughly track each other chronologically (i.e. a specific commit in repo B will typically require a commit from repo A with a very similar commit time).

There are conflicting branch and tag names in the two repositories (identically named refs are not guaranteed to belong together), but only the refs from A need to be preserved.

The requirements for the new repository, C, are:

  1. All refs (branches and tags) from A need to be preserved.
  2. Only the master branch commits from B need to be preserved (i.e. the commits that are reported by git log --first-parent master).
  3. The files from each source repository should be put into subfolders of the new repository (i.e. the files from A shall go into A/, and the files from B shall go into B/).
  4. When checking out a specific commit (including commits done before the merge) in repository C (e.g. a release tag), compatible files from both source repositories should be found in the directories A/ and B/ (at least within a commit or two).

So far I have tried several approaches, including this and git-stitch-repo, without success (they did not fulfill the above requirements).

At this point, I have managed to:

  • Move all files in each repo to a subdirectory using git filter-branch. E.g. for repo A:
mkdir A
mv * .gitignore A/ 2> /dev/null
git commit -a -m 'DROPME' > /dev/null
git filter-branch --tag-name-filter cat --index-filter 'git ls-files -s | sed "s-\t\"*-&A/-" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" ||:' -- --all
git reset --hard origin/master
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
  • Import repo B into A using git fast-export/fast-import.
  • Devise a method for generating a mapping such that for a given SHA in A, there is a list of zero, one or more SHAs that should be inserted from B.
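
To make the last bullet concrete, here is a minimal sketch of one way such a mapping could be built. It is not the exact method I used; it assumes both inputs are lists of (unix_timestamp, sha) pairs, oldest first, with non-decreasing timestamps, e.g. parsed from `git log --first-parent --reverse --format='%ct %H' master` in each repo. The function name `build_insertion_map` is made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_insertion_map(a_commits, b_commits):
    """Map each A SHA to the B SHAs that should be inserted right after it.

    Assumptions (not from the original post): both arguments are non-empty
    lists of (unix_timestamp, sha) tuples, oldest first, and the timestamps
    in a_commits never decrease.
    """
    a_times = [t for t, _ in a_commits]
    grouped = defaultdict(list)
    for b_time, b_sha in b_commits:
        # Index of the newest A commit that is not newer than this B commit.
        idx = bisect_right(a_times, b_time) - 1
        # B commits older than all of A are attached to A's first commit.
        idx = max(idx, 0)
        grouped[a_commits[idx][1]].append(b_sha)
    # Every A SHA gets an entry, possibly an empty list.
    return {sha: grouped.get(sha, []) for _, sha in a_commits}


if __name__ == "__main__":
    a = [(100, "a1"), (200, "a2"), (300, "a3")]
    b = [(50, "b0"), (150, "b1"), (160, "b2"), (300, "b3")]
    for a_sha, b_shas in build_insertion_map(a, b).items():
        print(a_sha, b_shas)
```

Real histories do not always have monotonic commit times, so the input would need sorting or cleaning first.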

What I would expect now is that some clever usage of git filter-branch should enable me to insert the selected commits from B into the master branch of A. But how?

m-bitsnbites

2 Answers


The solution turned out to be much more involved than I had hoped for. It involves manipulating and combining the output of two (or more) git fast-export streams, and importing them into a new repository using git fast-import.

In short, a new fast-import stream is generated by traversing the two input streams and switching back and forth between them based on a date-sorted log of the main branches.
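
The switching order can be illustrated with a small sketch. This is not the code from the script, only the idea: it assumes each log is a list of (unix_timestamp, sha) pairs, oldest first, and that ties should go to A because B depends on A. The names `interleave_by_date` and `switch_points` are hypothetical.

```python
import heapq


def interleave_by_date(a_log, b_log):
    """Yield (repo, sha) pairs in commit-date order.

    Assumption: a_log and b_log are lists of (unix_timestamp, sha) tuples,
    each already sorted oldest first. On equal timestamps A comes first.
    """
    tagged_a = ((t, 0, sha) for t, sha in a_log)
    tagged_b = ((t, 1, sha) for t, sha in b_log)
    for _, repo_idx, sha in heapq.merge(tagged_a, tagged_b):
        yield ("A", "B")[repo_idx], sha


def switch_points(order):
    """Collapse the interleaved order into runs: [(repo, [sha, ...]), ...]."""
    runs = []
    for repo, sha in order:
        if runs and runs[-1][0] == repo:
            runs[-1][1].append(sha)
        else:
            runs.append((repo, [sha]))
    return runs


if __name__ == "__main__":
    a_log = [(100, "a1"), (200, "a2"), (300, "a3")]
    b_log = [(150, "b1"), (200, "b2"), (250, "b3")]
    for repo, shas in switch_points(interleave_by_date(a_log, b_log)):
        print(repo, shas)
```

Each run says how many commits to copy from one stream before switching to the other. Rewriting marks and parent references in the combined stream is a separate problem that this sketch does not touch.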

I have implemented the solution in a Python script called join-git-repos.py, which I put in a GitHub repository here.

m-bitsnbites

First, move everything in repo A to the subdirectory A/. Nothing fancy, just git mv followed by a commit. Existing commits keep their IDs, so all branches and tags in A are preserved.

Then use git subtree to make the master branch of B a subtree of A in directory B/.

git subtree add -P B/ <remote for B> master

And you're done.


If you want old release tags on A to also reflect what would have been in B at that time... oy. You can do this without messing up your history too badly by merging B into A just before each release tag.

You have this.

          * - * - *           * - * - * branch
    v1   /         \    v2   /
* - * - * - * - * - * - * - * - * - * master
                                   /
  * -- * ---- * - * - * --------- *

The bottom line of commits is B. B's commits are laid out so they line up in time with A's.

And you want something like this.

          * - * - *           * - * - * branch
    v1   /         \    v2   /
* * * - * - * - * - * * * - * - * - * master
  |                   |            /
  * -- * ---- * - * - * --------- *

This has merged B into A just before each release tag. This avoids making up an artificial history of A and B being developed together.
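
Picking which B commit to merge before each tag is the easy part. Here is a rough sketch, assuming you have the commit time of each release tag in A and a list of (unix_timestamp, sha) pairs for B's master, oldest first. The function name `b_commit_for_tags` and the input shapes are made up for illustration.

```python
from bisect import bisect_right


def b_commit_for_tags(tag_times, b_commits):
    """For each release tag, pick the newest B commit not newer than the tag.

    Assumptions: tag_times maps tag name -> unix timestamp of the tagged
    A commit; b_commits is a list of (unix_timestamp, sha), oldest first.
    Tags older than all of B map to None (nothing to merge yet).
    """
    b_times = [t for t, _ in b_commits]
    chosen = {}
    for name, tag_time in tag_times.items():
        idx = bisect_right(b_times, tag_time) - 1
        chosen[name] = b_commits[idx][1] if idx >= 0 else None
    return chosen


if __name__ == "__main__":
    tags = {"v1": 120, "v2": 260}
    b = [(50, "b0"), (150, "b1"), (250, "b2")]
    print(b_commit_for_tags(tags, b))
```

This only selects the merge points by date; it does not verify that the chosen B commit actually builds against the tagged A commit.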

I don't know how to do that in an automated fashion. The problem is that rebase can preserve merges but not tags. Inserting the merge commit before v1 rewrites every later commit, so the v2 tag is left pointing at the old history, and I'm not sure how to identify the original commit of a rebased commit.

Good luck.

Schwern
  • I will try it, but I don't see how this solution could satisfy requirement 4 in the question. Or am I missing something? – m-bitsnbites Dec 06 '16 at 07:31
  • @m-bitsnbites You want old release tags from repo A to take a guess at what was in repo B at the time? That's going to mess up your history real bad. I don't know if it's even logically possible while also preserving the branch history of A. – Schwern Dec 06 '16 at 19:44
  • @m-bitsnbites I put together an illustration of the resulting repository structure that's IMO the best compromise, but I don't know how to do the transform in an automated fashion. – Schwern Dec 06 '16 at 19:57
  • Does this chronologically interleave histories? When I tried using `subtree` and checked the history with `git log`, the commits were out of chronological order and B's commits sat on top of later commits in A. – Alexander Terp Feb 24 '23 at 19:49