Edit: for a date-based approach that makes this pretty easy but assumes one of the two repositories is going to be "in control" of which commits come from the other repository, see jthill's answer. You end up with a commit history that exactly matches the "project" history, possibly squashing some of the "tests" history. The answer below is more appropriate if you need to add a prefix to both sets of histories, or want to interleave them (e.g., need two different "tests" updates for the same "project" commit).
phd's answer is fine, but if I were doing this myself and wanted to make it really neat and clean, I would use a different approach.
If the trees for the two repositories don't overlap, it's certainly possible to do this, and by bypassing the usual Git mechanisms and going straight to the underlying git read-tree commands, you can automate it. (This is where VonC's recent comment rejecting my claim that Git and Mercurial are very much alike is right: if you bypass the top-level Git commands, you get something you cannot get nearly as easily in Mercurial.)
Just as in phd's answer, you would start this process by combining the two repository commit databases via git fetch. (You can do this in a third repo, which I'd recommend since it makes it easier to restart the process from scratch if you decide you want to tweak some parameters, or by adding either repo A to repo B, or repo B to repo A.) But after that, everything diverges.
You now have two disjoint commit DAGs:
         D--...--K
        /         \
A--B--C            M--N   <-- repoA/master
        \         /
         E--...--L

O--P--Q--...--Z   <-- repoB/master
(If repoA and repoB both have more than one branch tip, draw whatever simplified diagram of their commits is more appropriate.)
Your next step is to enumerate all the commits in each of the two disjoint DAGs, using git rev-list --topo-order --reverse and whatever other sorting options you like. When and whether --topo-order is required depends on the topology and other sorting information, but in general you will want a parent commit listed before any of its children.
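As a minimal sketch of that enumeration step: the helper name list_commits and the output file names are hypothetical, and the remote names A and B are assumed to be whatever you chose when fetching.

```shell
# list_commits REF: print every commit reachable from REF, one hash ID per
# line, with each parent listed before any of its children.
list_commits() {
    git rev-list --topo-order --reverse "$1"
}

# Hypothetical usage, assuming the two repositories were fetched as remotes
# named A and B:
#   list_commits A/master > commits-A.txt
#   list_commits B/master > commits-B.txt
```

The two resulting lists are the raw material for deciding which commit from one side gets paired with which commit from the other.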
Given these two linearized lists of commit hash IDs, you now have the hard part: constructing the graph of new, combined trees you wish to commit. Every new commit will be made by combining one commit from each of the two old graphs. If one of the graphs is complex (as for repoA above) with branches and merges, and one isn't (as for repoB above), this can be particularly tricky.
I've made my own setup for this, where I have a very simple graph:
A--B <-- A/master
O--P <-- B/master
In my simplified setup, I'd like to make my first commit on my new master be commit C that combines the trees of A and O:
C <-- master
Then I'd like to make, as my second commit on master, the combination of A and P (not A and O, and not B and O either), and as my last commit, the combination of B and P, so that I end up with:
C--D--E <-- master
with:
C = A+O
D = A+P
E = B+P
So, here we are in a new empty repository, except that we've read in projects A and B:
$ git log --all --graph --decorate --format='%h%d %s' --name-status | sed '/^[| ] $/d'
* 7b9921a (B/master) commit-P
| A B/another
* 51955b1 commit O
A B/start
* 69597d3 (A/master) commit-B
| A A/new
* ff40069 commit-A
A A/file
(I accidentally didn't hyphenate commit O, but did hyphenate all the others. The sed is there to remove some blank lines that don't really help readability in this case.)
$ git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Now we build the new commits, one at a time, using git read-tree to populate the index to make the commits. We start with an empty index (which we have right now):
$ git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
We want our first commit to combine A and O, so let's read those two commits into the index now. If we had to add a prefix to the tree in A, we could do that here:
$ git read-tree --prefix= ff40069
$ git ls-files --stage
100644 7a1c6130c652b6ea92f4d19183693727e32c9ac4 0 A/file
$ git read-tree --prefix= 51955b1
$ git ls-files --stage
100644 7a1c6130c652b6ea92f4d19183693727e32c9ac4 0 A/file
100644 f6284744575ecfc520293b33122d4a99548045e4 0 B/start
We can make the commit we need now:
$ git commit -m combine-A-and-O
[master (root-commit) 7c629d8] combine-A-and-O
2 files changed, 2 insertions(+)
create mode 100644 A/file
create mode 100644 B/start
Now we need to make the next commit, which means we need to build up the correct tree in the index. To do that we first have to clean it out; otherwise the next git read-tree --prefix will fail with a complaint about overlapping files and Cannot bind.
So now we empty the index, then read commits A and P:
$ git read-tree --empty
$ git read-tree --prefix= ff40069
$ git read-tree --prefix= 7b9921a
If you like, you can examine the result using git ls-files --stage again:
$ git ls-files --stage
100644 7a1c6130c652b6ea92f4d19183693727e32c9ac4 0 A/file
100644 d7941926464291df213061d48784da98f8602d6c 0 B/another
100644 f6284744575ecfc520293b33122d4a99548045e4 0 B/start
In any case they can now be committed as the new commit:
$ git commit -m 'combine A and P'
[master eb8fa3c] combine A and P
1 file changed, 1 insertion(+)
create mode 100644 B/another
(You can see now how I ended up with inconsistent hyphenation :-) ). Last, we repeat the process by emptying the index, reading in the two desired commits (B+P), and committing the result:
$ git read-tree --empty
$ git read-tree --prefix= A/master
$ git read-tree --prefix= B/master
$ git ls-files --stage
100644 7a1c6130c652b6ea92f4d19183693727e32c9ac4 0 A/file
100644 8e0c97794a6e80c2d371f9bd37174b836351f6b4 0 A/new
100644 d7941926464291df213061d48784da98f8602d6c 0 B/another
100644 f6284744575ecfc520293b33122d4a99548045e4 0 B/start
$ git commit -m 'combine B and P'
[master fad84f8] combine B and P
1 file changed, 1 insertion(+)
create mode 100644 A/new
(I used symbolic names here to get the last two commits, but hash IDs from git rev-list would of course work just as well.) We can now see the three commits, all on master:
$ git log --decorate --oneline --graph
* fad84f8 (HEAD -> master) combine B and P
* eb8fa3c combine A and P
* 7c629d8 combine-A-and-O
and it's now safe to delete the A/master and B/master references (and the two remotes). There's one peculiarity: since we did all the work directly in the index, without bothering with a work-tree, the work-tree is still completely empty:
$ ls
$ git status -s
D A/file
D A/new
D B/another
D B/start
To fix that at the end, we should just run git checkout HEAD -- . :
$ git checkout HEAD -- .
$ git status -s
$ git status
On branch master
nothing to commit, working tree clean
How to write your own automation script
In practice, you will probably want to use git write-tree and git commit-tree, rather than git commit, to make the new commits. You would write a little script (in whatever language you like) to run git rev-list to collect the hash IDs of the commits to combine. The script must inspect those commits (e.g., by looking at authorship and dates, or file contents, or whatever) to decide how to interweave them. Then, having made the decisions about interweaving and what branch-and-merge structures to provide, the script can begin the process of repeatedly doing these steps:
- Empty the index.
- Yank in a tree from a commit in the sub-graph from repo-A, with whatever --prefix option is appropriate (in your case this is --prefix=, i.e., the empty string, but in other cases it would be a directory name with a trailing slash).
- Yank in a tree from a commit in the sub-graph from repo-B, with another appropriate --prefix, so that there are no collisions between entries from A and B.
- Use git write-tree to write the tree. Its output is the tree hash ID for the next step.
- Use git commit-tree with appropriate -p argument(s) to set the parent(s) of the new commit. Feed it the appropriate (combined or whatever) commit message text. Use the environment variables GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_AUTHOR_DATE, GIT_COMMITTER_NAME, GIT_COMMITTER_EMAIL, and GIT_COMMITTER_DATE to control the author and committer names and dates. The output from git commit-tree is the new commit's hash ID, which you then use as a parent of some subsequent commit.
When the whole thing finishes, the last commits made for any particular branch or set of branches are the hash IDs that go into those branches, so you can now run:
git branch <name> <hash>
for each such hash ID.
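Here is a rough sketch of such a script as a shell function. The function name combine_history and its input format (one "A-hash B-hash" pair per line on standard input, oldest first) are hypothetical. It also assumes the simplest possible plan: a single linear chain of new commits, with no prefix needed on either side, where each new commit copies its author and committer information from the A-side commit. A real script would have to make its own decisions about interleaving, merges, and metadata.

```shell
#!/bin/sh
# Sketch only. Reads "A-hash B-hash" pairs from stdin, oldest first, and makes
# one combined commit per pair. Assumptions (not from the answer above): the
# result is one linear chain, and author/committer names, emails, and dates are
# copied from the A-side commit.
# Warning: this overwrites the current index, so run it in a scratch repository.
combine_history() {
    parent=""
    while read -r a b; do
        git read-tree --empty || return 1          # empty the index
        git read-tree --prefix= "$a" || return 1   # tree from the repo-A commit
        git read-tree --prefix= "$b" || return 1   # tree from the repo-B commit
        tree=$(git write-tree) || return 1         # hash ID of the combined tree
        # Combined message: the two original subject lines joined with " + ".
        msg="$(git log -1 --format=%s "$a") + $(git log -1 --format=%s "$b")"
        # The first commit gets no -p argument; later ones chain to the previous.
        parent=$(
            GIT_AUTHOR_NAME=$(git log -1 --format=%an "$a") \
            GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$a") \
            GIT_AUTHOR_DATE=$(git log -1 --date=raw --format=%ad "$a") \
            GIT_COMMITTER_NAME=$(git log -1 --format=%cn "$a") \
            GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce "$a") \
            GIT_COMMITTER_DATE=$(git log -1 --date=raw --format=%cd "$a") \
            git commit-tree ${parent:+-p "$parent"} -m "$msg" "$tree" </dev/null
        ) || return 1
    done
    printf '%s\n' "$parent"   # hash ID of the last commit made
}
```

Fed the three pairs from the walkthrough above (A and O, A and P, B and P), this should print the hash ID of the final combined commit, which you can then hand to git branch as described. Because it never touches the work-tree, a final git checkout HEAD -- . is still needed once the branch is in place.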