How can I split my git repository into two repositories (recent and history) at a specific commit SHA while also preserving the branches in each, properly linked to their commits on master?
Problem description
While many SO questions ask and answer how to split off a subdirectory (e.g., The Easy Way), that is not what I need to do. Rather, I need to split the repository's commits into two sets: all of those up to a certain commit, and all of those that follow. While my repository is large, with thousands of commits and hundreds of branches over a ten-year history, the problem can be boiled down to a simple repository with 8 commits (1-8) and three branches (master, A, B):
1 - 2 - 3 - 4 - 5 - 6 - master
         \       \
          7       8
           \       \
            A       B
After conversion, what I want is two repositories. The first (project-history) should contain historical commits 1, 2, 3, and 4 and the associated commit 7 on branch A. The second (project-recent) should contain commits 4, 5, 6 and associated commit 8 on branch B. These would look like:
project-history             project-recent
1 - 2 - 3 - 4 - master      4 - 5 - 6 - master
         \                       \
          7                       8
           \                       \
            A                       B
There is a similar problem described in Split a Git Repository into Two, but neither of the answers is accepted, and neither produces the results I need, which I describe below, along with a test script.
Possible Approach: Branch, then rebase using an orphaned commit
Chapter 7.13 of the Pro Git book (Git-Tools-Replace) describes an approach that comes very close. In that approach, you first create the history, and then rebase the recent commits onto a new orphan commit.
Create the history
- find the SHA of the commit on which the repository is to be split
- create a history branch at that point
- push the history branch and its attached branches into a new project-history repo
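As a rough sketch of these three steps on a toy repository (this is not the repo-split-example.sh script): the temporary directory layout, the `commit` helper, the remote name `project-history`, and the choice to branch A off the third commit are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
# Name the initial branch master regardless of init.defaultBranch.
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

for m in first second third; do commit "$m" master.txt; done
git checkout -q -b A              # assumption: A branches off third
commit branchA a.txt
git checkout -q master
commit fourth master.txt
split=$(git rev-parse HEAD)       # step 1: SHA of the split commit (fourth)
commit fifth master.txt
git checkout -q -b B              # B branches off fifth
commit branchB b.txt
git checkout -q master
commit sixth master.txt

# Step 2: create a history branch at the split commit.
git branch history "$split"

# Step 3: push history (as master) and its attached branch A to a new repo.
git init -q --bare "$work/project-history.git"
git remote add project-history "$work/project-history.git"
git push -q project-history history:refs/heads/master A

# Commits 1-4 plus branchA should be the only commits in the new repo.
history_count=$(git --git-dir="$work/project-history.git" rev-list --count --all)
echo "commits in project-history: $history_count"
```

In a real repository with hundreds of branches, the list of branches to push alongside history would have to be computed rather than typed; the sketch sidesteps that by naming A directly.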
This all works great.
Rebase the recent commits
But this next part doesn't work fully:
- Create an orphan commit, which produces commit aaf5c36:
  git commit-tree 8e3dbc5^{tree}
- Rebase the master onto aaf5c36, starting at the parent of the split commit:
  git rebase --preserve-merges --onto aaf5c36 8e3dbc5
- Push this new master and branch B into a new project-recent repo
The problem: Branch B is disconnected from master in the new project-recent repository. The resulting repositories look like:
project-history             project-recent
1 - 2 - 3 - 4 - master      4 - 5 - 6 - master
         \
          7                 1 - 2 - 3 - 4 - 5 - 8 - B
           \
            A
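The disconnect can be reproduced on a toy repository with a sketch like the following (again, not the repo-split-example.sh script). It follows the two commands as written above, using the same commit for the tree and for the rebase upstream, so the orphan commit stands in for fourth. The commit message, the `commit` helper and the temporary paths are assumptions, and --preserve-merges is left out because the toy history has no merges and recent Git releases have removed that option.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

for m in first second third fourth; do commit "$m" master.txt; done
split=$(git rev-parse HEAD)       # the split commit (fourth)
commit fifth master.txt
git checkout -q -b B              # B branches off the original fifth
commit branchB b.txt
git checkout -q master
commit sixth master.txt

# Orphan commit: no parents, same tree as the split commit.
orphan=$(git commit-tree -m "Get history from project-history" "${split}^{tree}")

# Replay the commits after the split commit (fifth, sixth) onto the orphan.
git rebase -q --onto "$orphan" "$split" master

master_count=$(git rev-list --count master)   # orphan + rewritten fifth and sixth
b_count=$(git rev-list --count B)             # B still carries the full old history

# merge-base exits non-zero when two commits share no ancestor.
if git merge-base master B >/dev/null; then linked=yes; else linked=no; fi

echo "master: $master_count commits, B: $b_count commits, linked: $linked"
```

Only master was rewritten; B still points at the original commits, which is why pushing it drags the old history along.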
Script to illustrate the issue
The repo-split-example.sh script creates an example repository (repo-split-example), then splits it using this technique into repo-split-history and repo-split-recent repositories, but branch B is unattached in the latter. In addition, pushing branch B into the recent repository also pushes the historical commits (1, 2, 3) into it, and commits 4 and 5 end up duplicated (the originals, plus the rewritten ones from the rebase). Here's the final state of the project-recent repo:
$ git log --graph --all --oneline --decorate
* c29649c (HEAD -> master) sixth
* e8545fd fifth
* 8e3dbc5 fourth
* aaf5c36 Get history from historical repository at file:///Users/jones/development/git-svn-migrate/repo-split-history
* 7a98d11 (B) branchB
* 1f620ac fifth
* 1853778 fourth
* 14ab901 third
* 8dd0189 second
* bb1fc8d first
Whereas what I want is:
$ git log --graph --all --oneline --decorate
* c29649c (HEAD -> master) sixth
| * 7a98d11 (B) branchB
|/
* e8545fd fifth
* 8e3dbc5 fourth
* aaf5c36 Get history from historical repository at file:///Users/jones/development/git-svn-migrate/repo-split-history
The repo-split-example.sh script is an easy way to reproduce the problem. How can I get the project-recent repository to contain the recent commits from master plus the commits from branch B, properly linked to rebased commit 5 (fifth)?
Thanks for the advice!
Update
After looking around more, I determined that I can manually rebase the recent branches back into the newly rewritten tree. To do this, for each branch in the recent tree, I would do:
# Rebase branch B onto the newly rewritten fifth commit
git branch temp e8545fd # the SHA of the rewritten fifth commit
git checkout B
git rebase temp # This works, but will (always?) have conflicts: with no common merge
# base, the rebase replays every commit from the beginning of the history
git branch -d temp
So, this works, and produces the desired result. But the git rebase temp step produces a large number of merge conflicts (one for every commit since the beginning of the history), because the rewritten fifth commit does not share any history with the original branch B. That means a lot of manual conflict resolution, which would take far too long for my real repository. So I am still looking for a workable solution where the rebase runs without merge conflicts.
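To make the size of that replay concrete, here is a small sketch on a toy repository that counts the commits git rebase temp would have to reapply, compared with the commits that actually belong to B. The repository layout, the `commit` helper and the variable names are assumptions for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

for m in first second third fourth; do commit "$m" master.txt; done
split=$(git rev-parse HEAD)
commit fifth master.txt
old_fifth=$(git rev-parse HEAD)   # original fifth, where B forks
git checkout -q -b B
commit branchB b.txt
git checkout -q master
commit sixth master.txt

# Same split as before: orphan commit, then rebase master onto it.
orphan=$(git commit-tree -m "Get history from project-history" "${split}^{tree}")
git rebase -q --onto "$orphan" "$split" master

git branch temp master~1          # the rewritten fifth commit

# Commits reachable from B but not from temp: what "git rebase temp" replays.
replayed=$(git rev-list --count temp..B)
# Commits reachable from B but not from the original fifth: B's own work.
own=$(git rev-list --count "$old_fifth..B")

echo "rebase would replay $replayed commits; only $own belong to B"
```

Because temp and B share no ancestor, the range temp..B covers B's entire history, which matches the one-conflict-per-commit behaviour described above.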