How do you keep branches sharing the same history through rebases?

Question

If I start with this history:

* 505421e - (HEAD -> stage4) go to stage 4
* 307978f - (stage3) go to stage 3
* d8ab213 - steb b
* 49cbef9 - (stage2) step a
* 6e4a2ed - (stage1) go to stage 1
* 3ca2d7d - do something
* 5ce596d - (stage0) go to stage 0
* 0c8487b - foo
* ccfb7c9 - bar

And I do git rebase -i ccfb7c9 in order to change the foo commit, then branches stage0–stage3 will no longer have the same commit history as stage4, and will not have the updated foo commit. How do I get them to have the same history?

Note in general that preserving the structure is impossible: what if you squashed 3ca2 into 5ce5, for example? Where should `stage0` point after that? — Davis Herring, Dec 17 '17 at 01:15
@DavisHerring: I think the logically-correct answer is to handle this the same way `git filter-branch` does: make a map of old=>new commit hashes, and replace labels based on the map entries. Squashing two commits means that both old commits map to one new commit. Splitting a commit maps the old commit to the first of the new commits. The existing rebase code doesn't help making these maps though. — torek, Dec 17 '17 at 17:47

score 1 · Answer 1 · answered Dec 17 '17 at 00:34

You are correct. Unfortunately, there is nothing built in to Git to do this the "right way". It's somewhat difficult to define "right" here, although I have a particular definition in mind¹ and started writing a program to do it at one point. It got too difficult for all the weird corner cases, and I abandoned it.

The fundamental problem here is that git rebase works by copying commits, as if by git cherry-pick (with interactive rebase letting you make some additional changes along the way).² The new copies have new, different hash IDs.³ Once the copies are made, Git re-points one branch name—the current branch, whatever that was when you started the git rebase—to the last such copied commit.

In other words, in this case, you have Git copy:

0c8487b - foo
5ce596d - (stage0) go to stage 0
3ca2d7d - do something
...
505421e - (HEAD -> stage4) go to stage 4

in that order (earliest to latest), one at a time. At the first step you make some change (it doesn't really matter what, as long as you change something) so that the new commit gets a new, different hash ID, such as cccccc1 (probably not actually that, but this lets us refer to it as "new commit 1"). The parent of this new commit is ccfb7c9, the commit labeled bar, so the new history rejoins the old history at that point.

Then Git copies the second commit, 5ce596d - (stage0) go to stage 0, which becomes, say, cccccc2. The parent of cccccc2 is cccccc1 rather than 0c8487b, and that difference alone is enough to force cccccc2 to be a different commit from 5ce596d. Git goes on to copy all eight commits, with the last one becoming, say, cccccc8; then Git changes the name stage4 so that it names commit cccccc8.
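To see why a changed parent is enough on its own, here is a minimal sketch run in a throwaway repository. The temporary directory, the `demo` identity, and the fixed timestamp are assumptions made purely for illustration; `git commit-tree` is used only because it creates commit objects whose contents are fully controlled.

```shell
# Throwaway repository; nothing here touches a real project.
repo=$(mktemp -d)
git -C "$repo" init -q

# Pin identity and timestamps (illustrative values) so commit contents are reproducible.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='1513468800 +0000' GIT_COMMITTER_DATE='1513468800 +0000'

# An empty tree is enough: only the commit objects matter here.
tree=$(git -C "$repo" hash-object -t tree -w /dev/null)

# Two bit-for-bit identical commits get the same hash ID.
foo_a=$(git -C "$repo" commit-tree -m 'foo' "$tree")
foo_b=$(git -C "$repo" commit-tree -m 'foo' "$tree")

# Changing anything at all (here, the message) produces a different hash ID.
foo_new=$(git -C "$repo" commit-tree -m 'foo, edited' "$tree")

# These two children differ only in their parent, and that changes their hashes too.
child_old=$(git -C "$repo" commit-tree -m 'go to stage 0' -p "$foo_a" "$tree")
child_new=$(git -C "$repo" commit-tree -m 'go to stage 0' -p "$foo_new" "$tree")

printf 'foo:   %s (original)\n' "$foo_a"
printf 'foo:   %s (identical copy)\n' "$foo_b"
printf 'foo:   %s (edited)\n' "$foo_new"
printf 'child: %s (old parent)\n' "$child_old"
printf 'child: %s (new parent)\n' "$child_new"
```

The first two lines print the same hash; the remaining hashes all differ, even though both children have the same tree and message.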

Thus, when you now look at the history by starting from the commit that Git finds by the name stage4, you see the new history. But Git hasn't changed any of the other names: stage3, for instance, still identifies commit 307978f, while stage0 still identifies commit 5ce596d. So if you look at your history by starting from any of those names, you see the original series of commits.
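This is easy to reproduce in a scratch repository. The sketch below makes several assumptions: it uses a shortened history (only `stage0`, `stage1`, and `stage4`), the helper functions `g` and `step` are invented for the demo, and the interactive edit of foo is stood in for by `git commit --amend` followed by `git rebase --onto`, which copies the later commits in the same way.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Demo helpers (hypothetical names): run git in the scratch repo, and record one commit.
g() { git -C "$repo" "$@"; }
step() { echo "$1" >> "$repo/log.txt" && g add log.txt && g commit -q -m "$1"; }

g symbolic-ref HEAD refs/heads/stage4   # build the history directly on stage4
step 'bar'
step 'foo';           old_foo=$(g rev-parse HEAD)
step 'go to stage 0'; g branch stage0
step 'go to stage 1'; g branch stage1
step 'go to stage 4'
old_stage0=$(g rev-parse stage0)

# Stand-in for editing "foo" during an interactive rebase:
# amend the commit, then replay everything after it onto the amended copy.
g checkout -q --detach "$old_foo"
g commit -q --amend -m 'foo, edited'
new_foo=$(g rev-parse HEAD)
g rebase -q --onto "$new_foo" "$old_foo" stage4

# Branch names whose tips are no longer reachable from the rewritten stage4.
stale=$(g for-each-ref --format='%(refname:short)' --no-merged=stage4 refs/heads)
echo "left behind:" $stale
g log --all --decorate --oneline --graph
```

Only `stage4` sits on the rewritten commits; `stage0` and `stage1` still name the originals, so the graph shows two lines of history that meet at bar.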

What you need is to get Git to move each label from its original hash to its new hash. The difficulty lies in working out the details: which labels should move? (You might want some to retain the old commits on purpose, and some to move.) For that matter, which commit is the correct new commit? What if, during the interactive rebase, I choose to split some commits and combine others?

The simple, brute-force solution is to manually force each name you want changed to point to its new commit. Run git log --all --decorate --oneline --graph and note down the old and new commit IDs for each name, then run:

git branch -f stage0 newhash0
git branch -f stage1 newhash1
git branch -f stage2 newhash2
git branch -f stage3 newhash3

and you're done. Or, just delete some of those branch names entirely, since you can find the commits by starting from stage4 (which moved automatically) and working backwards.
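If the commit subjects are unchanged and unique, the note-down-and-force step could be scripted roughly as follows. `move_branch_by_subject` is a hypothetical helper, not a Git command. It matches on subject lines only, an assumption that breaks for reworded, squashed, or split commits, so it refuses to move anything unless it finds exactly one match.

```shell
# Hypothetical helper (not part of Git): move branch $1 to the commit reachable
# from $2 whose subject line equals the subject of the commit $1 names now.
# Run it from inside the repository, with some other branch checked out.
move_branch_by_subject() {
    branch=$1
    new_tip=$2
    subject=$(git log -1 --format=%s "$branch" --) || return 1

    # Walk the rewritten history and collect hashes whose subject matches exactly.
    matches=$(git log --format='%H %s' "$new_tip" -- |
        while read -r hash rest; do
            if [ "$rest" = "$subject" ]; then printf '%s\n' "$hash"; fi
        done)

    # Refuse to guess when there is no match, or more than one.
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "skipping $branch: $count matches for '$subject'" >&2
        return 1
    fi

    git branch -f "$branch" "$matches"
}
```

For the history in the question, this would be run after the rebase as `for b in stage0 stage1 stage2 stage3; do move_branch_by_subject "$b" stage4; done`. Any branch it skips still has to be moved by hand with git branch -f.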

¹The definition I like involves structuring branch names, so that a name has more semantics than just "raw pointer to a commit". The form of this structuring is difficult as well: should it use a name hierarchy, or should we have git config entries to group branch names into "superbranches"?

²Some git rebase commands literally run git cherry-pick and some do not. Interactive rebase is one of the cases that really does run git cherry-pick, along with git commit --amend and other tricky items.

³The hash ID of any Git object is strictly determined by its contents, so if the "copy" is 100% identical to the original—bit for bit identical—you wind up with the same hash ID, i.e., you simply re-use the original. But as soon as you make any change at all, the hash ID of some commit in the sequence changes, which forces every "downstream" (child) commit to have a different parent hash ID, which changes every downstream commit as well.
