You are essentially correct. Rebase is, however, not a cure-all for the problem.
Let's draw a slightly more complex situation. We'll start with a similar commit graph:
...--o--o
The final o node here is the tip commit of some branch, with some earlier commit o also not very distinguished. We won't bother with branch labels because they are just aimed at helping humans, and we're looking at what Git does, rather than what humans do. :-)
Along comes one human who makes a new commit, your commit C1 (I'll just call it C though), that has a bug in it:
          C
         /
...--o--o
Meanwhile, in this repo, or in some other clone of it, along comes a different human who makes an unrelated commit F:
          C
         /
...--o--*
         \
          F
Commits C and F are probably on different (named) branches, but the important thing is that they have a common base * (this used to be just marked o, but now we need to remember that it's the common base, though it's pretty obvious from the drawing).
In your particular scenario, the second user made F by cherry-picking C. Let's say that in our case, the second user made F quite independently. That second user then decides that now is the time to cherry-pick C, so they get a copy of it. The copy does not apply cleanly, so they hand-edit it slightly until it does, and commit the result as G. Now they have:
          C
         /
...--o--*
         \
          F--G
Note, again, that commit G is mostly, but not quite, a copy of C, which, as we noted, is about to be deemed defective.
Your first human therefore reverts C (the revert is commit R) to, in effect, remove it from his branch, then adds D (the corrected fix):
          C--R--D
         /
...--o--*
         \
          F--G
Your second human goes on to add more commits:
          C--R--D   <-- branch1
         /
...--o--*
         \
          F--G--H--I   <-- branch2
(this time I've put in the branch names too).
When rebase works and when it fails
What git rebase does is, in essence, find the commits that the two branches have in common, and the commits that are exclusive to each of the two branches. Your second human will come along and try to rebase the F-G-H-I sequence atop D.
The common commits start from the merge base * and work backwards; rebase gets to ignore these entirely.
The commits to be copied start after the merge base and end with the tip-most commit, hence are F, G, H, and I.
But, before copying these, Git checks the commits exclusive to the "other side": the commits after the merge base * that end with D. These are C (the bad commit), R (the revert of C), and D. It computes a patch ID, as git patch-id does, for each of those commits and for each of the commits set to be copied. If the patch ID of one of the "to be copied" commits matches the patch ID of one of the "already in the chain ending with D" commits, Git drops that commit.
This is how, when commit G is an exact (not-hand-edited) copy of C, Git can drop G and just copy F, H, and I. The exact copy winds up with the same patch ID. But this G was hand-edited to make it fit, which changed its patch ID. Rebase therefore copies G, giving:
          C--R--D   <-- branch1
         /       \
...--o--*         F'-G'-H'-I'   <-- branch2
         \
          F--G--H--I   [abandoned]
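The commit selection that happens before any copying can be sketched with a toy model of the graph. This is only an illustration in Python, not how Git is implemented: the parents table and the helper names ancestors and exclusive are made up for this example, and the unnamed commit is simply called "o".

```python
# Toy commit graph from the drawings: each commit maps to its parents.
# The name "o" stands in for the unnamed commit before the merge base "*".
parents = {
    "o": [],
    "*": ["o"],
    "C": ["*"], "R": ["C"], "D": ["R"],            # branch1
    "F": ["*"], "G": ["F"], "H": ["G"], "I": ["H"],  # branch2
}

def ancestors(tip):
    """Return the set holding `tip` and every commit reachable from it."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def exclusive(tip, other):
    """Commits reachable from `tip` but not from `other`."""
    return ancestors(tip) - ancestors(other)

to_copy = exclusive("I", "D")        # candidates for copying
upstream_only = exclusive("D", "I")  # inspected for patch IDs, never copied
common = ancestors("I") & ancestors("D")  # ignored entirely

print(sorted(to_copy))        # ['F', 'G', 'H', 'I']
print(sorted(upstream_only))  # ['C', 'D', 'R']
print(sorted(common))         # ['*', 'o']
```

The three sets correspond to the three groups described above: the commits to copy, the "other side" commits, and the shared history at and behind the merge base.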
So, while git merge definitely fails (in the sense that it re-introduces the bad change), git rebase sometimes also fails, specifically when a cherry-picked commit had to be modified to fit. In this case, that happened because of a conflict between F and the cherry-picked C, but there are plenty of ways to run into this.
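To see why the hand edit matters, here is a rough Python sketch of the drop-or-copy decision. It is not the real patch-ID algorithm: toy_patch_id only imitates the idea of hashing a diff while ignoring line numbers and whitespace, and the diffs, file names, and the helpers unrelated and plan are all invented for the example.

```python
import hashlib

def toy_patch_id(diff_text):
    """Hash a diff with hunk headers (line numbers) dropped and whitespace removed."""
    kept = []
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            continue  # hunk header: carries only line numbers
        kept.append("".join(line.split()))
    return hashlib.sha1("\n".join(kept).encode()).hexdigest()

def unrelated(name):
    """Hypothetical stand-in diff for a commit whose content does not matter here."""
    return f"--- a/{name}.txt\n+++ b/{name}.txt\n@@ -1 +1 @@\n-old {name}\n+new {name}\n"

# Invented diff for the bad commit C.
diff_C = """\
--- a/calc.py
+++ b/calc.py
@@ -10,3 +10,3 @@
 def total(items):
-    return sum(items)
+    return sum(items) + 1
"""

# An exact copy applied at a different line offset: only the hunk header changes.
exact_copy_G = diff_C.replace("-10,3 +10,3", "-14,3 +14,3")
# A hand-edited copy: the change itself is slightly different.
hand_edited_G = diff_C.replace("sum(items) + 1", "sum(list(items)) + 1")

# Patch IDs on the upstream side. R is left out for brevity; its diff is the
# reverse of C's, so it matches none of the candidates in this toy model.
upstream_ids = {toy_patch_id(diff_C), toy_patch_id(unrelated("D"))}

def plan(g_diff):
    """Return the commits that would still be copied, given G's diff."""
    candidates = [("F", unrelated("F")), ("G", g_diff),
                  ("H", unrelated("H")), ("I", unrelated("I"))]
    return [name for name, diff in candidates
            if toy_patch_id(diff) not in upstream_ids]

print(plan(exact_copy_G))   # ['F', 'H', 'I']
print(plan(hand_edited_G))  # ['F', 'G', 'H', 'I']
```

The exact copy hashes the same as C and is dropped. The hand-edited copy hashes differently, so it is copied along with everything else.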
Is there other way to avoid (or fix) the problem I mentioned here?
Ideally, instead of cherry-picking C in the first place, whoever is working on branch2 would just rebase onto C at that time, and then rebase onto R again later if needed (or just straight onto D), or merge after said rebase. Let's see what the graph looks like if the second human, working on branch2, had rebased his F commit onto C instead of cherry-picking. Let's draw the before-rebase:
          C   <-- branch1
         /
...--o--*
         \
          F   <-- branch2
and move C down a few lines, which is exactly the same commits, just drawn more linearly:
...--o--*---C   <-- branch1
         \
          F   <-- branch2
and now let's copy F to F' atop C and move the branch label:
...--o--*---C   <-- branch1
         \   \
          \   F'   <-- branch2
           \
            F   [abandoned]
The merge base of C and F' is now C itself, rather than commit *. Let's put the remaining commits in, unmarking the * commit and dropping abandoned commits:
...--o--o---C--R--D   <-- branch1
             \
              F'-H--I   <-- branch2
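If we now use git merge to merge commit I atop commit D, we won't re-introduce bad commit C via G, since there now is no G. The merge base is C, so Git sees R and D as changes made on branch1 since then, and the merge result includes both the revert and the corrected fix.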
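The shift in merge base is easy to check with a toy model of the two graphs. As before, this is a Python sketch for illustration only: the parent tables and the helpers ancestors and merge_bases are invented here, and the unnamed commit is called "o".

```python
def ancestors(parents, tip):
    """Return the set holding `tip` and every commit reachable from it."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}

# Cherry-pick scenario: branch2 forks from "*" and carries the edited copy G.
cherry_picked = {
    "o": [], "*": ["o"],
    "C": ["*"], "R": ["C"], "D": ["R"],
    "F": ["*"], "G": ["F"], "H": ["G"], "I": ["H"],
}

# Rebase-early scenario: F was copied to F' atop C, so there is no G.
rebased_early = {
    "o": [], "*": ["o"],
    "C": ["*"], "R": ["C"], "D": ["R"],
    "F'": ["C"], "H": ["F'"], "I": ["H"],
}

print(merge_bases(cherry_picked, "D", "I"))  # {'*'}
print(merge_bases(rebased_early, "D", "I"))  # {'C'}
```

In the first graph the merge starts from *, so G's near-copy of C counts as a change on branch2. In the second it starts from C, so the only changes that touch C's work are R and D on branch1.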
Of course, if multiple people are using branch2 (that is, if the old F commit is published), this rebase-makes-a-copy thing means they must all switch to using the new copies, every time we rebase.
Testing
Is there other way to avoid (or fix) the problem I mentioned here?
Ideally, when someone found a bug, before writing commit C at all, they wrote a test case. The test case showed that commit C was required and that commit C fixed the bug, which is why commit C was committed in the first place.
When C was found to be faulty, the test case for it should have been improved, or an additional test case written, demonstrating that commit C was not quite right. This also is why revert R went in, and subsequent better fix D. (Perhaps D was, in essence, a squash of R and the replacement fix, though the fact that C got copied suggests that R should exist as a stand-alone reversion.)
These tests will now show the problem if a rebase or merge re-introduces a slight variation of commit C, such as our hypothetical commit G. That won't avoid or fix the problem itself, but will at least catch it right away.