There are those who argue that you should keep all the merges: this represents the actual development process. The method you're using, of "cleaning up" the set of commits so that the history is prettier and reflects an idealized development process, is supposed to make things easier for future people (perhaps even for future you) when you need to fix bugs that are in those future-people's past, which could be your own future, at the moment.
All of this is a fair amount of work, and the payoff is uncertain. And yet I myself prefer this "pre-clean the history" method, when I can afford it.
Now, let's add this as well:
git rebase --strategy-option=ours feature --onto master <divergent_ref>
You must be very careful with -X ours: Git is following simple text rules, which won't work in all situations. (If you have automated tests, consider adding --exec to run them.)
Typically in this situation, I just end up remembering and repeating each conflict resolution. But surely there's a better way?
You can use git rerere here, sometimes. Mostly --onto is what you need (see Schwern's answer).
There are a few things to know before you start on this path:
First, remember that git rebase essentially means copy some commits, as if by git cherry-pick. Some rebase operations really do use git cherry-pick. One, the oldest form of rebase, uses git format-patch and git am. This does not work as well in general, but does run faster. Because it doesn't use git cherry-pick, I think git rerere won't apply either. But adding --strategy-option forces the cherry-pick variant; so does adding -m, or -i or --exec.
A cherry-pick is a merge!
Having listed out some commit hash IDs, rebase begins its real work by detaching HEAD at the commit onto which the copies are to be appended. It then makes the copies. After the selected commits have been copied, the rebase operation writes the hash ID of the last-copied, or HEAD, commit, into the branch name of the branch you were on when you started the whole thing.
But we also have to look at the set of commits that are to be copied. The first sentence above says having listed out some commit hash IDs. Which hash IDs? This is where --onto comes in.
A plain git rebase gives you one control knob, which the git rebase documentation calls upstream. This one control knob both selects commits to copy and where to put the copies. Using git rebase --onto gives you a separate knob for where to put the copies, so that frees up the upstream argument to let you select what-to-copy more carefully.
The upstream argument is normally both where to put the copies and what not to copy, with the what to copy list being determined by the result of git rev-list upstream..HEAD, more or less. We'll see this in action below. But more or less is important here: the rebase documentation cites this A..B notation as how it determines what to copy and what not to copy, but it's actually less than that. Again, we'll see more about this below.
How to express what I think your actual problem here is
In any case, let's draw what I think is your actual problem-over-time as several different snapshots in time:
E--F--G <-- feature1
/
...--o--*--o--o--*--o--o <-- mainline
\
A--B--C--D <-- feature2
This is how it all starts: there's some project with some main-line going on, and two features being developed. But now it turns out that feature2 depends on something in feature1. So now you'd like to rebase (i.e., copy some commits from) feature2 so that the copies sit atop feature1. At this point, this rebase is easy to invoke:
git checkout feature2; git rebase feature1
already selects the correct commits and the correct copy-point. If this copying is done via cherry-pick, each copy is done by the merge machinery, which includes saving conflict resolutions with git rerere, if rerere.enabled is set. (If it's not set, Git saves neither the conflicts, nor their resolutions.)
The end result is:
A'-B'-C'-D' <-- feature2
/
E--F--G <-- feature1
/
...--o--*--o--o--*--o--o <-- mainline
\
A--B--C--D [abandoned]
The new commit chain, A'-B'-C'-D', looks a lot like your original chain, but the hash IDs differ. Since no one actually looks at the hash IDs, and the original A-B-C-D chain is now invisible to normal git log operations, nobody ever really pays attention to the changeover, but it's real. And it is about to bite you in another way.
Now that you have rebased your feature2 atop feature1, someone else (or maybe even you) rebases feature1. The result is:
A'-B'-C'-D' <-- feature2
/
E--F--G
/
...--o--*--o--o--*--o--o <-- mainline
\
E'-F'-G' <-- feature1
(I stopped drawing in A-B-C-D as they're not really useful any more.) Note how E-F-G are supposed to be abandoned, but actually are not.
A plain git checkout feature2; git rebase feature1 will choose to copy commits E-F-G-A'-B'-C'-D'. Using --onto, you can run git rebase --onto feature1 hash-of-commit-G to tell git rebase: do not copy commits E-F-G.
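To make the --onto form concrete, here is a minimal sketch in a throwaway repository. The file names, commit messages, and the make_commit helper are invented for illustration; only the branch names come from the diagrams above. It records the hash of the old G before feature1 is rebased, then passes that hash as the upstream argument.

```shell
#!/bin/sh
# Sketch only: build a scratch repo shaped like the diagrams, then rebase
# feature2 with --onto so that the old E-F-G are not copied.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: append a line to a file and commit it.
make_commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

make_commit base.txt o1
git checkout -q -b feature1
for c in E F G; do make_commit f1.txt "$c"; done
git checkout -q -b feature2              # feature2 already sits atop G
for c in A B C D; do make_commit f2.txt "$c"; done

old_g=$(git rev-parse feature1)          # remember G before feature1 moves

git checkout -q mainline
make_commit base.txt o2
git rebase -q mainline feature1          # feature1 now names E'-F'-G'

# Copy only what comes after the old G; put the copies on the new feature1.
git rebase -q --onto feature1 "$old_g" feature2

copied=$(git rev-list feature1..feature2 | wc -l | tr -d ' ')
echo "commits on feature2 beyond feature1: $copied"
```

In this toy case the copies of E-F-G are conflict-free, so a plain rebase would likely reach the same result; the explicit hash matters once the copies have been modified.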
But in fact, there's a handy feature inside git rebase: it automatically excludes some commits from its list of commits. I already mentioned this above, in the more or less ... actually less part. The rebase documentation actually says this:
[The commits to be copied are] the same set of commits that would be shown by git log <upstream>..HEAD; or by git log 'fork_point'..HEAD, if --fork-point is active (see the description on --fork-point below); or by git log HEAD, if the --root option is specified.
But this is not true! By default, git rebase also excludes:
- any merge commit, and
- any commit in upstream..HEAD whose git patch-id matches the patch-ID of a commit that is in HEAD..upstream.
Suppose that whoever copied E-F-G to E'-F'-G' didn't have to resolve any merge conflicts or anything. In this case, the copies, E' through G', will have the same git patch-id as their corresponding original. So git rebase would drop those commits even without --onto.
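The patch-ID exclusion can be observed directly. The sketch below is an approximation: the rev-list options it uses (--no-merges, --cherry-pick, --right-only on the symmetric difference) mimic the filtering just described and are not claimed to be the exact command git rebase runs internally. The repository layout and the make_commit helper are invented for illustration.

```shell
#!/bin/sh
# Sketch only: compare patch-IDs of an original commit and its copy, and
# count candidate commits before and after patch-ID filtering.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: append a line to a file and commit it.
make_commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

make_commit base.txt o1
git checkout -q -b feature1
for c in E F G; do make_commit f1.txt "$c"; done
git checkout -q -b feature2              # four commits atop the old G
for c in A B C D; do make_commit f2.txt "$c"; done
old_g=$(git rev-parse feature1)

git checkout -q mainline
make_commit base.txt o2
git rebase -q mainline feature1          # conflict-free copies E'-F'-G'

# Patch-ID of the original G and of its copy G' (first field of the output).
orig_id=$(git show "$old_g" | git patch-id | cut -d' ' -f1)
copy_id=$(git show feature1 | git patch-id | cut -d' ' -f1)

# What the plain upstream..HEAD notation selects.
plain=$(git rev-list feature1..feature2 | wc -l | tr -d ' ')

# Approximation of the filtered list: drop merges and patch-ID duplicates.
filtered=$(git rev-list --no-merges --cherry-pick --right-only \
    feature1...feature2 | wc -l | tr -d ' ')

echo "upstream..HEAD: $plain commits; after patch-ID filtering: $filtered"
```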
(The documentation also mentions fork-point, which uses your own Git's reflog for the upstream branch to pick the what-not-to-copy commit for you, much as if you had given --onto and a separate upstream argument yourself. However, fork-point mode is (1) not always active and (2) not always right when it is active. I don't really like the fork-point selection trick myself: I think it buries too much magic. Also, since it uses reflogs, it fails if the crucial reflog entry has expired. But that's all an aside anyway.)
Where this particular sequence all goes wrong is when whoever was copying feature1 had to modify one of their copied commits such that the patch-ID trick fails. In this case, using --onto is the way to go: it fixes the problem without any additional mess.
But you may have another problem. In particular, suppose that while you work on feature2 and someone else works on feature1, they realize the same thing you did, or hear about or see one of your commits' changes, and they add a new commit that partly, but not completely, fixes something you're doing, but in a different way? Then maybe they have:
A'-B'-C'-D' <-- feature2
/
E--F--G
/
...--o--*--o--o--*--o--o <-- mainline
\
H-E'-F'-G' <-- feature1
where H is more similar to one of your original A-B-C-D commits than your updated A'-B'-C'-D' commits. In this case, you might want to bring your A-B-C-D series back. Commit D is almost certainly still in your Git, remembered as feature2@{number}. (The exact number depends on how many updates you have made to feature2 since then.) Or, of course, you can do what I do, which is to save the original feature2 pointer by creating feature2.0, feature2.1, and so on. Let's draw it back in, as feature2.0, and rename feature2 to feature2.1:
A'-B'-C'-D' <-- feature2.1
/
E--F--G
/
...--o--*--o--o--*--o--o <-- mainline
\ \
\ H-E'-F'-G' <-- feature1
\
A--B--C--D <-- feature2.0
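Both ways of getting back to the original D can be sketched in a scratch repository. The layout and the make_commit helper are invented; feature2.0 is the naming convention described above. The sketch assumes reflogs are enabled, which is the default for a repository with a working tree.

```shell
#!/bin/sh
# Sketch only: find the pre-rebase tip of feature2 via a saved branch name
# and via the branch's reflog.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: append a line to a file and commit it.
make_commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

make_commit base.txt o1
git checkout -q -b feature2
for c in A B C D; do make_commit f2.txt "$c"; done
git checkout -q mainline
make_commit base.txt o2

git branch feature2.0 feature2           # save the original tip under a new name
git rebase -q mainline feature2          # feature2 now names D'

from_branch=$(git rev-parse feature2.0)
from_reflog=$(git rev-parse 'feature2@{1}')   # value of feature2 one update ago
new_tip=$(git rev-parse feature2)

echo "saved name: $from_branch"
echo "reflog:     $from_reflog"
```

The saved name keeps working after the reflog entry expires, which is one reason to prefer it for work you may want back much later.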
If H is very close to one of your original four commits (so that it either has the same patch ID, or you can use drop in an interactive rebase), you might want to use the original series as the source. If you had to resolve some conflicts earlier, git rerere may be able to replay those resolutions. We can now do:
git checkout -b feature2 feature2.0
git rebase -i feature1
(the interactive rebase allows doing a "drop", and forces cherry-picking; if you like, use git rebase -m to force cherry-picking without interactivity). If all goes well, and assuming H==C as it were, we end up with:
A'-B'-C'-D' <-- feature2.1
/
E--F--G
/
...--o--*--o--o--*--o--o mai... A"-B"-D" <-- feature2
\ \ /
\ H-E'-F'-G' <-- feature1
\
A--B--C--D <-- feature2.0
The rerere was perhaps useful in terms of saving a merge conflict resolution for D, and the auto-patch-ID-detection may have ejected commit C for you here.
Viewing rebase as a series of cherry-picks
A normal merge works by finding a merge base—a common, shared commit between two branches—and doing two diffs. The diff from merge base to each branch tip tells us who changed what:
          I--J <-- ours (HEAD)
         /
...--G--H
         \
          K--L <-- theirs
The comparison of commit H's snapshot vs J's tells us what we changed, on branch ours; comparing H vs L tells us what they changed; and git merge combines the changes. In cases of conflicts, the (eXtended) strategy-option -X ours or -X theirs tells Git to resolve the conflict automatically by choosing H-vs-J ("ours") or H-vs-L ("theirs").
For a normal merge, the final commit after resolving everything is a merge commit, with parents J and L, in that order (ours first, then theirs).
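A small sketch of -X ours in an ordinary merge, using a scratch repository. The file names and contents are invented; the branch names ours and theirs match the diagram above. Only the conflicting hunk is resolved in favor of our side; the non-conflicting change from the other side is still merged.

```shell
#!/bin/sh
# Sketch only: a conflicting merge resolved automatically with -X ours.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > shared.txt
git add shared.txt
git commit -q -m H                       # the merge base

git checkout -q -b theirs
echo theirs > shared.txt                 # conflicts with our change below
echo extra > theirs-only.txt             # does not conflict with anything
git add shared.txt theirs-only.txt
git commit -q -m L

git checkout -q -b ours mainline
echo ours > shared.txt
git commit -q -a -m J

j=$(git rev-parse ours)
l=$(git rev-parse theirs)

git merge -q -X ours -m "merge theirs" theirs

echo "shared.txt now contains: $(cat shared.txt)"
first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
```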
A cherry-pick takes a normal merge and subverts it. Instead of finding a common commit, the merge base is simply the to-be-picked commit's parent commit.
When we're rebasing and copying that first commit, this makes sense:
...--o--o--*--o--H <-- mainline, HEAD (detached)
            \
             A--B--C <-- feature
We are now copying commit A. Here "ours" is H: the tip commit of mainline, to which HEAD points directly (detached). The pseudo merge base is commit *: the point at which A first diverged. So we'll diff * vs H to see what "we" changed, and * vs A to see what "they" changed. Then we'll combine these differences: that gets us H back, plus whatever we did in A. We'll commit the result and make A', the copy of A:
                   A' <-- HEAD
                  /
...--o--o--*--o--H <-- mainline
            \
             A--B--C <-- feature
(The final commit of a cherry-pick is an ordinary, non-merge commit.)
But now we'll copy B. Its parent is A, so we will use A for the merge base. We'll diff A vs A' to see what "we" changed, and A vs B to see what "they" changed. If we blindly take "ours" (i.e., A vs A') in a conflict, we could lose important changes from B. Maybe we don't need them: maybe *-vs-H already contained them. But maybe we do.
In any case, when this is all done, we end up with:
                   A'-B' <-- HEAD
                  /
...--o--o--*--o--H <-- mainline
            \
             A--B--C <-- feature
and we are ready to cherry-pick C as before. When that's done, Git will yank the name feature off C and make it point to C' instead (and re-attach HEAD):
                   A'-B'-C' <-- feature (HEAD)
                  /
...--o--o--*--o--H <-- mainline
            \
             A--B--C [abandoned]
and that's our rebase.
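The sequence above can be imitated by hand, which may help make the "rebase is a series of cherry-picks" view concrete. This is a sketch under stated assumptions: the repository contents, the make_commit helper, and the rebased-by-git comparison branch are all invented, and the manual steps only approximate what git rebase does (no reflog messages, no ORIG_HEAD, no patch-ID filtering).

```shell
#!/bin/sh
# Sketch only: detach HEAD at mainline, cherry-pick A, B, C, then move the
# branch name, and compare the result with what git rebase produces.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"
git config user.email "example@example.invalid"

# Hypothetical helper: append a line to a file and commit it.
make_commit() {
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

make_commit base.txt star                # the commit drawn as *
git checkout -q -b feature
for c in A B C; do make_commit feature.txt "$c"; done
git checkout -q mainline
make_commit base.txt o
make_commit base.txt H

# Let Git rebase a second name for the same commits, for comparison.
git branch rebased-by-git feature
git rebase -q mainline rebased-by-git

# The same thing by hand.
git checkout -q --detach mainline                # HEAD detached at H
git cherry-pick mainline..feature >/dev/null     # copies A, B, C in order
git branch -f feature HEAD                       # yank the name over to C'
git checkout -q feature                          # re-attach HEAD

by_hand=$(git rev-parse 'feature^{tree}')
by_git=$(git rev-parse 'rebased-by-git^{tree}')
copies=$(git rev-list mainline..feature | wc -l | tr -d ' ')
echo "copies made by hand: $copies"
```

The two branches end up with different commit hashes only if timestamps or committer data differ; comparing the trees sidesteps that and checks that the resulting snapshots match.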
Since all this is theoretical anyway, let's just draw conclusions now
The real points to remember here are:
- rebase copies (some) commits, as if by, or actually by, git cherry-pick;
- the real key to minimizing work is to pick the correct commits to copy;
- sometimes this means --onto, and sometimes it might even mean going back to an earlier copy of your own work.
(I don't actually use git rerere myself and am not sure whether cherry-pick uses it automatically. If not, you can use it manually.)