TL;DR
Don't even try to do this. You can rebase most commits, but you should not attempt to rebase merge commits. The git rebase command skips (omits) the merge commits, as you no doubt saw when you ran git rebase -i.
You are safest if you rebase unpublished commits, which, for most people in most workflows, means commits that you have not yet sent upstream with git push.
As a sort of general rule, rebasing makes it unnecessary to use merge commits, except perhaps for one final merge. This should all make more sense after reading the long answer below.
Long
You cannot rebase a merge commit.
This is a slight overstatement: there is a form of git rebase, namely git rebase --preserve-merges, that purports to preserve merge commits while doing a rebase. However, this claim that git rebase --preserve-merges makes is itself a lie! It actually re-performs merges, which makes it tricky to use correctly.
To understand all of this properly, start with these Git concepts:
Every Git commit (really, every Git object) is immutable. Nothing can change anything about any commit. Each commit gets its own unique hash ID. (Git adds a time stamp to each commit so that, as long as time increases, you will still get a new, unique ID.¹)
Most commits have exactly one parent commit. Each commit lists all its parents, however many there are, by their hash IDs.
A commit with more than one parent is a merge commit. (A commit with no parents is a root commit; typically there are very few of these, although the very first commit made must always be a root commit, so there has to be one!)
Git finds the last commit on a branch by reading the branch name. The name simply contains the actual hash ID of that final commit. Git then works backwards when necessary by using the last commit's parent(s), and those parents' parents, and so on.
Any commit can be copied, by extracting it, making some alteration as appropriate, and making a new commit. The new commit gets a new (different, unique-to-it) hash ID. We'll make use of this in a moment.
Writing a new commit to the current branch consists of the following process:
- Turn the current index into a snapshot. You can do this yourself at any time using git write-tree. This produces a tree object hash ID (which is not necessarily unique, since this snapshot might be the same as some other snapshot).
- Use the resulting tree hash ID, plus the metadata that goes into a commit (your name, your email address, the time stamp, etc.), to write a commit. The parent hash ID in this new commit is the hash ID of the current commit, as stored in the branch name. This is what the plumbing command git commit-tree does. Like git write-tree, this produces a hash ID (unique this time).
- Write the new commit's hash ID into the current branch name.
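The three steps above can be tried directly with plumbing commands. This is a minimal sketch, assuming a POSIX shell and a throwaway repository created with mktemp; the file name, messages, and identity are made up for illustration, and step 3 uses git update-ref (normally the porcelain git commit does all three steps for you):

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" whatever init.defaultBranch says
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"                  # an ordinary commit, to act as the parent

echo two >> file.txt
git add file.txt                          # the index now holds the next snapshot

tree=$(git write-tree)                                      # step 1: index -> tree object
parent=$(git rev-parse refs/heads/master)                   # the current commit, read from the branch name
commit=$(git commit-tree -p "$parent" -m "second" "$tree")  # step 2: tree + metadata -> new commit
git update-ref refs/heads/master "$commit" "$parent"        # step 3: store the new hash ID in the branch name

git log --oneline master
```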
¹ The granularity of the time stamp is in seconds, so it's technically possible to make the exact same commit (same snapshot, same parent, same author and message) twice, on two different branch names, within one second and get only one commit. If you do that (e.g., via a script), you get just the one commit, with the one hash ID. The effect is essentially the same as git merge --ff-only. Everything still works, but it's disconcerting!
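This footnote is easy to check, because a commit's hash ID depends only on the commit's contents. A minimal sketch, assuming a throwaway repository and an arbitrary fixed date: it pins both time stamps through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables, as if both commits were made within the same second, and runs git commit-tree twice on the same tree:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > file.txt
git add file.txt
tree=$(git write-tree)

# Pin both time stamps (an arbitrary date, in Git's "<seconds> <offset>" form).
GIT_AUTHOR_DATE="1500000000 +0000"
GIT_COMMITTER_DATE="1500000000 +0000"
export GIT_AUTHOR_DATE GIT_COMMITTER_DATE

first=$(git commit-tree -m "same message" "$tree")
second=$(git commit-tree -m "same message" "$tree")
echo "$first"
echo "$second"   # same tree, no parent, same author, committer, and message: same hash ID
```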
The result of all of this is that for a simple linear chain of commits, we have a branch name—which we can draw at the right edge of the line—that points to (contains the hash ID of) the tip (last) commit of the branch. That commit points backwards to its parent: its predecessor commit, which at one point was the tip of the branch. The parent points back to its parent, and so on:
... <-parent <-tip <-- branch
Because the commits are immutable once made, only branch names change. The branch names' arrows move around all the time. The arrows stored inside commits are fixed once made, and always point backwards, so we can just draw them as lines, which is handy in text: it lets us draw branches like this:
...--F--G---H--I <-- master
         \
          J--K--L <-- dev
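To see the stored hash ID and a backwards arrow for yourself, git rev-parse turns a branch name into the hash ID it holds, and git cat-file -p prints a raw commit object, including its parent line(s). A sketch, assuming a throwaway repository with two made-up commits named after letters in the drawing:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

echo F > F.txt; git add F.txt; git commit -q -m F
echo G > G.txt; git add G.txt; git commit -q -m G

git rev-parse master        # the hash ID stored in the name: the tip commit, G
git cat-file -p master      # raw commit: tree, parent, author, committer, message

# Pull out just the "parent" line: the backwards arrow from G to F.
g_parent=$(git cat-file -p master | sed -n 's/^parent //p')
echo "G's parent: $g_parent"
```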
Using this, we can now see how git merge works: we pick a branch, attach the word HEAD to it (using git checkout) so that Git knows which branch is the current one, and then run git merge on the other name. Git finds the merge base commit, the best common ancestor where the two branches' histories join up (in this case, commit G), and, in effect, runs two separate git diff commands:
git diff --find-renames <hash-of-G> <hash-of-I> # what we changed on master
git diff --find-renames <hash-of-G> <hash-of-L> # what they changed on dev
Git combines the two sets of changes, applying the combined changes to the snapshot stored in commit G, and if that all works, Git makes a new merge commit that uses this combined-changes snapshot. The merge commit has two parents instead of just one. The first parent is the commit that was HEAD, i.e., I, and the second is the other commit we just named, i.e., L:
...--F--G---H--I--M <-- master (HEAD)
         \       /
          J--K--L <-- dev
Note that the combining is smart: if we and they both made the same change(s) to the same line(s) of the same files, Git takes just one copy of those changes. If we and they made different changes to the same lines, the merge stops in the middle, leaving us to clean up the mess. (We'll quietly pretend this never happens, for now. :-) )
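The merge walk-through above can be reproduced in a throwaway repository. In this sketch the commits are made up and each one adds its own file, so nothing conflicts; mkcommit is a hypothetical helper defined in the script. It finds the merge base with git merge-base, shows the two diffs as summaries, merges, and prints M's two parents:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F; mkcommit G; g=$(git rev-parse HEAD)
git branch dev                        # dev starts at G
mkcommit H; mkcommit I; i=$(git rev-parse HEAD)
git checkout -q dev
mkcommit J; mkcommit K; mkcommit L; l=$(git rev-parse HEAD)

git checkout -q master                # attach HEAD to master
base=$(git merge-base master dev)     # the merge base: commit G
git diff --find-renames --stat "$base" master   # what we changed on master
git diff --find-renames --stat "$base" dev      # what they changed on dev

git merge -q --no-edit dev            # combine both sets of changes in merge commit M
git rev-parse master^1 master^2       # first parent (I), then second parent (L)
```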
Rebase copies commits, as if via repeated git cherry-pick
What git rebase is all about, fundamentally, is copying some set of commits. That is, we'll run git checkout dev && git rebase master and Git will copy some set of commits.
For instance, instead of making merge commit M, what if we somehow got Git to copy the effect of commit J, but applied to the snapshot associated with commit I? That is, we want to turn the snapshot in J into a set of changes, as compared to J's parent commit G:
git diff <hash-of-G> <hash-of-J> # what we did
If Git were then to combine those changes with the changes we made from G up through I, why then, we'd have just what we want.
Git can do this, and in fact, this copy-one-commit operation is available through the command git cherry-pick. Note that this can be described a lot more simply as "apply G-vs-J as a patch to I", and in many cases this description is adequate (so you can carry it around in your head as an approximation), but in fact, Git does it the same way it does the change-combining of git merge. This means that if commit I already has some of the same work as G-vs-J, the copy is smart, just like git merge: we get just one copy of the change, instead of two.
The final result, though, is an ordinary non-merge commit that is like J, but different in two ways:
- It starts with whatever was in I, not with whatever was in G (and omits any duplicated changes).
- It has, as its parent, commit I, not commit G.
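That single copy step can be sketched in a throwaway repository with the same made-up F through L graph (mkcommit is a hypothetical helper defined in the script). The sketch detaches HEAD at I, cherry-picks J, and prints the original and the copy with their parents:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F; mkcommit G
git branch dev                        # dev starts at G
mkcommit H; mkcommit I; i=$(git rev-parse HEAD)
git checkout -q dev
mkcommit J; j=$(git rev-parse HEAD)
mkcommit K; mkcommit L

git checkout -q --detach master       # "detached HEAD": HEAD holds I's hash ID directly
git cherry-pick "$j" >/dev/null       # copy J; the copy J' starts from I's snapshot
jp=$(git rev-parse HEAD)

echo "J  = $j, parent $(git rev-parse "$j^")"
echo "J' = $jp, parent $(git rev-parse "$jp^")"
```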
So let's call this new commit J', and draw it in. Git makes this new commit using Git's "detached HEAD" mode, where the special name HEAD points directly to a commit, but you can think of this as Git using a temporary, unnamed branch:
                J' <-- HEAD
               /
...--F--G--H--I <-- master
         \
          J--K--L <-- dev
Now that J has been copied to J', git rebase proceeds by copying commit K to K', using the same core git cherry-pick idea.² This time the merge base is commit J rather than commit G, but if all goes well we don't really have to care about these details; we just see the copy completing and producing:
                J'-K' <-- HEAD
               /
...--F--G--H--I <-- master
         \
          J--K--L <-- dev
Finally, rebase copies L to L', then executes its final trick: it peels the branch name dev away from original commit L, and makes it point to the last commit in the new chain, L'. It re-attaches HEAD at the same time, so that we have this:
                J'-K'-L' <-- dev (HEAD)
               /
...--F--G--H--I <-- master
         \
          J--K--L [abandoned]
The newly copied commits have new and different hash IDs, but serve the same purpose as the originals, and share their commit messages. Because Git does not display the abandoned original commits,³ it looks like the originals have mysteriously changed. In fact, though, they are still there and can be restored if desired; we just have the name dev now locating the copied tip commit L' instead of the original tip L.
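The whole rebase can be run in a throwaway repository. In this sketch (made-up commits; mkcommit is a hypothetical helper defined in the script) dev is rebased onto master and the graph is printed; afterwards dev has three new commits on top of I, with the original messages but new hash IDs:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F; mkcommit G
git branch dev                        # dev starts at G
mkcommit H; mkcommit I
git checkout -q dev
mkcommit J; mkcommit K; mkcommit L
old_tip=$(git rev-parse dev)          # the original L

git rebase -q master                  # copy J, K, L to J', K', L' on top of I, then move dev
git log --graph --oneline --all       # no ref names the originals, so they are not shown
```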
Since the copied commits come after the tip of master, it's now trivial to use a fast-forward operation to incorporate those new commits without any actual merging. A fast-forward really means "move the name forward", opposite the direction that the internal backwards commit arrows go. We can take this:
                J'-K'-L' <-- dev
               /
...--F--G--H--I <-- master
and just slide the name master up-and-right so that it points to commit L' too:
                J'-K'-L' <-- dev, master
               /
...--F--G--H--I
and it looks like we somehow managed to write all our commits in the best possible order. We only need an actual merge if we really want one; and to do that in plain Git, we have to run git merge --no-ff.
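A sketch of the fast-forward, assuming a throwaway repository with made-up commits (mkcommit is a hypothetical helper, and with-merge is a made-up scratch branch): after the rebase, git merge --ff-only just slides master forward, while git merge --no-ff on the scratch branch forces a real merge commit for contrast:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F; mkcommit G
git branch dev
mkcommit H; mkcommit I
git checkout -q dev
mkcommit J; mkcommit K; mkcommit L
git rebase -q master                   # dev is now J'-K'-L' on top of I

git branch with-merge master           # scratch branch at I, for the --no-ff contrast
git checkout -q with-merge
git merge -q --no-ff --no-edit dev     # makes a merge commit even though a fast-forward was possible

git checkout -q master
git merge -q --ff-only dev             # just moves the name master up to L'
git log --graph --oneline --all
```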
Note: depending on which option you choose, GitHub's clicky merge button runs git merge --no-ff, or runs git rebase first and then does a fast-forward, or runs git merge --squash (which we haven't covered here). This is all quite a bit different from command-line Git.
² For historical reasons, git rebase -i actually uses git cherry-pick, and some other git rebase modes such as git rebase -m do as well, but some git rebase modes use git format-patch piped to git am. This means that some rebases will fail to pick up on file renames, and can hit a few other corner cases. Probably rebase should default to cherry-pick style all the time, and only offer the patch-and-apply method with a backwards-compatibility switch. (Git 2.26 later did exactly this: the merge-based backend became the default, and the patch-based one moved behind --apply.) But most of the time they work the same anyway.
³ At this point, they are not truly abandoned. They can be found through two reflogs, one for the branch name dev and one for HEAD, and also via the special name ORIG_HEAD. After about 30 days, though, the reflog entries will expire, and something will have overwritten ORIG_HEAD with some other previous branch-tip ID; then those commits will be truly abandoned and will be removed by Git's garbage collector, git gc.
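A sketch of finding the originals again right after a rebase, assuming a throwaway repository with made-up commits (mkcommit is a hypothetical helper, and dev-before-rebase is a made-up branch name). Both the branch's reflog entry dev@{1} and ORIG_HEAD still name the original tip L, and git branch can give it a real name so it will not expire:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F; mkcommit G
git branch dev
mkcommit H; mkcommit I
git checkout -q dev
mkcommit J; mkcommit K; mkcommit L
old_tip=$(git rev-parse dev)

git rebase -q master
git reflog show dev                          # newest first: the rebase, then the original L
git rev-parse ORIG_HEAD 'dev@{1}'            # both print the original L's hash ID
git branch dev-before-rebase ORIG_HEAD       # a real name keeps the originals from expiring
```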
Cherry-picking a merge commit is hard
To do a git cherry-pick operation, Git has to look at the parent of the commit to be copied. An ordinary commit has only one parent, so this is easy: the parent is the parent. A merge commit, however, has two (or more, but we're only concerned with two here). Which parent should git cherry-pick use?
When you cherry-pick a merge commit yourself, Git forces you to pick one parent, using the -m option. For rebase, however, Git just omits the merge commits from the list of commits to copy.
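A sketch of that choice, assuming a throwaway repository with made-up commits A, B, C and made-up branch names side and copy (mkcommit is a hypothetical helper). Cherry-picking the merge without -m is refused, while -m 1 tells Git to diff the merge against its first parent and copy that change as an ordinary, single-parent commit:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit A; a=$(git rev-parse HEAD)
git checkout -q -b side
mkcommit B                             # the change that side brings in
git checkout -q master
mkcommit C
git merge -q --no-edit side            # merge commit M, with parents C and B
m=$(git rev-parse HEAD)

git checkout -q -b copy "$a"           # start a new branch back at A
if git cherry-pick "$m" 2>/dev/null; then
  echo "unexpected: cherry-pick of a merge without -m succeeded"; exit 1
fi
git cherry-pick -m 1 "$m" >/dev/null   # copy "M compared with its first parent C"
git log --oneline --graph copy
```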
What this means is that if you already merged master into dev, as in this drawing (note that the merge M is on dev and not master, and HEAD is attached to dev):
...--F--G---H----I <-- master
         \        \
          J--K--L--M <-- dev (HEAD)
you can still just run git rebase <options> master. This has Git find the commits that are reachable⁴ from dev (the current branch, to which HEAD is attached) that are not reachable from master, while throwing away merges. That list consists of the same commits as before: J, K, and L!
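A sketch of that situation, assuming a throwaway repository with made-up commits (mkcommit is a hypothetical helper). Before the rebase, dev has four commits that master lacks (J, K, L, and M); --no-merges shows the three that rebase will actually copy, and afterwards dev has no merge commit left:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F; mkcommit G
git branch dev
mkcommit H; mkcommit I
git checkout -q dev
mkcommit J; mkcommit K; mkcommit L
git merge -q --no-edit master          # merge commit M, on dev

before=$(git rev-list --count master..dev)
git log --oneline master..dev              # M, L, K, J
git log --oneline --no-merges master..dev  # L, K, J: what rebase copies
git rebase -q master
git log --graph --oneline --all
```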
If the rebase works, you get the same picture as before, with dev pointing to L', which points back to K', then J', and then I. Commit M is no longer useful since the three copied commits start from the snapshot in I.
Since the point of rebasing a simple, linear chain of commits is (normally) to put the whole chain after some other commit, it makes sense to throw out merges. Git can't copy them with a simple git cherry-pick, and it won't need to anyway. But there are some cases where you might like to keep some merge commits.
⁴ For a good definition of reachable, with a digestible dose of graph theory, see Think Like (a) Git.
git rebase --preserve-merges
For the special case of taking several non-linear chains (with embedded merges) and copying them, Git does have the --preserve-merges or -p option. However, this does not actually preserve merges. What it does, through a hack that really isn't quite right, is to generate an internal script that remembers where various merges were, then use the git rebase -i machinery to copy commits, stopping whenever it would have had to copy a merge.
At these points, instead of attempting to copy the merge commit, Git just runs a new git merge. Unfortunately, this new merge does not know what options were used for the original git merge. If you did in fact use options (e.g., -s ours, -X ours, or -X find-renames=20), Git fails to use those same options and the merge may well go awry (and of course it might go awry just as any merge can). Using git rerere can get you past some sticking points here, but in general this is quite tricky. You must carefully check the results of any re-performed merges.
This will be much improved in Git 2.18, although I have not looked at the details yet (and I suspect there is still no provision for remembering merge options: that requires auxiliary data Git could save, but does not currently save anywhere).
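As it turned out, Git 2.18 added a separate --rebase-merges (-r) option that re-creates merges from a generated to-do list, and later Git releases deprecated and eventually removed --preserve-merges. A sketch, assuming Git 2.18 or later and a throwaway repository with made-up commits (mkcommit is a hypothetical helper, and topic is a made-up branch name): dev contains a --no-ff merge of topic, and after git rebase --rebase-merges master the tip of dev is still a two-parent merge. Like --preserve-merges, this re-performs each merge rather than copying it, so the advice above about checking the results still applies:

```shell
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: make a commit that adds one new file named after the commit.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit F
git branch dev
mkcommit I                             # master moves on: F--I
git checkout -q dev
mkcommit J
git checkout -q -b topic
mkcommit T
git checkout -q dev
git merge -q --no-ff --no-edit topic   # dev: F--J--M, with M's second parent T

git rebase -q --rebase-merges master   # copy J and T onto I, then redo the merge
git log --graph --oneline --all
```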
Summary
This is not everything that you can do with rebase (we have not touched on --onto, nor on the various commands available in an interactive rebase), but it covers the key elements:
- Rebase involves copying some set of commits, as if by git cherry-pick.
- The copies omit merge commits.
- To see which commits will be copied, to where, draw the graph! (Or use git log --graph, perhaps with --oneline and additional options, or use a viewer like gitk or one of the GUIs that draws the graph.)
- After copying, you only need a merge if you really want one. The copied commits generally come after the tip of the target branch, making everything ready for a fast-forward instead.
It's also important to remember that GitHub works differently from Git. The clicky web button can do three different things, and only one of them is git merge (and even then it's git merge --no-ff!).