Git: Rebase Last 5 Commits When 2 Are Merges

Question

I've made a feature branch where the last entries in my git log are 3 commits and 2 merges.

The merges look like:

Merge branch 'my_feature_branch' of my_repo into 'my_feature_branch'

and

Merge branch 'master' of my_repo into 'my_feature_branch'

Normally, if I had 5 commits and wanted to turn them into one, I would just do git rebase -i HEAD~5. However, when I try this, it tells me it couldn't apply the very first commit (not merge) listed in git log.

Any idea what's happening here?

I'd just like to squash these 3 commits and 2 merges into a single commit to make a tidy PR.

Thanks!

I guess I don't fully understand the part about *"Merge branch 'my_feature_branch' of my_repo into 'my_feature_branch'"*... why do you merge from `my_feature_branch` into `my_feature_branch`? Anyway, what about `git merge --squash`? I think rebase is not really your scenario. https://stackoverflow.com/questions/5308816/how-to-use-git-merge-squash — grek40, Aug 13 '18 at 05:11
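As a rough sketch of the git merge --squash route suggested in this comment, the script below builds a throwaway repository whose history loosely imitates the question (feature commits plus a merge of master), then squashes the feature work into one commit on a new branch. The identity, file names, the helper commit, and the branch name squashed are all made up for illustration; only my_feature_branch comes from the question.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Made-up history: feature commits, a merge of master, one more feature commit.
commit base
git checkout -q -b my_feature_branch
commit feat1
commit feat2
git checkout -q master
commit upstream
git checkout -q my_feature_branch
git merge -q -m "Merge branch 'master' into my_feature_branch" master
commit feat3

# Squash the branch's work into a single new commit on top of master.
git checkout -q -b squashed master
git merge --squash my_feature_branch >/dev/null
git commit -q -m "My feature, as one commit"
git log --graph --oneline master..squashed
```

The result is one ordinary commit whose parent is master and whose snapshot matches the tip of my_feature_branch; no rebase is involved.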

torek · Answer 1 · 2018-08-13T07:49:38.677

TL;DR

Don't even try to do this. You can rebase most commits, but you should not attempt to rebase merge commits. The git rebase command skips (omits) the merge commits, as you no doubt saw when you ran git rebase -i.

You are safest if you rebase only unpublished commits, which, for most people in most workflows, means commits that you have not yet sent upstream with git push.

As a sort of general rule, rebasing makes it unnecessary to use merge commits, except perhaps for one final merge. This should all make more sense after reading the long answer below.

Long

You cannot rebase a merge commit.

This is a slight overstatement: there is a form of git rebase, namely git rebase --preserve-merges, that purports to preserve merge commits while doing a rebase. However, that claim is itself a lie! It actually re-performs the merges, and it's tricky to use correctly.

To understand all of this properly, start with these Git concepts:

- Every Git commit (really, every Git object) is immutable: nothing can change anything about any commit. Each commit gets its own unique hash ID. (Git adds a time stamp to each commit, so even if you make a commit with exactly the same snapshot and parent as an earlier one, you still get a new, unique ID as long as the clock has moved on.¹)
- Most commits have exactly one parent commit. Each commit lists all of its parents, however many there are, by their hash IDs.
- A commit with more than one parent is a merge commit. (A commit with no parents is a root commit. Typically there are very few of these, but the very first commit in a repository is necessarily a root commit, so there is always at least one!)
- Git finds the last commit on a branch by reading the branch name. The name simply contains the actual hash ID of that final commit. Git then works backwards when necessary by using the last commit's parent(s), and those parents' parents, and so on.
- Any commit can be copied by extracting it, making whatever alteration is appropriate, and making a new commit. The new commit gets a new (different, unique-to-it) hash ID. We'll make use of this in a moment.
- Writing a new commit to the current branch consists of the following process:
  1. Turn the current index into a snapshot. You can do this yourself at any time using git write-tree. This produces a tree object hash ID (which is not necessarily unique, since this snapshot might be the same as some other snapshot).
  2. Use the resulting tree hash ID, plus the metadata that goes into a commit (your name, your email address, the time stamp, and so on) to write a commit. The new commit's parent is the current commit, whose hash ID Git reads from the branch name. This is what the plumbing command git commit-tree does. Like git write-tree, this produces a hash ID (unique this time).
  3. Write the new commit's hash ID into the current branch name.
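The three steps above can be tried by hand. This is a minimal sketch in a throwaway repository (the identity, file names, and the helper commit are placeholders). It uses git update-ref for step 3, since that is the plumbing command for writing a hash ID into a ref, and it also shows that running git write-tree twice on the same index yields the same tree ID.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }
commit first

# Step 1: stage a change, then write the index out as a tree object.
echo hello > hello.txt
git add hello.txt
tree=$(git write-tree)
# Same index, same tree ID: tree hash IDs are not unique to one commit.
[ "$(git write-tree)" = "$tree" ] && echo "same tree again: $tree"

# Step 2: build a commit from that tree, with the current tip as its parent.
parent=$(git rev-parse HEAD)
new=$(git commit-tree -p "$parent" -m "made by hand" "$tree")

# Step 3: write the new hash ID into the branch name. The third argument
# makes the update fail if master no longer holds $parent.
git update-ref refs/heads/master "$new" "$parent"
git log --oneline master
```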

¹The granularity of the time stamp is in seconds, so it's technically possible to make the exact same commit (same snapshot, parent, author, committer, and message) twice, on two different branch names, within one second. If you do that (e.g., via a script), you get just the one commit, with the one hash ID, and both branch names point to it. The effect is essentially the same as git merge --ff-only. Everything still works, but it's disconcerting!

The result of all of this is that for a simple linear chain of commits, we have a branch name—which we can draw at the right edge of the line—that points to (contains the hash ID of) the tip (last) commit of the branch. That commit points backwards to its parent: its predecessor commit, which at one point was the tip of the branch. The parent points back to its parent, and so on:

... <-parent <-tip   <-- branch

Because commits are immutable once made, only branch names change: these pointers move around all the time. The arrows stored inside commits are fixed once made, and always point backwards, so we can just draw them as lines. That is handy in text, since it lets us draw branches like this:

...--F--G---H--I   <-- master
         \
          J--K--L   <-- dev

Using this, we can now see how git merge works: we pick a branch, attach the word HEAD to it (using git checkout) so that Git knows which branch is the current one, and then run git merge on the other name. Git finds the merge base commit—the point where the two branches rejoin, which in this case is commit G—and, in effect, runs two separate git diff commands:

git diff --find-renames <hash-of-G> <hash-of-I>   # what we changed on master
git diff --find-renames <hash-of-G> <hash-of-L>   # what they changed on dev

Git combines the two sets of changes, applying the combined changes to the snapshot stored in commit G, and if that all works, Git makes a new merge commit that uses this combined-changes snapshot. The merge commit has two parents instead of just one. The first parent is the commit that was HEAD, i.e., I, and the second is the other commit we just named, i.e., L:

...--F--G---H--I--M   <-- master (HEAD)
         \       /
          J--K--L   <-- dev
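To watch this happen, the script below rebuilds the drawing in a throwaway repository, one file per commit so nothing conflicts (the letters are the drawing's; the identity and the helper commit are made up). It prints the two diffs against the merge base and then makes the merge; afterwards HEAD^1 and HEAD^2 name M's first and second parents.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# The drawing: F-G-H-I on master, J-K-L on dev, forking at G.
commit F; commit G; G=$(git rev-parse HEAD)
git branch dev
commit H; commit I; I=$(git rev-parse HEAD)
git checkout -q dev
commit J; commit K; commit L; L=$(git rev-parse HEAD)
git checkout -q master

base=$(git merge-base master dev)               # commit G
git diff --find-renames --stat "$base" master   # what we changed on master
git diff --find-renames --stat "$base" dev      # what they changed on dev

# Combine both sets of changes into merge commit M on master.
git merge -q -m M dev
git log --graph --oneline
```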

Note that the combining is smart: if we and they both made the same change(s) to the same line(s) of the same files, Git takes just one copy of those changes. If we and they made different changes to the same lines, the merge stops in the middle, leaving us to clean up the mess. (We'll quietly pretend this never happens, for now. :-) )

Rebase copies commits, as if via repeated `git cherry-pick`

What git rebase is all about, fundamentally, is copying some set of commits. Here, we'll run git checkout dev && git rebase master, and Git will copy the commits that are on dev but not on master.

For instance, instead of making merge commit M, what if we somehow got Git to copy the effect of commit J, but applied to the snapshot associated with commit I? That is, we want to turn the snapshot in J into a set of changes, as compared to J's parent commit G:

git diff <hash-of-G> <hash-of-J>   # what we did

If Git were then to combine those changes with the changes we made from G up through I, why then, we'd have just what we want.

Git can do this, and in fact, this copy-one-commit operation is available through the command git cherry-pick. Note that this can be described a lot more simply as "apply G-vs-J as a patch to I", and in many cases this description is adequate (so you can carry it around in your head as an approximation). In fact, though, Git does it the same way it does the change-combining of git merge. This means that if commit I already has some of the same work as G-vs-J, the copy is smart, just like git merge: we get just one copy of the change, instead of two.
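Here is a sketch of that single copy in a throwaway repository laid out like the drawing (the identity and the helper commit are placeholders). It detaches HEAD at I, cherry-picks J, and lets you compare the change the copy introduces with G-vs-J.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F; commit G; G=$(git rev-parse HEAD)
git branch dev
commit H; commit I; I=$(git rev-parse HEAD)
git checkout -q dev
commit J; J=$(git rev-parse HEAD)
commit K; commit L

# Detach HEAD at I (the tip of master) and copy J there, making J'.
git checkout -q --detach master
git cherry-pick "$J" >/dev/null
git log --graph --oneline --all
```

J' has a new hash ID and I as its parent, but git diff HEAD^ HEAD shows the same change as git diff G J.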

The final result, though, is an ordinary non-merge commit that is like J, but different in two ways:

- It starts with whatever was in I, not with whatever was in G (and omits any duplicated changes).
- It has, as its parent, commit I, not commit G.

So let's call this new commit J', and draw it in. Git makes this new commit using Git's "detached HEAD" mode, where the special name HEAD points directly to a commit, but you can think of this as Git using a temporarily-unnamed branch:

                J'  <-- HEAD
               /
...--F--G--H--I   <-- master
         \
          J--K--L   <-- dev

Now that J has been copied to J', git rebase proceeds by copying commit K to K', using the same core git cherry-pick idea.² This time the merge base is commit J (K's parent) rather than commit G, but if all goes well we don't really have to care about these details: we just see the copy complete, producing:

                J'-K'  <-- HEAD
               /
...--F--G--H--I   <-- master
         \
          J--K--L   <-- dev

Finally, rebase copies L to L', then executes its final trick: it peels the branch name dev away from original commit L, and makes it point to the last commit in the new chain, L'. It re-attaches HEAD at the same time, so that we have this:

                J'-K'-L'  <-- dev (HEAD)
               /
...--F--G--H--I   <-- master
         \
          J--K--L   [abandoned]

The newly copied commits have new and different hash IDs, but serve the same purpose as the originals, and share their commit messages. Because Git does not display the abandoned original commits,³ it looks like the originals have mysteriously changed. In fact, though, they are still there and can be restored if desired; we just have the name dev now locating the copied tip commit L' instead of the original tip L.
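A sketch of the whole rebase in a throwaway repository shaped like the drawing (the identity and the helper commit are placeholders). After git rebase master, dev~3 is I, and the original L can still be found through ORIG_HEAD and dev's reflog.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F; commit G
git branch dev
commit H; commit I; I=$(git rev-parse HEAD)
git checkout -q dev
commit J; commit K; commit L; L=$(git rev-parse HEAD)

# Copy J, K, L to J', K', L' on top of I and move dev to L'.
git rebase -q master
git log --graph --oneline --all

# The originals are only "abandoned": ORIG_HEAD and dev's reflog still find L.
git log --oneline -1 ORIG_HEAD
git reflog show dev
```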

Since the copied commits come after master, it's now trivial to use a fast-forward operation to incorporate those new commits without any actual merging. A fast-forward really means move the name forward, opposite the direction that the internal backwards commit arrows go. We can take this:

                J'-K'-L'  <-- dev
               /
...--F--G--H--I   <-- master

and just slide the name master up-and-right so that it points to commit L' too:

                J'-K'-L'  <-- dev, master
               /
...--F--G--H--I

and it looks like we somehow managed to write all our commits in the best possible order. We only need an actual merge if we really want one; and to do that in plain Git, we have to run git merge --no-ff.
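Continuing the same example as a sketch (throwaway repository; the extra branch name noff is hypothetical and exists only so both outcomes can be compared): git merge --ff-only just slides master forward, while git merge --no-ff makes a real two-parent merge.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F; commit G
git branch dev
commit H; commit I; I=$(git rev-parse HEAD)
git checkout -q dev
commit J; commit K; commit L
git rebase -q master            # J'-K'-L' now sit on top of I

# Hypothetical extra name at I, so both outcomes can be compared.
git branch noff master

# Fast-forward: slide master up to L'; no merge commit is made.
git checkout -q master
git merge -q --ff-only dev

# Forcing a real merge commit instead.
git checkout -q noff
git merge -q --no-ff -m "explicit merge" dev
git log --graph --oneline --all
```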

Note: depending on which option is selected, GitHub's clicky merge button either does the equivalent of git merge --no-ff, or runs git rebase first and then does a fast-forward, or does the equivalent of git merge --squash, which we haven't covered here. This is all quite a bit different from command-line Git.

²For historical reasons, git rebase -i actually uses git cherry-pick, and some other git rebase modes such as git rebase -m do as well, but some git rebase modes use git format-patch piped to git am, which applies each patch much as git apply would. This means that some rebases will fail to pick up on file renames, and can hit a few other corner cases. Probably rebase should default to cherry-pick style all the time, and only offer the patch-and-apply method with a backwards-compatibility switch. But most of the time they work the same anyway.

³At this point, they are not truly abandoned. They can be found through two reflogs: one for the branch name dev and one for HEAD; and also via the special name ORIG_HEAD. Within about 30 days, though, the reflog entries will expire, and something will have overwritten ORIG_HEAD with some other previous branch-tip ID, and those commits will be truly abandoned and will be taken out by Git's garbage collector, git gc.

Cherry-picking a merge commit is hard

To do a git cherry-pick operation, Git has to look at the parent of the commit to be copied. An ordinary commit has only one parent, so this is easy: the parent is the parent. A merge commit, however, has two (or more, but we're only concerned with two here). Which parent should git cherry-pick use?

When you cherry-pick a merge commit yourself, Git forces you to pick one parent, using the -m option. For rebase, however, Git just omits the merge commits from the list of commits to copy.
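A sketch of the cherry-pick side of this, in a throwaway repository (the identity, the helper commit, and the branch name elsewhere are placeholders): a plain git cherry-pick of merge commit M is refused, while git cherry-pick -m 1 copies M as its change relative to its first parent, producing an ordinary single-parent commit.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F; commit G; G=$(git rev-parse HEAD)
git branch dev
commit H; commit I
git checkout -q dev
commit J; commit K; commit L
git checkout -q master
git merge -q -m M dev; M=$(git rev-parse HEAD)

# Try to copy M onto G, on the hypothetical branch "elsewhere".
git checkout -q -b elsewhere "$G"
if git cherry-pick "$M" 2>/dev/null; then refused=no; else refused=yes; fi
echo "plain cherry-pick of a merge refused: $refused"

# -m 1: copy M as its change against its first parent I (that is, J+K+L).
git cherry-pick -m 1 "$M" >/dev/null
git log --graph --oneline
```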

What this means is that if you have already merged master into dev, as in this drawing (note that the merge M is on dev and not master, and HEAD is attached to dev):

...--F--G---H----I   <-- master
         \        \
          J--K--L--M   <-- dev (HEAD)

you can still just run git rebase <options> master. This has Git find the commits that are reachable⁴ from dev—it's the current branch, to which HEAD is attached—that are not reachable from master, while throwing away merges. That list consists of the same commits as before: J, K, and L!

If the rebase works, you get the same picture as before, with dev pointing to L', which points back to K' and J' and then I. Commit M is no longer useful since the three copied commits start from the snapshot in I.
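This can be checked in a throwaway repository shaped like the drawing above (the identity and the helper commit are placeholders): after git rebase master, dev holds three new commits on top of I, no merges, and the same final snapshot that M had.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F; commit G
git branch dev
commit H; commit I; I=$(git rev-parse HEAD)
git checkout -q dev
commit J; commit K; commit L
# Merge master into dev, making M on dev as in the drawing.
git merge -q -m M master; M=$(git rev-parse HEAD)

# Rebase copies J, K, L onto I and simply leaves M out.
git rebase -q master
git log --graph --oneline
```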

Since the point of rebasing a simple, linear chain of commits is (normally) to put the whole chain after some other commit, it makes sense to throw out merges. Git can't copy them with a simple git cherry-pick, and it won't need to anyway. But there are some cases where you might like to keep some merge commits.

⁴For a good definition of reachable, with a digestible dose of graph theory, see Think Like (a) Git.

`git rebase --preserve-merges`

For the special case of taking several non-linear chains (with embedded merges) and copying them, Git does have the --preserve-merges or -p option. However, this does not actually preserve merges. What it does—through a hack that really isn't quite right—is to generate an internal script that remembers where various merges were, then use the git rebase -i machinery to copy commits, stopping whenever it would have had to copy a merge.

At these points, instead of attempting to copy the merge commit, Git just runs a new git merge. Unfortunately, this new merge does not know what options were used for the original git merge. If you did in fact use options (e.g., -s ours, -X ours, or -X find-renames=20), Git fails to use those same options and the merge may well go awry (and of course it can also go awry just as any merge can). Using git rerere can get you past some sticking points here, but in general this is quite tricky. You must carefully check the results of any re-performed merges.

This will be much improved in Git 2.18, although I have not looked at the details yet (and I suspect there is still no provision for remembering merge options: that requires auxiliary data Git could save, but does not currently save anywhere).
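For comparison, Git 2.18 added git rebase --rebase-merges (see VonC's comment below), and Git 2.34 removed --preserve-merges altogether. Like --preserve-merges, it re-performs merges rather than copying them. The sketch below assumes Git 2.18 or later and a throwaway repository in which dev contains an internal merge of a made-up topic branch; the branch names topic and flat, the identity, and the helper commit are hypothetical. A plain rebase flattens the merge away, while --rebase-merges re-creates it.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit F; commit G
git branch dev
commit H; commit I; I=$(git rev-parse HEAD)
# dev contains a merge we want to keep: side branch "topic" merged into dev.
git checkout -q dev
commit J
git checkout -q -b topic
commit T
git checkout -q dev
commit K
git merge -q -m "N: merge topic into dev" topic

# A plain rebase of a copy ("flat") drops merge N.
git branch flat dev
git rebase -q master flat

# --rebase-merges re-performs N on top of the copied commits.
git checkout -q dev
git rebase -q --rebase-merges master
git log --graph --oneline master..dev
```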

Summary

This is not everything that you can do with rebase (we have not touched on --onto, nor on the various commands you can use in an interactive rebase), but it covers the key elements:

- Rebase involves copying some set of commits, as if by git cherry-pick.
- The copying omits merge commits.
- To see which commits will be copied, and to where, draw the graph! (Or use git log --graph, perhaps with --oneline and additional options, or use a viewer like gitk or one of the GUIs that draws the graph.)
- After copying, you only need a merge if you really want one. The copied commits generally come after the tip of the target branch, making everything ready for a fast-forward instead.

It's also important to remember that GitHub works differently from Git. The clicky web button can do three different things, and only one of them is git merge (and even then it's git merge --no-ff!).

"You cannot rebase a merge commit." Well... actually, even better than the `-p` (`--preserve-merges`) option, you now (Git 2.18) have the `--rebase-merges` option! (https://stackoverflow.com/a/50555740/6309) — VonC, Aug 13 '18 at 07:45
@VonC: Ah, yes, I should update! (I don't have 2.18 installed anywhere yet) — torek, Aug 13 '18 at 07:47

score 0 · Answer 2 · answered Aug 20 '18 at 14:01

To achieve what you want to do, reset your feature branch to master, and commit all your changes again as a single commit:

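git checkout my_feature_branch
git branch backup # safety net: remembers the current tip
git reset master # move the branch to master; your changes stay in the work tree
git add ... # all your changes
git commit -m '...' # write a nice message
git diff backup # should not show any difference
git branch -D backup # drop the safety net once you're happy
git push origin my_feature_branch --force # replace the published branch with the rewritten one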
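To try this recipe without risk, here is a sketch in a throwaway repository whose feature branch has three commits and a merge from master (the identity, the helper commit, and the history are made up). It substitutes git add -A for the elided git add ... and skips the push, since the sandbox has no remote.

```sh
set -e
# Throwaway sandbox; HOME points into it so personal Git config can't interfere.
tmp=$(mktemp -d)
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo" && cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
# Hypothetical helper: one commit per call, each adding its own file (no conflicts).
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# Made-up history: feature commits, a merge of master, one more feature commit.
commit base
git checkout -q -b my_feature_branch
commit feat1
commit feat2
git checkout -q master
commit upstream
git checkout -q my_feature_branch
git merge -q -m "Merge branch 'master' into my_feature_branch" master
commit feat3

# The answer's steps (git add -A stands in for "git add ...").
git checkout -q my_feature_branch
git branch backup
git reset -q master
git add -A
git commit -q -m "My feature, as one commit"
git diff --quiet backup && echo "no difference from backup"
backup_tree=$(git rev-parse 'backup^{tree}')
git branch -D backup >/dev/null
```

The empty git diff is what makes this safe: it confirms that the single new commit holds exactly the snapshot the old, merge-laden branch tip had.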