git rebase is not behaving as expected

Question

So I wanted to update my codebase from an upstream/master to "8".

While being in 8, I did:

Then I resolved the conflicts in WebStorm's rebase/merge dialog.

(fyi, here is video proof)

Now it looks like this:

Being in the middle of a rebase, I'm inside the upstream branch, which is normal I guess.

Now, I went ahead with:

git rebase --continue

This results in:

Now I'm really confused!

rebase 8 onto 41b65bf?

(I was in 8 when I typed git rebase upstream/master and then in 41b65bf directly after that.

So I assume 41b65bf is an intermediate branch for the rebase operation.)

(41b65bf8 is the revision number for the last commit of the upstream branch.)

The only commit shown in the Vim interface (which I don't appreciate being forced on me) is "init", which was the last commit of origin/8.06.

I really really hope someone can explain what branch origin/8.06 even has to do with this operation. And how I could proceed.

Edit to rephrase my problem more concisely:

I tried to rebase the 'upstream/master' branch into the '8' branch via 'git rebase upstream/master', while being inside of (8). If you look at the very first image, you see the state directly before that. Why wouldn't just the changes from upstream be committed on top of the head? Why does it say '8 onto 41b65bf'? That sounds like the opposite of what I want.

I'm reading your question, and I have a hard time understanding what you are after. Can you please state in a clearer sentence what you are looking for ? — LeGEC, Oct 06 '21 at 12:07
When a `git rebase` hits a conflict, it stops to let you fix the conflicts; then, when you run `git rebase --continue`, it opens an editor to let you edit the commit message it will record for this commit. This is why `vim` opens, with a file which states: a) the message of the original commit (in your case: the `"init"` commit), b) commented lines, which hopefully give you a bit of insight into the context of that commit to come. — LeGEC, Oct 06 '21 at 12:10
You've answered your own question: "So I assume 41b65bf is an intermediate branch for the rebase operation.) (41b65bf8 is the revision number for the last commit of the upstream branch.)" Which is just what you asked to rebase onto. — matt, Oct 06 '21 at 12:11
In those comments, it states that you are in the middle of a rebase which started as `rebase '8' onto '41b65bf'`, that the action that was applied when you hit the conflict was `pick 137d8a0 init` (i.e., apply the `"init"` commit), that there are still 15 commands to come, and that the next two are `pick ... pick ...`, etc. — LeGEC, Oct 06 '21 at 12:12
That's for the context. As for what you *want*, well, it depends on what's inside those commits. From what I see, your local `8` branch has a set of commits: `init; commit after pull ...; commit after pull ...; resolve merge`, which seem to exist only because this branch started with its own initial commit; do these commits have relevant content? — LeGEC, Oct 06 '21 at 12:19
I was asking to rebase the 'upstream/master' branch into the '8' branch. via 'git rebase upstream/master' If you look at the very first image, you see the state directly before that. Why wouldn't just the changes from upstream be committed on top of the head? Why does it say 8 on 41b65f? That sounds like the opposite of what I want. To answer the question @LeGEC: it would be convenient, if the grey commits from inside 8 would be summed up into one. — J0hannes, Oct 06 '21 at 14:25
It looks like you are using `git rebase` in the opposite direction. Double check a tutorial to use `git rebase`, for example [atlassian's](https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase) — LeGEC, Oct 06 '21 at 15:22
(post edited) So I should have been inside of (upstream/master) before entering the rebase command? I don't think so, but I'm not so sure anymore. — J0hannes, Oct 06 '21 at 15:59
to squash commits : https://stackoverflow.com/questions/5189560/squash-my-last-x-commits-together-using-git — LeGEC, Oct 06 '21 at 16:03
So if I get this right: Whatever comes after 'git rebase' will be the destination, that (current) is rebased onto. I originally thought it would be the other way around. I followed the instructions from here: https://infrascloudy.github.io/2017/03/working-with-git.html — J0hannes, Oct 06 '21 at 20:12
The lowest headline shows the following instructions for "Rebasing branch off of upstream/...": `git checkout your-feature-branch`, `git fetch upstream`, `git rebase -i upstream/devel`, `git push -f` (# Because you rebased and rewrote history you have to force push), `git checkout devel`. This process makes it seem as though the rebase command gets the upstream on top of the current (in my case called 8). Because why would it be the other way around? I still don't get it. — J0hannes, Oct 06 '21 at 20:21

score 2 · Answer 1 · answered Oct 06 '21 at 22:58

I can't address anything about your IDE itself,¹ but your comment here is correct:

So if I get this right: Whatever comes after 'git rebase' will be the destination, that (current) is rebased onto.

It's also important to remember here that branches—or more precisely, branch names—matter very little here. The rebase operation will, at the end, move the current branch name, but everything rebase does in between is all about commits, not branches.

One needs to remember at all times that Git is really all about commits. The commits are each numbered, with a unique but random-looking, incomprehensible hexadecimal string, that humans tend to appreciate being abbreviated, e.g., 41b65f.² Each commit:

contains a full snapshot of all files (in a compressed, Git-only, de-duplicated format);
contains some metadata, such as the name and email address of whoever made the commit.

The metadata in each commit include the hash ID of a previous commit, or sometimes multiple previous commits. These are the parents of the commit. This forms a backwards-looking chain, linking commits together, but backwards. So Git actually works backwards.
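To see the parent link for yourself, here is a minimal sketch in a throwaway repository. The directory from `mktemp`, the file name `file.txt`, and the author identity are all placeholders I made up for illustration:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)

echo two > file.txt
git add file.txt
git commit -q -m "second"

# The raw commit object: a tree (the snapshot), a parent line,
# the author and committer, and then the message.
git cat-file -p HEAD

# The parent recorded inside the second commit is the first commit's hash ID.
parent=$(git rev-parse HEAD^)
echo "first:  $first"
echo "parent: $parent"
```

The two hash IDs printed at the end should match: that stored parent is the backwards-pointing link.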

A branch name simply selects some particular commit, which we've designated as important in some way. From here, Git will work backwards. That makes the selected commit the last commit on the branch. The commits themselves are independent of the branch name: Git needs only the raw hash ID of each commit, to look up the commit. The name stores—and thus also provides—the raw hash ID of this "last commit", so that mere humans don't have to memorize big ugly hash IDs.

Moreover, as we do work, we get "on" some branch, using git checkout or git switch. That attaches Git's special name HEAD to the branch name, so that Git knows which name we're using, and also extracts, from the commit, the read-only (and compressed and de-duplicated, readable only by Git itself) files that are stored in that commit, so that we can see them and work with them or update them as appropriate.

I like to draw this situation horizontally, with the latest commit—selected by the branch name—on the right:

... <-F <-G <-H   <-- branch1, branch2 (HEAD)

Here we're using commit H, which is the latest on both branches. We're using H via the name branch2: both names select this commit right now, but HEAD is attached to the name branch2. Commit H contains inside itself—in its metadata—the raw hash ID of its parent commit G, which contains the raw hash ID of still-earlier commit F, and so on.
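A small sketch of this picture, assuming a scratch repository with a single commit standing in for H (the file name and identity are placeholders):

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"

echo H > H.txt
git add H.txt
git commit -q -m "H"

git branch branch2           # a second name selecting the same commit
git checkout -q branch2      # attach HEAD to branch2

# Which name is HEAD attached to?
attached=$(git symbolic-ref --short HEAD)
echo "HEAD is attached to: $attached"

# Both names hold the same raw hash ID right now.
git rev-parse branch1 branch2
```

Only the attachment of HEAD distinguishes the two names here; the commit they select is the same.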

When we make a new commit, it gets a new, unique, big ugly hash ID. Let's call this I instead of trying to guess what it would be. Git will:

write out a new snapshot of files (de-duplicating as much as possible, so only truly-new-versions of files need new snapshots);
write out metadata saying that we're the author of this new commit, and so on, and include in this metadata the real hash ID of commit H; and
last, compute the new hash ID I by writing out all of the above,³ and stuff that new hash ID into the current branch name: the one to which HEAD is attached.

So after making new commit I, our picture is now:

...--F--G--H   <-- branch1
            \
             I   <-- branch2 (HEAD)

The special name HEAD is still attached to the name branch2, but the name branch2 now finds commit I. Commits up through H are still on both branches. The branches won't diverge unless and until we switch back to branch1 and do something there, such as make a new commit or two:

             I--J   <-- branch-X
            /
...--F--G--H
            \
             K--L   <-- branch-Y

Commits up through H are on both branches, while branches X and Y have diverged.
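The diverged picture can be reproduced in a scratch repository. This is only a sketch: the `commit` helper and the one-file-per-commit layout are my own inventions, chosen so that the commits never conflict:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch-X
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit F; commit G; commit H
git branch branch-Y          # both names select H for now
commit I; commit J           # only branch-X (the current branch) moves
git checkout -q branch-Y
commit K; commit L           # now only branch-Y moves

git log --oneline --graph --all

# The last commit that is on both branches:
base=$(git merge-base branch-X branch-Y)
git log -1 --format=%s "$base"
```

Each new commit moved only the name HEAD was attached to at the time, which is how the two names came to select different commits.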

What's actually important here are the commits. The branch names just help us find the commits. Of course that's important too—the commits can't matter if we can't find them! But the real information is (in) the commits themselves.

¹I do not have or use that one. Indeed, I have a general allergic reaction to most IDEs and prefer command line operation, which I consider analogous to wanting to do woodworking in a shop full of tools, rather than with a Ronco WoodDoctor or whatever. I might, for instance, use this thing in a pinch, but I wouldn't use it for all-day-every-day work.

²This one is quite short: the standard shortest that Git produces is 7 characters long, and the shortest that Git will accept as input is 4. Git will find all internal objects that begin with whatever prefix you use, and if there is only one matching object, that's the one that gets selected. If more than one matches the prefix, you get an error about ambiguity, and modern Git shows you all the hash IDs that matched.
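A quick sketch of abbreviation in a scratch repository (the file name and identity are placeholders). `git rev-parse --short` asks Git for an abbreviation, and the abbreviation resolves back to the full hash ID as long as it is unambiguous:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo hello > file.txt
git add file.txt
git commit -q -m "only commit"

full=$(git rev-parse HEAD)
short=$(git rev-parse --short HEAD)        # Git's default abbreviation
tiny=$(git rev-parse --short=4 HEAD)       # ask for 4; Git lengthens it if ambiguous

# Feed the abbreviations back in: they select the same commit.
resolved=$(git rev-parse --verify "$short^{commit}")
resolved_tiny=$(git rev-parse --verify "$tiny^{commit}")
echo "$full"
echo "$short -> $resolved"
echo "$tiny -> $resolved_tiny"
```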

³This is where the real magic lies, in Git. The hash scheme used here is SHA-1, which is no longer secure but is still good enough for Git. Git is moving to SHA-256, though, and the transition is going to be messy.

Rebasing

This brings us to the idea of rebasing. Given:

             I--J   <-- branch-X
            /
...--F--G--H
            \
             K--L   <-- branch-Y (HEAD)

we might decide that we'd like things better if we could change commits K and L somehow, so that they come after commit J.

Now, it's literally impossible to change anything about any commit (because of the hash ID tricks that Git uses). But what if we were to copy K and L to new and improved commits—let's call them K' and L'—that do in fact come after J?

That is, we'll end up with:

                  K'-L'
                 /
             I--J
            /
...--F--G--H
            \
             K--L

where I've taken all the names away. Let's put the names back, but cleverly switch them around a bit:

                  K'-L'  <-- branch-Y (HEAD)
                 /
             I--J   <-- branch-X
            /
...--F--G--H
            \
             K--L   ???

Since we use the name branch-Y to find the two commits there, we'll find L' and then K' now, in Git's usual back-ass-ward fashion.

What happens to commits K-L? Nothing, that's what: they're still in there and if you somehow remember their raw hash IDs, you can still find them. Git will retain them for a while,⁴ and then if nobody can find them for long enough, they'll "fall away" and vanish for real.

The tool that makes these copies-of-commits is git rebase. Rebase itself is a pretty big and powerful tool, and inside it, it has a smaller tool that copies one commit at a time. This smaller tool is git cherry-pick.

Rebase starts by using Git's detached HEAD mode, which we haven't mentioned before. It's pretty simple though. In detached HEAD mode, the special name HEAD isn't attached to a branch name any more. Instead, HEAD points directly to a commit, rather like a branch name:

             I--J   <-- HEAD, branch-X
            /
...--F--G--H
            \
             K--L   <-- branch-Y

The rebase command starts out by listing out the hash IDs of the commits to copy (in the right order, backwards for Git, forwards for you: K, then L). Then it uses this detached-HEAD trick to make HEAD point directly to where you want the copies to go. Since you want the copies to go after commit J in this drawing, that's where our detached HEAD ends up.
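You can enter detached HEAD mode by hand to see what rebase sets up. A sketch in a scratch repository, where the `commit` helper and the file names are hypothetical:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch-X
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit H
git branch branch-Y
commit I; commit J
git checkout -q branch-Y
commit K; commit L

# Point HEAD directly at commit J, without attaching it to any branch name.
git checkout -q --detach branch-X

# git symbolic-ref fails when HEAD is not attached to a branch name.
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi
echo "HEAD is $state at $(git log -1 --format=%s HEAD)"
```

Neither branch name has moved at this point; only HEAD has.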

Next, for each commit in the list of commit hash IDs, Git runs one git cherry-pick.⁵ Each cherry-pick operation is, technically, a merge operation inside Git, and can have merge conflicts, but if all goes smoothly, Git will do the "pick" step on its own and make one new commit. Since HEAD is detached, this step just writes the new commit's hash ID into the name HEAD directly:

                  K'  <-- HEAD
                 /
             I--J   <-- branch-X
            /
...--F--G--H
            \
             K--L   <-- branch-Y

Commit K has now been copied, via cherry-pick or some other mechanism (see footnote 5), to the new-and-supposedly-improved K'. The difference between K and K' comes in two parts:

The stored snapshot in K' is presumably different. However, comparing H vs K, to see what changed, will produce the same diff output—more or less at least—as comparing snapshot J vs K'. The more or less part here sometimes does some heavy lifting: the diff might be quite different if there were a lot of merge conflicts you had to resolve.
And of course, the parent of K' is J, not H.
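One way to check the "same change, different snapshot" claim is to compare patch IDs and tree hash IDs before and after a rebase. This is a sketch in a scratch repository with conflict-free commits, so the diffs come out identical here; the `commit` helper is hypothetical:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch-X
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit H
git branch branch-Y
commit I; commit J
git checkout -q branch-Y
commit K; commit L

k_old=$(git rev-parse branch-Y~1)      # original K, whose parent is H
git rebase -q branch-X
k_new=$(git rev-parse branch-Y~1)      # the copy K', whose parent is J

# Same change: the patch ID of "parent vs commit" is identical.
id_old=$(git show "$k_old" | git patch-id | cut -d' ' -f1)
id_new=$(git show "$k_new" | git patch-id | cut -d' ' -f1)
echo "patch IDs: $id_old $id_new"

# Different snapshot: K' also contains the files added by I and J.
tree_old=$(git rev-parse "$k_old^{tree}")
tree_new=$(git rev-parse "$k_new^{tree}")
echo "trees:     $tree_old $tree_new"
```

With real conflicts to resolve, the patch IDs could well differ; that is the "more or less" part.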

All of this is handled by the git cherry-pick tool, so that rebase only needs to submit the correct hash IDs to git cherry-pick, and have the correct commit checked out at that time. That's why rebase had to do a detached-HEAD checkout of commit J, though.

Now that K has been copied to K', rebase needs to issue another cherry-pick operation. This one will be asked to copy commit L. Git will compare K vs L to see what needs to be imported as changes to K''s snapshot. You can get merge conflicts here, as this is yet another merge. But if all goes well, Git will make a new snapshot L' on its own. The parent of L' will be K', since K' is the current, or HEAD, commit. The result will look like this:

                  K'-L'  <-- HEAD
                 /
             I--J   <-- branch-X
            /
...--F--G--H
            \
             K--L   <-- branch-Y

That was the last copying step required, so git rebase is almost finished: it now only needs to yank the name branch-Y off commit L and make it point to commit L' instead, and then re-attach your HEAD to branch-Y:

                  K'-L'  <-- branch-Y (HEAD)
                 /
             I--J   <-- branch-X
            /
...--F--G--H
            \
             K--L   ???

The rebase process is now complete.
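The same sequence can be carried out by hand, which is a decent way to convince yourself of what rebase does. This is a sketch of the idea, not what rebase literally runs internally; the scratch repository and the `commit` helper are assumptions for illustration:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch-X
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit H
git branch branch-Y
commit I; commit J
git checkout -q branch-Y
commit K; commit L

# Step 1: list the commits to copy, oldest first.
k=$(git rev-parse branch-Y~1)
l=$(git rev-parse branch-Y)

# Step 2: detach HEAD at the place the copies should go (commit J).
git checkout -q --detach branch-X

# Step 3: copy each commit in order; each copy advances the detached HEAD.
git cherry-pick "$k" "$l" >/dev/null

# Step 4: move the name branch-Y to the last copy and re-attach HEAD to it.
git checkout -q -B branch-Y

git log --oneline --graph branch-X branch-Y
```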

So, this is where each of the pieces come from:

Rebase needs to know which commits to copy. These come from the current branch, by starting at the last commit and working backwards as usual. (That list then has to have its order reversed.)
Rebase needs to know where to put the copies of commits. That comes from an argument you supply: the name branch-X, in this case, to tell it to put the copies after commit J. You can use a raw commit hash ID here: the name doesn't matter, only the commit hash ID actually matters. But humans are bad at commit hash IDs.
Rebase also needs to know where to stop listing commits to copy. This also comes from an argument you supply: the name branch-X, in this case. By starting at commit J and working backwards, Git can tell which commits are already "on" branch-X. Those commits won't get copied. So Git won't copy J—though of course that wasn't on the to copy list either—nor I. But, importantly, Git won't copy H or any earlier commit, even though those commits are on branch-Y. They're on branch-X too, and that's what makes them not get copied.

Again, a raw hash ID would suffice for "what not to copy" as Git is only interested in the commit hash IDs. The only name Git needs is the name branch-Y: that's the name that has to move at the end of the operation. But Git can get that from your current branch: you're on branch-Y when you run git rebase. Rebase always affects the current branch.
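Putting the pieces together, here is a sketch of the plain `git rebase branch-X` case in a scratch repository (the `commit` helper and file names are hypothetical). The argument names the destination and stays put; the current branch is the one that gets copied and moved:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch-X
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit H
git branch branch-Y
commit I; commit J
git checkout -q branch-Y     # the current branch is the one that will move
commit K; commit L

x_before=$(git rev-parse branch-X)
y_before=$(git rev-parse branch-Y)

git rebase -q branch-X       # argument = where the copies go, and what not to copy

# branch-X is untouched; branch-Y now selects L', whose grandparent is J.
copied=$(git log --reverse --format=%s branch-X..branch-Y | xargs)
echo "copied onto branch-X: $copied"
```

In the question's terms: while on 8, `git rebase upstream/master` copies the commits of 8 so that they come after the tip of upstream/master, and then moves the name 8.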

⁴The mechanism that keeps them around for "a while" is that they're semi-secretly findable through reflog entries. The reflog entries eventually expire, and then they become truly un-find-able. A maintenance program—a sort of janitor that Git calls git gc—will discover the dead commits and get rid of them for real, at this point. (In fact, it's git gc that runs the expiry tool—git reflog expire—and the cleanup tools, git repack and friends, but the individual tools are there too, if you need them. Git is a whole machine-shop full of tools. Most of them are even pretty good tools, pretty solid and reliable, although there's the occasional Ronco device like git stash. Yeah, they're flaky and break a lot, but people like them for some reason.)
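As a sketch of how the reflog lets you find the "abandoned" commits again, assuming a scratch repository with default settings (reflogs enabled) and a hypothetical `commit` helper; the branch name `rescued` is likewise made up:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch-X
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit H
git branch branch-Y
commit I; commit J
git checkout -q branch-Y
commit K; commit L

old_l=$(git rev-parse branch-Y)      # the original L, about to be abandoned
git rebase -q branch-X

# The branch's reflog remembers the value the name held before the rebase moved it.
recovered=$(git rev-parse "branch-Y@{1}")
echo "original L:  $old_l"
echo "from reflog: $recovered"

# Giving the old commit a name again makes it easy to find.
git branch rescued "$recovered"
```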

⁵Rebase is actually a very old tool that was modernized pretty recently. In older versions of Git (before Git 2.26), rebase used something other than cherry-pick by default, and you had to add some options to make it use cherry-pick. Since cherry-pick is usually the right tool, upgrading Git is usually the right thing to do here. The older am-based back end still usually works, and does run faster, so you can still use it; or if you're stuck with an older Git version, you can pass options to git rebase to get it to do cherry-picking.

Fancier rebase

Above, we ran git checkout branch-Y and then git rebase branch-X to do what we wanted, and it worked. The name branch-X was where the copies went. It also made sure we copied just the two "only on branch-Y" commits. So rebase cleverly used a single argument as both where to put the copies and what not to copy.

Eventually you'll find some situation where this is too clever and doesn't work for you. For instance, suppose you have this:

          I   <-- feature1
         /
...--G--H   <-- main
         \
          J--K   <-- feature2
              \
               L   <-- feature3

when you discover that feature3 is really part of feature1 after all. You now want to get commit L over onto I, as a copy L', but if you run:

git checkout feature3
git rebase feature1

Git will enumerate all the commits that are reachable from feature3, but not reachable from feature1. The first list goes L, K, J, H, G, ...; the second list goes I, H, G, .... Subtracting the second list from the first leaves L, K, and J—which rebase will then flip to the right order so rebase will copy J-K-L to J'-K'-L'.

But that's not what you want. You only want L copied to L': you want to leave J and K alone, on feature2.
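You can preview what a rebase would select with a two-dot range, which performs the same "reachable from here, minus reachable from there" subtraction. This sketch ignores the fork-point and patch-ID refinements; the scratch repository and `commit` helper are hypothetical:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G; commit H
git checkout -q -b feature1;      commit I
git checkout -q -b feature2 main; commit J; commit K
git checkout -q -b feature3;      commit L

# What "git rebase feature1" would copy while on feature3, oldest first:
with_feature1=$(git log --reverse --format=%s feature1..feature3 | xargs)
echo "feature1..feature3: $with_feature1"    # J K L

# Limiting by feature2 instead leaves only the commit we care about:
with_feature2=$(git log --reverse --format=%s feature2..feature3 | xargs)
echo "feature2..feature3: $with_feature2"    # L
```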

The rebase command therefore allows you to separate out these two things: you can run git rebase --onto target upstream. In this case, the upstream argument is the one that limits what gets copied, and the target argument is where the copies go. Without --onto, you run git rebase upstream and upstream provides both pieces of information.

So, in this case, you would run:

git checkout feature3                 # what we want to copy
git rebase --onto feature1 feature2

The --onto argument feature1 says put the copies after I; the upstream argument feature2 says don't copy commit K or anything earlier; and this copies just the commit(s) you want (if there are more commits than just the one L we show here, they all get copied).

The final result is what you want:

            L'  <-- feature3 (HEAD)
           /
          I   <-- feature1
         /
...--G--H   <-- main
         \
          J--K   <-- feature2
              \
               L   [abandoned]
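Here is the whole `--onto` scenario as a runnable sketch in a scratch repository. The `commit` helper and the one-file-per-commit layout are assumptions to keep the example conflict-free:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G; commit H
git checkout -q -b feature1;      commit I
git checkout -q -b feature2 main; commit J; commit K
git checkout -q -b feature3;      commit L

f2_before=$(git rev-parse feature2)

git checkout -q feature3                  # what we want to copy
git rebase -q --onto feature1 feature2    # copies go after I; skip K and earlier

git log --oneline --graph --all

# feature3 now holds exactly one commit (L') on top of feature1.
only=$(git log --format=%s feature1..feature3 | xargs)
echo "on feature3 but not feature1: $only"
```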

Because Git is a big workshop full of tools, you can also do this by checking out feature1 directly and running git cherry-pick yourself. Note that the result will be slightly different:

          I--L'  <-- feature1 (HEAD)
         /
...--G--H   <-- main
         \
          J--K   <-- feature2
              \
               L   <-- feature3

You can now delete the name feature3:

          I--L'  <-- feature1 (HEAD)
         /
...--G--H   <-- main
         \
          J--K   <-- feature2
              \
               L   [abandoned]

If you have just the one commit to copy and you want to have the name feature1 identify it, this cherry-pick operation is actually slightly better. But if you have a dozen commits to copy, you probably want to use rebase; it's the power-tool of commit-copying.
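The cherry-pick variant, sketched in the same kind of scratch repository (the `commit` helper is hypothetical). Note that `git branch -D` is the forced delete, which is needed here because the original L is not merged anywhere:

```shell
set -e
repo=$(mktemp -d)    # throwaway repository (assumption: safe to create)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

# Hypothetical helper: one commit that adds one file named after its label.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G; commit H
git checkout -q -b feature1;      commit I
git checkout -q -b feature2 main; commit J; commit K
git checkout -q -b feature3;      commit L

git checkout -q feature1
git cherry-pick feature3 >/dev/null    # copy L; the name feature1 advances to L'
git branch -q -D feature3              # drop the name; the original L is now abandoned

git log --oneline --graph --all
on_feature1=$(git log --reverse --format=%s main..feature1 | xargs)
echo "main..feature1: $on_feature1"    # I L
```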

There's a great deal about rebase that I have not covered here: in particular, the part where rebase builds up the list of commits to copy is much more complicated than the simple "here minus upstream" thing. There's something called the fork point code, which is clever (it handles certain "upstream rebase" problems pretty nicely) but a bit fragile (it relies on reflogs and can break). There's some magic where Git uses git patch-id to try to figure out whether certain commits were already copied into the upstream as well. This code also works well, but can misfire on rare occasions. But rebase is a pretty good power tool. Just be sure to keep the finger-catchers working, so that you can go to the repository surgeon and have your fingers reattached if you saw them off. :-)
