Like the title says, what is the difference between git rebase master and git rebase --onto master?

I ran both commands expecting to see the exact same results but got two wildly different commit histories afterwards.

What's the big deal here? How are they different from each other?

AlanSTACK
  • 2
    Here is an example of `onto` doing something you could not do without it: https://stackoverflow.com/a/68522977/341994 – matt Aug 03 '21 at 05:19

2 Answers

[Do not accept this answer. It's just a sort of commentary on torek's answer.]

The way to look at this, in my opinion, is to understand that the full form of git rebase is with onto and three parameters:

git rebase --onto x y z

To read that, in your mind clump y and z together, and swap --onto x to the end (because that's the natural English order for direct object and prepositional phrase), so that the whole thing parses something like this (pseudo-code):

rebase [y z] onto x

In that pseudo-code, the expression [y z] means "starting right after y and continuing all the way to z." Git calculates what "starting right after y" means by working backwards from z, not forward from y, but the effect is generally the same.

So git rebase --onto x y z means: "Grab all the commits starting right after y and continuing all the way to z, and append them to x."
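To make that reading concrete, here is a small sketch in a throwaway repository. The branch names target and topic, the commit messages, and the commit helper are all made up for illustration. It lists the commits that "right after y up to z" selects, runs the three-parameter rebase, and then lists what ended up on top of x.

```shell
#!/bin/sh
set -e
# Fixed identity so commits work even without a global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository; nothing outside this temp directory is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/target   # name the first branch "target"

# Hypothetical helper: one commit that adds one file named after the message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit a
git checkout -q -b topic
commit b; commit c; commit d; commit e
git checkout -q target
commit t

# x = target, y = topic~2 (commit c), z = topic (commit e)
picked=$(git log --reverse --format=%s topic~2..topic)   # "right after y up to z"
git rebase -q --onto target topic~2 topic
moved=$(git log --reverse --format=%s target..topic)     # what now sits on top of x

echo "selected:          $(echo $picked)"
echo "now on top of x:   $(echo $moved)"
```

Running git log y..z before the rebase is a cheap way to check which commits are about to be copied.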


Very well. That's the full form of git rebase. When you omit any of the parameters, Git fills them in for you. And the way it does that is surprising. That's the reason for the results you're seeing.

So let's take a real example. Here's our starting position:

* f8696e6 (HEAD -> dev) z
* 103333e (origin/dev) y
* 559ad1f x
| * 8032a5d (origin/main, main) c
| * 2caa1e9 b
|/  
* 06c7439 a

Look carefully at the graph. We are on dev. We have a remote origin, and we are one ahead of our remote-tracking branch origin/dev. dev split off from main at a, and after that it goes

x y z 

Meanwhile, main goes

a b c

Now let's try

git rebase main

We are on dev, so that means

git rebase --onto main main dev

Which means "grab the commits starting right after main and continuing to dev (namely x, y, and z) and attach them to main." Here we go... Here's what we get:

* f6b903e (HEAD -> dev) z
* 4adb109 y
* e9cc7fd x
* 8032a5d (origin/main, main) c
* 2caa1e9 b
* 06c7439 a

Yup, just as I said. This, I think, is what most people expect when they use a one-parameter git rebase.

Okay, now start over. This time we'll say

git rebase --onto main

That, as torek's answer tells you, means

git rebase --onto main origin/dev dev

So that means "grab everything from origin/dev to dev (that's just z) and attach it to main." That's extremely surprising! We never said anything about origin/dev, but that's where Git is going to snip our branch as we rebase. Here we go... Here's what we get:

* 0dccc25 (HEAD -> dev) z
* 8032a5d (origin/main, main) c
* 2caa1e9 b
| * 103333e (origin/dev) y
| * 559ad1f x
|/  
* 06c7439 a

That's probably the kind of thing that happened to you (the OP). And it's easy to see why you found it surprising!

So in my opinion the main takeaway is that if you leave out any of the three parameters, you may be surprised by what Git chooses for them. Therefore, also in my opinion, you should not leave out any of them! You just don't know what will happen if you do.
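If you want to see the difference for yourself, the following sketch rebuilds roughly the same situation in a scratch directory and counts how many commits each command copies. The bare repository standing in for origin, the file-per-commit helper, and the variable names are assumptions made for the demonstration; the hashes will differ from the ones shown above.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q --bare "$top/origin.git"      # stand-in for the remote "origin"
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: one commit that adds one file named after the message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit a
git checkout -q -b dev
commit x; commit y
git remote add origin "$top/origin.git"
git push -q -u origin dev >/dev/null 2>&1  # origin/dev now points at y
commit z                                   # dev is one ahead of origin/dev
git checkout -q main
commit b; commit c
git checkout -q dev

orig=$(git rev-parse dev)                  # remember the starting point

git rebase -q --onto main                  # upstream defaults to origin/dev
only_onto=$(git rev-list --count main..dev)

git reset -q --hard "$orig"                # start over from the same position
git rebase -q main                         # main is both upstream and new base
plain=$(git rev-list --count main..dev)

echo "git rebase --onto main copied $only_onto commit(s) onto main"
echo "git rebase main        copied $plain commit(s) onto main"
```

The first count covers only z, the second covers x, y, and z, which matches the two graphs above.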


Final note: Okay, I lied a little. Remember the first result?

* f6b903e (HEAD -> dev) z
* 4adb109 y
* e9cc7fd x
* 8032a5d (origin/main, main) c
* 2caa1e9 b
* 06c7439 a

I left out origin/dev from that diagram. In reality, this is what we now have:

* f6b903e (HEAD -> dev) z
* 4adb109 y
* e9cc7fd x
* 8032a5d (origin/main, main) c
* 2caa1e9 b
| * 103333e (origin/dev) y
| * 559ad1f x
|/  
* 06c7439 a

Notice the duplication of the x and y commits. That's what git rebase does: it copies commits. This is a tricky situation if we intend to push dev, because we will be asking the remote origin to forget about the x and y currently reachable from origin/dev. A plain git push will be rejected as a non-fast-forward, so we would have to force it.

matt
  • Do note that when `z` is supplied, Git *begins* the process with a `git checkout z` or `git switch z` (depending on whether you like `checkout` or `switch` here), so that after it's all done, `git branch --show-current` shows that you're on `z`, even if you started on (say) branch `beagle`. – torek Feb 16 '22 at 07:17

Edit: I forgot a last—or first—point, which I'll insert first here. The usage for git rebase is, greatly simplified:

git rebase [ --onto <newbase> ] [ <upstream> ]

The square brackets [...] indicate that each argument is optional. The angle brackets <...> mean you fill in something here. The --onto newbase option uses a flag; the newbase is given (by you, the user) if and only if it's preceded by the keyword --onto, spelled with a double hyphen. Similarly, the upstream argument is given if and only if you give it. So:

git rebase master

gives one argument, an upstream, of master; git rebase --onto master gives one argument, a newbase, of master. If you don't give an upstream argument, git rebase finds one on its own. If you don't give a newbase argument, git rebase finds one on its own. If you give one, but not the other, git rebase still finds the other one on its own.

As a one-liner answer, then: git rebase master chooses master as both target and upstream, but git rebase --onto master chooses master as target, with the default upstream, whatever that is for the current branch. You can see the default for the current branch with:

git rev-parse --abbrev-ref @{upstream}

If the current branch is, say, dev, and its upstream is origin/dev, then git rebase master means git rebase --onto master master, but git rebase --onto master means git rebase --onto master origin/dev.
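As a rough sketch of that defaulting rule, the shell function below prints the spelled-out form of either short command for whatever branch is checked out. spell_out is a made-up helper, not a Git command, and it deliberately ignores the --fork-point refinement that Git applies when the upstream is defaulted. The scratch repository and its bare origin exist only so the function has something to inspect.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: print the explicit equivalent of
#   git rebase <upstream>        (called as: spell_out <upstream>)
#   git rebase --onto <newbase>  (called as: spell_out --onto <newbase>)
spell_out() {
    branch=$(git symbolic-ref --short HEAD)
    if [ "$1" = "--onto" ]; then
        newbase=$2
        # No upstream given: fall back to the branch's configured upstream.
        upstream=$(git rev-parse --abbrev-ref '@{upstream}')
    else
        upstream=$1
        newbase=$1    # no --onto given: the new base is the upstream itself
    fi
    echo "git rebase --onto $newbase $upstream $branch"
}

# Scratch setup: branch dev whose upstream is origin/dev.
top=$(mktemp -d)
git init -q --bare "$top/origin.git"
git init -q "$top/work"
cd "$top/work"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m a
git checkout -q -b dev
git remote add origin "$top/origin.git"
git push -q -u origin dev >/dev/null 2>&1

one=$(spell_out master)
two=$(spell_out --onto master)
echo "git rebase master        means: $one"
echo "git rebase --onto master means: $two"
```

Printing the explicit form first and then running that is one way to avoid being surprised by the defaults.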

What the various arguments mean

To perform its duties—which is to say, copy some commits, then move one branch name—git rebase needs to know three things:

  • What commits should I copy?
  • Where should I put these copies?
  • What branch name should I move, in the end, after making copies?

The last of these defaults to the current branch.[1] So you just run git checkout or git switch first, to select the correct branch.[2]

The Git authors cleverly crammed the remaining two of these three things into one argument to git rebase, which the documentation calls the upstream argument.

Sometimes, however, you really need these two things to be separate. The --onto flag allows you to separate them:

git rebase --onto <newbase> <upstream>

copies the commits to the supplied newbase, rather than copying them to the supplied upstream.

The curious thing is not so much the newbase, which is pretty straightforward, but rather how the upstream argument is used. The way it is used is complex, but to simplify it to essentials, Git runs:

git rev-list upstream..HEAD

(after doing the initial git checkout or git switch, if you provide a branch argument). So upstream specifies not what to copy, but rather what not to copy.

The rebase command is going to copy some set of commits. This is a given, because the goal of git rebase is to take some existing commits that are not quite good enough, in some way or form, and turn them into improved commits—but it is literally impossible to change any commit, once it is made. Since the existing commits can't be changed, the best that git rebase can do is to copy them to new-and-improved copies, and then start using the copies in place of the originals.

The "use the copies instead of the originals" step is what makes it necessary to move one branch name. Git finds commits using branch names. Branch names move all the time, usually in a simple, one-step-at-a-time, easily followed manner. But because names can move, git reset and other Git commands can move them, perhaps even violently, many commits at a time, abducting them from their home village in China and dropping them into the Australian Outback or whatever. In the case of git rebase, the rebase code first copies the selected, old-and-lousy commits to their new home, making changes that—we hope—improve them, or at least make them fit into their new home. Then it moves the branch name so that we find the copies instead of the originals—and then rebase is done.

The upstream argument specifies what not to copy, and sometimes—actually, remarkably often!—this same specifier can be used as the place that the to-be-copied commits should go. But when it can't be used that way, the --onto argument lets you specify both where to copy and what not to copy, as two separate things.

The question "git | move old commit to the past of another branch" shows a case where it's convenient to specify what not to copy and where to copy to as two separate things.


[1] In fact, git rebase can only move the current branch (as of the latest Git versions, 2.32-ish, but probably for quite a while yet to come too), so if you supply a branch name, git rebase starts by using git checkout or git switch. See footnote 2.

[2] You can supply a branch name on the command line. If you do, the command currently either literally runs the necessary checkout/switch operation or has it built in. When the rebase is complete, you're on the branch you selected, even if you weren't before the rebase started. That is, in:

git checkout main          # puts us on `main`
git rebase origin/foo foo

we might as well have run:

git checkout foo
git rebase origin/foo

anyway, because we end up on foo, not main. But this does mean that if we would otherwise have to run git checkout foo first, we can instead run:

git rebase origin/foo foo

or:

git rebase --onto target origin/foo foo

so as to get git rebase to do the git checkout for us.
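The branch switch is easy to confirm in a scratch repository. In this sketch the branch names and the commit helper are invented for the demonstration, and a local main plays the role that origin/foo and target play in the commands above.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: one commit that adds one file named after the message.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit a
git checkout -q -b foo
commit x
git checkout -q main
commit b                                  # main has moved on; we are on main

before=$(git symbolic-ref --short HEAD)
git rebase -q --onto main main foo        # third argument names the branch to rebase
after=$(git symbolic-ref --short HEAD)

echo "branch before rebase: $before"
echo "branch after rebase:  $after"
```

The rebase leaves us on foo even though we started on main, which is the behavior the footnote describes.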

torek
  • 1
    But this is not what was asked. The question that needs answering, it seems to me, is the OP's claim that "I ran both commands [the specific commands `git rebase master` and `git rebase --onto master`] expecting to see the exact same results but got two wildly different commit histories afterwards." Either one must produce an example showing why that happened, or one must demonstrate that the OP is mistaken and it can't happen, they are identical if given under identical circumstances. – matt Aug 03 '21 at 09:59
  • 1
    @matt: that's what the bit at the front (the edit) is about. `rebase --onto master` chooses `master` as the *target*, with the *default upstream;* `rebase master` chooses `master` as both the target *and* the upstream. (Maybe I should add that sentence.) – torek Aug 03 '21 at 10:37
  • 2
    Really? So you're saying that "upstream" as a parameter name for rebase and "upstream" as the tracked remote-tracking branch are the same? Because I don't think so. This is the ambiguity discussed at https://stackoverflow.com/questions/2739376/definition-of-downstream-and-upstream and https://stackoverflow.com/questions/64262995/meaning-of-upstream-branch-in-rebase – matt Aug 03 '21 at 10:53
  • 1
    The `upstream` argument to `git rebase` is whatever you like, but *if you don't supply one*, Git uses `@{u}`. (Where rebase is strange, as it were, is that if you don't supply an `--onto`, the default `--onto` is the `upstream` you supplied or defaulted. If you defaulted it, this makes sense, but if you supplied an upstream argument, this can be a surprise.) – torek Aug 03 '21 at 10:54
  • 1
    That, if true, is the whole key to the mystery. That, and the missing third parameter. This is why I like `onto` with three parameters; it's the only one where I'm pretty sure I know what's going to happen. Thus for example where you say "If the current branch is, say, `dev`, ... then `git rebase master` means `git rebase --onto master master`" I would have said it means `git rebase --onto master master dev`. – matt Aug 03 '21 at 10:57
  • 1
    @matt: It's why I like rebase with explicit `--onto` and upstream. I dislike supplying the third (argument-to-checkout/switch) parameter because I think rebase should just *always* use the *current* branch, forcing the *user* to use `git checkout` or `git switch` first. (That's because `git merge` always uses the current branch, so that makes the commands slightly more symmetric.) – torek Aug 03 '21 at 11:00