According to my understanding of git pull --rebase origin master, it should be the equivalent of running the following commands:

(from branch master):  $ git fetch origin
(from branch master):  $ git rebase origin/master

I seem to have found some case where this doesn't work as expected. In my workspace, I have the following setup:

  • branch origin/master references branch master on remote origin
  • branch master is set up to track origin/master, and is behind origin/master by several commits.
  • branch feature is set up to track local branch master, and is ahead of master by several commits.

Sometimes, I lose commits by running the following sequence of steps:

(from branch master):  $ git pull --rebase
(from branch master):  $ git checkout feature
(from branch feature): $ git pull --rebase

At this point, the few commits by which feature was ahead of master have been lost. Now, if I reset my position, and instead do the following:

(from branch feature): $ git reset --hard HEAD@{2} # rewind to before second git pull
(from branch feature): $ git rebase master

The commits have been applied correctly and my new commits on feature are still present. This seems to directly contradict my understanding of how git pull works, unless git fetch . does something stranger than I expected.

Unfortunately, this is not 100% reproducible for all commits. When it does work for a commit, though, it works every time.

Note: My git pull --rebase here should actually be read as a --rebase=preserve, if that matters. I have the following in my ~/.gitconfig:

[pull]
    rebase = preserve
ashays
  • You shouldn't be rebasing the remote tracking branch `origin/master` itself, rather you are trying to bring that forward, without affecting your (tracking) copy of that remote. I haven't checked the manuals for the right invocations, but perhaps checkout -b that branch under a new temporary name, and then rebase that temporary branch onto your current HEAD. – Philip Oakley Feb 10 '16 at 17:16
  • I think there's some confusion here. I'm not rebasing `origin/master`, but rebasing the current branch `master` onto `origin/master`. This should, according to my understanding, essentially bring `HEAD` to the tip of `origin/master`, and reapply commits that were at the tip of `master` back to the branch. Essentially, re-writing the new commits on `master` as if they happened after the changes from `origin/master`. – ashays Feb 10 '16 at 17:22
  • What is your (local) version of git? I ask because I recall (somewhat vaguely and have not yet gone back to check) that there was an automated fork point rebase bug in `git pull` for several 2.x releases and knowing the version might make checking for that a bit easier. – torek Feb 10 '16 at 17:45
  • My current version of git is 2.7.0, and I haven't updated it since finding this issue. – ashays Feb 10 '16 at 17:46
  • OK, all the known bugs were substantially earlier. I'll put in an answer regarding the differences I know of but I don't think it will help much here. – torek Feb 10 '16 at 18:16
  • @ashays, sorry for the confusion. The key point I wanted to make (apart from the mistake) was to `rebase --onto` the right place using the three refs version of rebase - it's well down the man page..... – Philip Oakley Feb 11 '16 at 23:44
  • Just to say, the more direct question behind this is: "what is the difference between `git rebase` and `git rebase master`?" – Luke Usherwood Nov 30 '16 at 11:07

1 Answer

(Edit, 30 Nov 2016: see also this answer to Why is git rebase discarding my commits?. It is now virtually certain that it is due to the fork-point option.)

There are a few differences between manual and pull-based git rebase (fewer now in 2.7 than there were in versions of git predating the --fork-point option in git merge-base). And, I suspect your automatic preserve-merges may be involved. It's a bit hard to be sure but the fact that your local branch follows your other local branch which is getting rebased is quite suggestive. Meanwhile, the old git pull script was also rewritten in C recently so it's harder to see what it does (though you can set environment variable GIT_TRACE to 1 to make git show you commands as it runs them internally).

In any case, there are two or three key items here (depending on how you count and split these up; I'll make it three):

  • git pull runs git fetch, then either git merge or git rebase per instructions, but when it runs git rebase it uses the new fork-point machinery to "recover from an upstream rebase".

  • When git rebase is run with no arguments it has a special case that invokes the fork-point machinery. When run with arguments, the fork-point machinery is disabled unless explicitly requested with --fork-point.

  • When git rebase is instructed to preserve merges, it uses the interactive rebase code (non-interactively). I'm not sure this actually matters here (hence "may be involved" above). Normally it flattens away merges and only the interactive rebase script has code to preserve them (this code actually re-does the merges since there's no other way to deal with them).
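
The second item can be tried out in a throwaway repository. The following is only a sketch under my own assumptions, not the asker's actual history: the commit names A, X, Y and M, the `backup` branch, and the `commit` helper are all invented for the demo. It makes a commit X on master, creates feature (tracking local master) from it, and then resets master back, so that X survives only in master's reflog.

```shell
set -e
# Throwaway repository; every name below is hypothetical.
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name regardless of init.defaultBranch
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: one commit per call, each adding its own file so nothing conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
commit X                                  # made on master "by accident"
git branch -q --track feature master      # feature tracks the local branch master
git reset -q --hard HEAD~1                # master goes back to A; its reflog still lists X
commit M                                  # master moves on
git checkout -q feature
commit Y
git branch backup                         # remember the pre-rebase tip of feature

# No arguments: rebase onto the configured upstream, fork-point machinery enabled.
git rebase -q
implicit=$(git log --reverse --format=%s master..feature | xargs)

# Explicit upstream: fork-point is off unless requested with --fork-point.
git reset -q --hard backup
git rebase -q master
explicit=$(git log --reverse --format=%s master..feature | xargs)

echo "git rebase        kept: $implicit"
echo "git rebase master kept: $explicit"
```

With default settings I would expect the argument-less rebase to keep only Y, because X appears in master's reflog and is therefore treated as something that came from upstream, while `git rebase master` keeps both X and Y.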

The most important item here (for sure) is the fork point code. This code uses the reflog to handle cases best shown by drawing part of the commit graph.

In a normal (no fork point stuff needed) rebase case you have something like this:

... - A - B - C - D - E   <-- origin/foo
            \
              I - J - K   <-- foo

where A and B are commits you had when you started your branch (so that B is the merge-base), C through E are new commits you picked up from the remote via git fetch, and I through K are your own commits. The rebase code copies I through K, attaching the first copy to E, the second to the-copy-of-I, and the third to the-copy-of-J.

Git figures out—or used to, anyway—which commits to copy using git rev-list origin/foo..foo, i.e., using the name of your current branch (foo) to find K and work backwards, and the name of its upstream (origin/foo) to find E and work backwards. The backwards march stops at the merge base, in this case B, and the copied result looks like this:

... - A - B - C - D - E   <-- origin/foo
           \            \
            \             I' - J' - K'   <-- foo
             \
              I - J - K   [foo@{1}: reflog for foo]
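
The normal case drawn above can be sketched in a scratch repository. This is an illustration under assumptions: a local branch named `upstream` stands in for origin/foo (no real remote is involved), and the `commit` helper is made up for the demo.

```shell
set -e
# Scratch repository; "upstream" is a hypothetical stand-in for origin/foo.
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: one commit per call, each adding its own file so nothing conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
commit B
git branch foo                  # foo forks off at B
commit C; commit D; commit E    # the "fetched" commits
git checkout -q foo
commit I; commit J; commit K    # our own commits

# What the rebase will copy: everything on foo that is not on upstream.
base=$(git log -1 --format=%s "$(git merge-base upstream foo)")
to_copy=$(git log --reverse --format=%s upstream..foo | xargs)
echo "merge base: $base, commits to copy: $to_copy"

# Explicit upstream argument, so no fork-point logic is involved here.
git rebase -q upstream
new_base=$(git log -1 --format=%s "$(git merge-base upstream foo)")
copied=$(git log --reverse --format=%s upstream..foo | xargs)
echo "after rebase, merge base: $new_base, foo adds: $copied"
```

The old tip K stays reachable through `foo@{1}` in the reflog, as in the drawing.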

The problem with this method occurs when the upstream—origin/foo here—is itself rebased. Let's say, for instance, that on origin someone force-pushed so that B was replaced by a new copy B' with different commit wording (and maybe a different tree as well, but, we hope, nothing that affects our I-through-K). The starting point now looks like this:

          B' - C - D - E    <-- origin/foo
        /
... - A - B   <-- [origin/foo@{n}]
            \
              I - J - K   <-- foo

Using git rev-list origin/foo..foo, we'd select commits B, I, J, and K to be copied, and try to paste them on after E as usual; but we don't want to copy B as it really came from origin and has been replaced with its own copy B'.

What the fork point code does is look at the reflog for origin/foo to see if B was reachable at some time. That is, it checks not just origin/foo (finding E and scanning back to B' and then A), but also origin/foo@{1} (pointing directly to B, probably, depending on how frequently you run git fetch), origin/foo@{2}, and so on. Any commits on foo that are reachable from any origin/foo@{n} are included for consideration in finding a Lowest Common Ancestor node in the graph (i.e., they're all treated as options to become the merge base that git merge-base prints out).

(It's worth noting a defect of sorts here: this automated fork point detection can only find commits that were reachable while the reflog entry is retained, which in this case defaults to 30 days, the gc.reflogExpireUnreachable default. However, that's not particularly relevant to your issue.)
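
To see the reflog scan at work, here is a small sketch of the rewritten-upstream case. It rests on assumptions: a local branch `upstream` stands in for origin/foo, a `git reset` stands in for someone's force-push, and the commit B2 plays the role of B'. The `commit` helper is invented for the demo.

```shell
set -e
# Scratch repository; "upstream" and "B2" are hypothetical stand-ins for origin/foo and B'.
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/upstream
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: one commit per call, each adding its own file so nothing conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
commit B
git branch foo                  # foo forks off at B
git reset -q --hard HEAD~1      # upstream is rewritten: B is dropped from its history...
commit B2                       # ...and replaced; the upstream reflog still remembers B
commit C
git checkout -q foo
commit I; commit J; commit K

# Plain two-dot range: B is selected along with our own commits.
plain=$(git log --reverse --format=%s upstream..foo | xargs)

# Fork-point: consult the reflog of upstream to find where foo really forked.
fork=$(git merge-base --fork-point upstream foo)
fork_subject=$(git log -1 --format=%s "$fork")
forked=$(git log --reverse --format=%s "$fork"..foo | xargs)

echo "upstream..foo selects:   $plain"
echo "fork point is $fork_subject, so only: $forked"
```

The same mechanism that rescues you here is what drops a commit when it appears in the upstream's reflog for some other reason.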


In your case, you have three branch names (and hence three reflogs) involved:

  • origin/master, which is updated by git fetch (the first step of your git pull while on branch master)
  • master, which is updated by both you (via normal commits) and git rebase (the second step of your git pull), and
  • feature, which is updated by both you (via normal commits) and git rebase (the second step of your second git pull: you "fetch" from yourself, a no-op, then rebase feature on master).

Both rebases are run with --preserve-merges (hence non-interacting interactive mode) and --onto new-tip fork-point, where the fork-point commit ID is found by running git merge-base --fork-point upstream-name HEAD. The upstream-name for the first rebase is origin/master (well, refs/remotes/origin/master) and the upstream-name for the second rebase is master (refs/heads/master).

This should all Just Work. If your commit graph at the start of the whole process is something like what you've described:

... - A - B   <-- master, origin/master
            \
              I - J - K   <-- feature

then the first fetch brings in some commits and makes origin/master point to the new tip:

              C - D - E   <-- origin/master
            /
... - A - B   <-- master, origin/master@{1}
            \
              I - J - K   <-- feature

and the first rebase then finds nothing to copy (the merge-base of master and B, where B = fork-point(master, origin/master), is just B, so there is nothing to copy), giving:

              C - D - E   <-- master, origin/master
            /
... - A - B   <-- master@{1}, origin/master@{1}
            \
              I - J - K   <-- feature

The second fetch is from yourself and is a no-op (or skipped entirely), leaving this as the input to the second rebase. The --onto target is master, which is commit E, and the fork-point of HEAD (feature) and master is also commit B, leaving commits I through K to copy after E as usual.
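
The well-behaved sequence described above can be replayed end to end. This is a sketch under assumptions: the bare repository `origin.git`, the `seed` clone that plays "someone else", and the `commit` helper are all invented for the demo, and plain `--rebase` is used rather than the preserve-merges variant.

```shell
set -e
# Scratch area; origin.git, seed and local are hypothetical names.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Helper: one commit per call, each adding its own file so nothing conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

# A bare "origin" whose HEAD names master, seeded with commits A and B.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
commit A; commit B
git push -q ../origin.git master
cd ..

# The workspace: master tracks origin/master, feature tracks local master.
git clone -q origin.git local
cd local
git checkout -q -b feature --track master
commit I; commit J; commit K

# Meanwhile, new commits C, D and E arrive upstream.
cd ../seed
commit C; commit D; commit E
git push -q ../origin.git master

# The two pulls from the question.
cd ../local
git checkout -q master
git pull -q --rebase
git checkout -q feature
git pull -q --rebase

tip=$(git log -1 --format=%s master)
kept=$(git log --reverse --format=%s master..feature | xargs)
echo "master is at $tip, feature adds: $kept"
```

In this clean history the fork point of feature against master's reflog is B, so I through K should all survive; losing commits needs something extra in master's reflog.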

If some commit(s) are being dropped, something is going wrong in this process, but I can't see what.

torek
  • Wow. That's an awesome answer and provides a lot of insight into what's going on here. I'm going to spend some time trying to recreate the circumstances that led to this issue, and see if this answer sheds some light on what's going on. I suspect this will lead me directly to it, but I'd love to find the exact reason. I'll let you know if I find anything. – ashays Feb 10 '16 at 21:09
  • After spending a bit more time digging into this, I wasn't able to always reconstruct a use case (but I found it naturally again). This definitely seems to be the problem! Thank you for spending the time on this excellent answer. It definitely added to my understanding of git. – ashays Feb 17 '16 at 17:34
  • @ashays What was the underlying issue causing you to lose commits? – michael.schuett Feb 19 '16 at 21:41
  • @mschuett (and ashays): I've posted some concrete steps to reproduce the issue here: http://stackoverflow.com/a/40886668/932359 - which might help clarify how commits can be lost. – Luke Usherwood Nov 30 '16 at 11:10
  • @LukeUsherwood: nice. I put in a link to your answer on the other question. – torek Nov 30 '16 at 14:05
  • Wow, thanks guys! This really helps explain what was going on here. And thanks @LukeUsherwood for adding reproducing steps. – ashays Nov 30 '16 at 18:53