Why does git think I'm 11 commits behind, when I only see two commits?

Question

I have a branch tracking a remote branch. Remote branch is at commit a5d6f33. Local branch is at 4a656e7. If I check out a new branch tracking the remote, putting it at a5d6f33, I see the following in the log:

* a5d6f33
|
* d335c4c
|
* 4a656e7
|
* 9c744ca
|
* 49e35d0

If I view it in SourceTree, I get the following:

* a5d6f33 (4a656e7, d335c4c)
|\
| * d335c4c [feature/b]
| |
* 4a656e7 (49e35d0, 9c744ca)
|\|
* 49e35d0
| |\
:/  * 9c744ca [feature/a]
* f21f262 
|   :
* 125ba89
|   * edaf06e
| /
* f5df392

Basically, feature/a was branched from f5df392, and merged in at 4a656e7. feature/b was branched from f21f262, and merged in at a5d6f33. There were other commits prior to those (represented by colons), but none appear relevant to this (feature/b only has one commit, all the other commits come before the sequence in question; the two feature branches were started with a single commit in-between their root branch point, and merged one after the other).

Both branches merged clean. If 49e35d0 already has all of the changes in 125ba89 and f21f262 and everything else leading up to 4a656e7, to me, there should be two new commits (d335c4c and a5d6f33) missing from my local branch tracking the remote.

Where are the other 9 coming from? Is there a way to find where git is getting this 11 count from? If I do a diff between local and remote, I only see the modifications in the single feature/b commit.

It's kinda hard to tell from that graph in which side is each commit. Maybe if you move those to the left instead of keeping everything in the right would clarify a little. Also, maybe use capital letters like `A`, `B`, `C`... instead of hash values. — Samir Aguiar, Mar 07 '17 at 17:15
any chance you could provide a `git log --decorate --oneline --graph --branches --all` ? this will help 'clean up' the graph to make it easier to read — g19fanatic, Mar 07 '17 at 18:48

score 2 · Answer 1 · edited May 23 '17 at 12:09

The "ahead" and "behind" counts are from the result of:

git rev-list --count <exclude>..<include>

(note: two dots) which is run twice, with <exclude> and <include> swapped; or equivalently, from the result of:

git rev-list --count --left-right HEAD...<upstream>

(note: three dots this time, and I changed the names a bit). This does not really change the question, but—I hope—reframes it slightly to make the answer make more sense. There are several parts to the answer as well, but in the end, it's because merges tend to add a lot to reachability.

What are these exclude/include things with the two dot syntax?

The problem with a Git branch name is that it only names one commit. We want "a branch" to mean many commits. Well, that is, sometimes we want that, and sometimes we only want one commit. The one commit is the tip of the branch, and the many-commits are some subset of all commits reachable from the tip commit. (See also What exactly do we mean by "branch"?)

This "reachability" concept is at the core of a lot of Git operations. The way it works is very simple, with one wrinkle at merge commits: each commit records its parent commit, which is "the commit that was the branch tip, just before we made this commit."

That is, if we start with a simple repository with just three commits, it might look like this:

A  <-B  <-C   <--master

(I've replaced Git's big ugly hash IDs—the ones Git shows you now and then, and often abbreviates—with single letters, since 3bc53220cb2dcf709f7a027a3f526befd021d858 is basically incomprehensible. This means I can only draw 26 commits, but that should be plenty for illustration.) The branch name master has C's ID in it. We say that the name master points to commit C. Commit C, though, actually has B's ID right inside C, and we say that C points to B. Likewise, B points to A, but A is the very first commit we ever made, so it cannot and does not point anywhere at all. This means that A is a root commit.

To add a new commit D, Git writes out the commit itself with the ID of commit C inside it, so that D points back to C. Then it changes the name master to have D's new ID, so that master points to D:

A--B--C--D   <-- master

Note that all of Git's internal arrows are backwards! We start with the most recent commit, and work back through history. This allows Git to never touch any existing commit. Mostly, we don't have to care, though, so instead of drawing all the internal arrows, we can just draw lines connecting each commit to its parent(s), and remember that Git works backwards.

This parent linkage gives you a nice linear structure: commit (biguglyhash1) has commit (biguglyhash2) as its parent, and that commit has (biguglyhash3) as its parent, and so on. A branch in this linkage-of-commits occurs when two different commits have the same parent:

...--C--D
      \
       E

Here, commit D points back to C, but so does commit E. (Note that when we made commit C we probably had no idea it would have two children. Since the internal arrows all point backwards, though, we don't have to change C to record its children; it's sufficient to have both D and E remember C as their parent. This holds no matter how many children we may eventually add.)

Anyway, we work for a while and get some more commits, on both branches—let's put in a branch name while we are at it:

A--B--C--D-------I   <-- master
       \
        E--F--G--H   <-- sidebr

It's obvious that commits A-B-C are all on branch master ... but that's only obvious because that's how we started out and that's how we drew this. What if we re-draw the graph a bit, like this?

     C--D-------I   <-- master
    / \
A--B   E--F--G--H   <-- sidebr

Now it seems like A and B are on sidebr. Or, we could use this drawing, which might be the most accurate of all:

        D--------I   <-- master
       /
A--B--C
       \
        E--F--G--H   <-- sidebr

Now it seems like maybe A-B-C aren't on either branch, or—maybe—are on both branches. And that's the right answer for Git: A-B-C are in fact on both branches.

More precisely, if we start at the tip commit of either branch, I for master and H for sidebr, we work our way backwards (leftwards, in these graphs; with git log --graph and other drawings it's often downwards or upwards depending on who and what drew it) from each commit to its parent. As we go, each commit we find is "on"—or "contained in"—the branch. Since C is reachable from either tip, it's on both branches.
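To check this on a real repository, here is a minimal sketch that rebuilds the graph above in a throwaway directory and asks Git whether a commit is reachable from each tip. The empty commits, the tag named C, the commit helper and the demo identity are scaffolding I made up for the illustration. git merge-base --is-ancestor exits with status 0 when its first argument is reachable from its second.

```shell
# Throwaway repository; each commit is empty and its subject is its letter.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # fixed branch name, whatever init.defaultBranch says
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { for m in "$@"; do git commit -q --allow-empty -m "$m"; done; }

commit A B C
git tag C                    # hypothetical label so we can name commit C later
git branch sidebr            # sidebr forks from C
commit D I                   # master: A-B-C-D-I
git checkout -q sidebr
commit E F G H               # sidebr: A-B-C-E-F-G-H

# Is C reachable from each tip? Is D (master~1) reachable from sidebr?
if git merge-base --is-ancestor C master; then on_master=yes; else on_master=no; fi
if git merge-base --is-ancestor C sidebr; then on_sidebr=yes; else on_sidebr=no; fi
if git merge-base --is-ancestor master~1 sidebr; then d_on_sidebr=yes; else d_on_sidebr=no; fi
echo "C on master: $on_master, C on sidebr: $on_sidebr, D on sidebr: $d_on_sidebr"
```

If reachability works as described, C should be reported on both branches and D only on master.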

But a lot of the time we don't want all the commits

Ultimately, then, if you name a branch tip, that can select every commit all the way back to the root commit. The name sidebr could mean commits all the way back to A. But most of the time, we don't want all those commits. Given:

        D--------I   <-- master
       /
A--B--C
       \
        E--F--G--H   <-- sidebr

we would like to say: "Look at commits that are reachable from sidebr, but stop looking at commits that are reachable from master." This is why Git has the two-dot syntax:

master..sidebr

literally means "commits reachable from sidebr, excluding commits reachable from master". In other words, this names commits E-F-G-H, and not any of A-B-C. (It also excludes D and I, which is really easy since they were never included in the first place.)
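As a rough sketch of the two-dot selection, the script below rebuilds the same graph in a scratch directory and lists what master..sidebr picks. The empty commits, the commit helper and the demo identity are assumptions of mine for the demo only.

```shell
# Scratch repository with the graph from the text (empty commits named by letter).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { for m in "$@"; do git commit -q --allow-empty -m "$m"; done; }

commit A B C
git branch sidebr
commit D I
git checkout -q sidebr
commit E F G H

# Commits reachable from sidebr, minus everything reachable from master.
selected=$(git --no-pager log --format=%s master..sidebr)
count=$(git rev-list --count master..sidebr)
echo "selected subjects:" $selected
echo "count: $count"
```

The subjects listed should be exactly E, F, G and H, with none of A, B, C, D or I.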

There are a bunch of ways to visualize this, but my favorite is to color in the commits, as if with a highlighter. We mark the tip of the "exclude" branch in red, and the tip of the "include" branch in green. We extend the red marker back to all its appropriate parents, and the green marker back to all its appropriate parents. Just make sure that red wins: either let red overwrite green, or do all the red ones first and never color green over red. When you are done, the commits that are still green are the ones selected.

Now, this is all fine and should be reasonably obvious, but let's make the graph a bit more complicated. Let's add origin/master, and say that it points to commit D, which forces me to re-draw the graph again:

        D            <-- origin/master
       / \
A--B--C   I          <-- master
       \
        E--F--G--H   <-- sidebr

That is, we have one commit on our master that we added, and that we have not yet pushed up to origin; and four commits on our sidebr. Let's add a merge commit now, by merging sidebr into master with git checkout master; git merge sidebr:

        D              <-- origin/master
       / \
A--B--C   I--------J   <-- master
       \          /
        E--F--G--H     <-- sidebr

This creates a new merge commit J. The thing that makes it a merge commit—aside from using git merge to make it, that is—is that it has two parents instead of just one. The first parent of J is I, in the usual way, but the second parent of J is H: the tip of sidebr.

This new merge commit J makes the name master "reach" all of the sidebr commits, through that second parent. When we go to color in reachable and exclude commits, we must follow all the parents of a merge, simultaneously (or as close as we can to that), until the branches re-join wherever they forked earlier.

(As an aside, the merge base of I and H is commit C. Commit C is where the branches first re-join. That's really easy to see in the diagrams from before we added origin/master. Adding a label to the graph never alters the graph itself, and the merge base of any two commits is determined by the graph. When you are merging, the merge base is critical to how the merge will be done, so it's important to draw the graph by starting from the two tip commits and working back to where they first come together. That's what Git will do: find the merge base, then diff the merge base against each tip commit.)

So, consider what origin/master..master meant both before and after the merge. Before the merge, we would paint some commits red, based on what we can reach from origin/master. Those are commits A-B-C-D. We'd paint other commits green, based on what we can reach from master: that's A-B-C-D-I. The git rev-list command normally prints out the ID of each found commit, but with --count, it just counts them up. So:

git rev-list --count origin/master..master

will count just one "green commit" I and print 1.

After the merge, though, the name master names commits A-B-C-D-E-F-G-H-I-J. We only added one commit, J, but J brought in commits E-F-G-H. We color them red and green as before (try it out on a whiteboard, or on paper, or at least in your head) and count the commits, and instead of the two we might expect (I and J), we get six. So:

git rev-list --count origin/master..master

will now print 6.
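Here is a small sketch that reproduces the before and after counts. There is no real remote involved: I fake origin/master by writing refs/remotes/origin/master directly with git update-ref, which is an assumption made purely to keep the demo self-contained, as are the empty commits and the demo identity.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { for m in "$@"; do git commit -q --allow-empty -m "$m"; done; }

commit A B C
git branch sidebr
commit D
git update-ref refs/remotes/origin/master HEAD   # pretend origin/master is at D
commit I
git checkout -q sidebr
commit E F G H
git checkout -q master

before=$(git rev-list --count origin/master..master)   # only I
git merge -q --no-ff -m J sidebr                        # merge commit J, parents I and H
after=$(git rev-list --count origin/master..master)    # I, J, and E-F-G-H via J's second parent
echo "before merge: $before, after merge: $after"
```

Dropping --count from the last command lists the six commit IDs themselves, which is a handy way to see exactly what the merge dragged in.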

How you can be both ahead and behind

We just did this with a case where master winds up only ahead of origin/master, but let's "erase the merge" for a moment, and look at master vs sidebr, without bothering with origin/master:

        D--------I   <-- master
       /
A--B--C
       \
        E--F--G--H   <-- sidebr

If we count commits on master, excluding commits on sidebr, with sidebr..master, we will count commits D and I, which gets us the number 2. If we count commits on sidebr while excluding master, i.e., master..sidebr, we count E-F-G-H and get 4. That's why we want to do both, to see if there are commits on one branch that aren't on the other, in both directions.

Git can do this for us all at once using the three-dot syntax, master...sidebr or sidebr...master. This means "commits that are reachable from either tip, but not reachable from both." That is, we want commits D and I to be colored green, and E-F-G-H to be colored green, but we want A-B-C, which are on both branches, to be red / stopped-out. The three-dot syntax does that; if we just count the resulting set, it has six commits (D and I plus E-F-G-H). Adding --left-right tells Git to remember for us which side they came in from, and adding --count counts the two sides, so git rev-list --left-right --count master...sidebr prints both 2 and 4. The number on the left is the number of commits on the left branch, and the number on the right is the number of commits on the right branch. (Note that swapping the names swaps the counts.)
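To see that one three-dot run matches the two separate two-dot runs, a sketch along these lines can be used. As before, the scratch repository, empty commits, commit helper and demo identity are my own scaffolding.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { for m in "$@"; do git commit -q --allow-empty -m "$m"; done; }

commit A B C
git branch sidebr
commit D I
git checkout -q sidebr
commit E F G H

# Two separate two-dot counts ...
only_master=$(git rev-list --count sidebr..master)
only_sidebr=$(git rev-list --count master..sidebr)

# ... versus one symmetric-difference count, split by side.
set -- $(git rev-list --left-right --count master...sidebr)
left=$1 right=$2
echo "two-dot: $only_master and $only_sidebr; three-dot: $left and $right"
```

The left number should equal the sidebr..master count and the right number the master..sidebr count.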

We can do the same thing with master and origin/master:

git rev-list --left-right --count origin/master...master

The left hand count will be commits that are on origin/master that are not on master, and the right hand count will be commits on master that are not on origin/master. Or, swap the names around and we get swapped counts.

What it means to be ahead and/or behind

Let's return to the "after the merge" graph:

        D              <-- origin/master
       / \
A--B--C   I--------J   <-- master
       \          /
        E--F--G--H     <-- sidebr

If we count commits on origin/master and master, we already noted that master was 6 ahead of origin/master. That happened because master was 1 ahead (commit I), then we added merge commit J, which counts as one itself and brought in four more (E-F-G-H): 1 + 1 + 4 = 6 ahead.

If we now run git fetch origin to pick up any new commits they have, we might find that origin got a new commit on its master. Let's call that commit K, and draw it in:

        D----------K   <-- origin/master
       / \
A--B--C   I--------J   <-- master
       \          /
        E--F--G--H     <-- sidebr

Now let's count both sides of master...origin/master. We get the same six on the left side (six commits reachable from master that are not reachable from origin/master), but this time, we get one commit, K, on the right side. So our master is now 6 ahead of, and 1 behind, our origin/master, which is our Git's new memory of where master is on origin.
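The "6 ahead, 1 behind" result can be reproduced with a sketch like this one. There is no real fetch here: I simulate origin/master moving to K by committing on a detached HEAD and rewriting refs/remotes/origin/master with git update-ref, which is an assumption for the demo only.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { for m in "$@"; do git commit -q --allow-empty -m "$m"; done; }

commit A B C
git branch sidebr
commit D
git update-ref refs/remotes/origin/master HEAD   # pretend origin/master is at D
commit I
git checkout -q sidebr && commit E F G H
git checkout -q master && git merge -q --no-ff -m J sidebr

# Simulate "git fetch origin" bringing in K on top of D.
git checkout -q --detach origin/master && commit K
git update-ref refs/remotes/origin/master HEAD
git checkout -q master

set -- $(git rev-list --left-right --count master...origin/master)
ahead=$1 behind=$2
echo "master is $ahead ahead, $behind behind origin/master"
```

With master on the left of the three dots, the left count is "ahead" and the right count is "behind".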

The `<upstream>` in `HEAD...<upstream>`

For Git to count these commits for you, Git needs to know what names to use in your repository when doing these git rev-list --count operations.

The current branch is easy, because the file HEAD records the name of the current branch. (Try it yourself:

$ cat .git/HEAD
ref: refs/heads/master

This, in fact, is exactly what it means to be on branch master, as git status will say.) But what should the other half of master...something or something...master be?

For this, Git uses the upstream of the branch. This upstream is created automatically when you do:

git checkout somebranch

and there's an origin/somebranch in your repository, but no local somebranch yet. In this case, Git creates a new local somebranch pointing to the same commit as your origin/somebranch, which is a remote-tracking branch name. A remote-tracking branch is one whose name is like origin/master, i.e., the name of a remote origin, plus a slash, plus a branch name retrieved from the remote. Your Git uses these names to remember what you last got from, or gave to, that remote. Running git fetch origin updates all your origin/* names (and is otherwise totally safe: you can git fetch at any time without disrupting things).

In some cases, like when you create your own branch sidebr that doesn't have an origin/sidebr yet, you will want to use:

git push -u origin sidebr

or:

git push --set-upstream origin sidebr

This tells your Git to push sidebr to origin and, once origin's Git has its own sidebr (which your Git remembers as origin/sidebr), to set origin/sidebr as the upstream for your sidebr.

At any time, you can run:

git branch --set-upstream-to origin/whatever

to change your current branch's upstream setting to origin/whatever. But there is only one upstream per branch, so you probably want to leave it set to origin/same-name.

To see your current branch's upstream:

git rev-parse --abbrev-ref @{u}

or:

git rev-parse --symbolic-full-name @{u}

(you may need to quote the @{u} part to protect it from your shell: sh and bash are OK with this, but csh/tcsh try to do brace expansion, and zsh might as well). Try them out to see the difference between --abbrev-ref and --symbolic-full-name. Note that the current branch must have an upstream for @{u} to work. (While there's only one upstream, you can have no upstream instead.)
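For a quick way to try these, here is a sketch that sets up a local bare repository to play the role of origin, pushes with -u, and then asks for the upstream both ways. The directory names and the demo identity are made up for the example, and @{u} is single-quoted so that no shell tries to expand the braces.

```shell
work=$(mktemp -d)
git init -q --bare "$work/origin.git"            # stands in for the remote
git init -q "$work/local" && cd "$work/local"
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m A
git remote add origin "$work/origin.git"
git push -q -u origin master                     # pushes and records the upstream

short=$(git rev-parse --abbrev-ref '@{u}')
full=$(git rev-parse --symbolic-full-name '@{u}')
echo "abbrev-ref: $short"
echo "symbolic-full-name: $full"
```

The first form should give the short name (origin/master) and the second the full ref name under refs/remotes/.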

score 0 · Answer 2 · answered Mar 08 '17 at 05:36

To list the commits by which your current local branch is behind origin, you can use:

git log branchname..origin/branchname --oneline

If git status says the branch is 11 commits behind, you will find those 11 commits here.
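As an illustration, the sketch below builds a small case where the remote-tracking branch gained a single merge that brought a feature branch with it, then lists what the local branch is missing. The branch name feature, the commit subjects, and the faked origin/master ref (written with git update-ref rather than fetched) are all assumptions for the demo.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { for m in "$@"; do git commit -q --allow-empty -m "$m"; done; }

commit A B
git branch feature
commit C                                  # local master stays at C
git checkout -q feature && commit F1 F2 F3
git checkout -q --detach master
git merge -q --no-ff -m M feature         # the merge exists only "on the remote"
git update-ref refs/remotes/origin/master HEAD
git checkout -q master

behind=$(git rev-list --count master..origin/master)
listed=$(git --no-pager log --oneline master..origin/master | wc -l | tr -d ' ')
git --no-pager log --oneline --graph master..origin/master
echo "behind: $behind, commits listed: $listed"
```

Only one new commit (M) sits directly on top of C, yet the listing also shows F1, F2 and F3, because they are reachable through the merge's second parent.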

Marina Liu
