The "ahead" and "behind" counts are from the result of:
git rev-list --count <exclude>..<include>
(note: two dots) which is run twice, with <exclude> and <include> swapped; or equivalently, from the result of:
git rev-list --count --left-right HEAD...<upstream>
(note: three dots this time, and I changed the names a bit). This does not really change the question, but—I hope—reframes it slightly to make the answer make more sense. There are several parts to the answer as well, but in the end, it's because merges tend to add a lot to reachability.
What are these exclude/include things with the two dot syntax?
The problem with a Git branch name is that it only names one commit. We want "a branch" to mean many commits. Well, that is, sometimes we want that, and sometimes we only want one commit. The one commit is the tip of the branch, and the many-commits are some subset of all commits reachable from the tip commit. (See also What exactly do we mean by "branch"?)
This "reachability" concept is at the core of a lot of Git operations. The way it works is very simple, with one wrinkle at merge commits: each commit records its parent commit, which is "the commit that was the branch tip, just before we made this commit."
That is, if we start with a simple repository with just three commits, it might look like this:
A <-B <-C <--master
(I've replaced Git's big ugly hash IDs, the ones Git shows you now and then and often abbreviates, with single letters, since 3bc53220cb2dcf709f7a027a3f526befd021d858 is basically incomprehensible. This means I can only draw 26 commits, but that should be plenty for illustration.) The branch name master has C's ID in it. We say that the name master points to commit C. Commit C, though, actually has B's ID right inside C, and we say that C points to B. Likewise, B points to A, but A is the very first commit we ever made, so it cannot and does not point anywhere at all. This means that A is a root commit.
To add a new commit D, Git writes out the commit itself with the ID of commit C inside it, so that D points back to C. Then it changes the name master to have D's new ID, so that master points to D:
A--B--C--D <-- master
Note that all of Git's internal arrows are backwards! We start with the most recent commit, and work back through history. This allows Git to never touch any existing commit. Mostly, we don't have to care, though, so instead of drawing all the internal arrows, we can just draw lines connecting each commit to its parent(s), and remember that Git works backwards.
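The backward-pointing structure described above can be modeled in a few lines of Python. This is only a toy sketch, not how Git stores objects: the dictionaries `parents` and `branches` and the helpers `commit` and `history` are names invented here for illustration, and single letters stand in for hash IDs.

```python
# Each commit records only its parent(s); a branch name records one commit ID.
parents = {"A": [], "B": ["A"], "C": ["B"]}   # A has no parent: a root commit
branches = {"master": "C"}                    # master points to C


def commit(new_id, branch):
    """Add a commit whose parent is the current tip, then move the name."""
    parents[new_id] = [branches[branch]]   # the new commit points back to the old tip
    branches[branch] = new_id              # the branch name now points to the new commit


def history(tip):
    """Walk backwards from a tip, following the first parent each time."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = parents[tip][0] if parents[tip] else None
    return out


commit("D", "master")
print(history(branches["master"]))  # ['D', 'C', 'B', 'A']
```

Note that adding D never modified the entries for A, B, or C; only the new commit and the branch name changed.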
This parent linkage gives you a nice linear structure: commit (biguglyhash1) has commit (biguglyhash2) as its parent, and that commit has (biguglyhash3) as its parent, and so on. A branch in this linkage-of-commits occurs when two different commits have the same parent:
...--C--D
      \
       E
Here, commit D points back to C, but so does commit E. (Note that when we made commit C we probably had no idea it would have two children. Since the internal arrows all point backwards, though, we don't have to change C to record its children; it's sufficient to have both D and E remember C as their parent. This holds no matter how many children we may eventually add.)
Anyway, we work for a while and get some more commits, on both branches—let's put in a branch name while we are at it:
A--B--C--D-------I   <-- master
       \
        E--F--G--H   <-- sidebr
It's obvious that commits A-B-C are all on branch master... but that's only obvious because that's how we started out and that's how we drew this. What if we re-draw the graph a bit, like this?
      C--D-------I   <-- master
     / \
A--B    E--F--G--H   <-- sidebr
Now it seems like A and B are on sidebr. Or, we could use this drawing, which might be the most accurate of all:
        D--------I   <-- master
       /
A--B--C
       \
        E--F--G--H   <-- sidebr
Now it seems like maybe A-B-C aren't on either branch, or, maybe, are on both branches. And that's the right answer for Git: A-B-C are in fact on both branches.
More precisely, if we start at the tip commit of either branch, I for master and H for sidebr, we work our way backwards (leftwards, in these graphs; with git log --graph and other drawings it's often downwards or upwards depending on who and what drew it) from each commit to its parent. As we go, each commit we find is "on" (or "contained in") the branch. Since C is reachable from either tip, it's on both branches.
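To make "reachable from a tip" concrete, here is a small Python sketch of the walk over the graph drawn above. The `parents` table and the `reachable` helper are illustrative names, not anything Git exposes.

```python
# The graph from the drawing: D and E both have C as their parent.
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "I": ["D"],
    "E": ["C"], "F": ["E"], "G": ["F"], "H": ["G"],
}
branches = {"master": "I", "sidebr": "H"}


def reachable(tip):
    """Return every commit found by walking backwards from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])   # follow every parent of this commit
    return seen


on_master = reachable(branches["master"])
on_sidebr = reachable(branches["sidebr"])
print(sorted(on_master & on_sidebr))  # ['A', 'B', 'C']
```

The intersection of the two sets is exactly the commits that are "on both branches".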
But a lot of the time we don't want all the commits
Ultimately, then, if you name a branch tip, that can select every commit all the way back to the root commit. The name sidebr could mean commits all the way back to A. But most of the time, we don't want all those commits. Given:
        D--------I   <-- master
       /
A--B--C
       \
        E--F--G--H   <-- sidebr
we would like to say: "Look at commits that are reachable from sidebr, but stop looking at commits that are reachable from master." This is why Git has the two-dot syntax: master..sidebr literally means "commits reachable from sidebr, excluding commits reachable from master". In other words, this names commits E-F-G-H, and not any of A-B-C. (It also excludes D and I, which is really easy since they were never included in the first place.)
There are a bunch of ways to visualize this, but my favorite is to color in the commits, as if with a highlighter. We mark the tip of the "exclude" branch in red, and the tip of the "include" branch in green. We extend the red marker back through all of that tip's ancestors, and the green marker back through all of the other tip's ancestors. Just make sure that red always wins: either let red overwrite green, or do all the red ones first and never color a red commit green. When you are done, the commits that are still green are the ones selected.
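The highlighter method amounts to a set difference. A minimal Python sketch, assuming the same toy graph as in the drawing (the names `parents`, `reachable`, and `two_dot` are made up for this illustration):

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "I": ["D"],
    "E": ["C"], "F": ["E"], "G": ["F"], "H": ["G"],
}
branches = {"master": "I", "sidebr": "H"}


def reachable(tip):
    """Return every commit found by walking backwards from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def two_dot(exclude, include):
    """Model of exclude..include: paint red and green, then let red win."""
    red = reachable(branches[exclude])
    green = reachable(branches[include])
    return green - red


print(sorted(two_dot("master", "sidebr")))  # ['E', 'F', 'G', 'H']
print(sorted(two_dot("sidebr", "master")))  # ['D', 'I']
```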
Now, this is all fine and should be reasonably obvious, but let's make the graph a bit more complicated. Let's add origin/master, and say that it points to commit D, which forces me to re-draw the graph again:
        D   <-- origin/master
       / \
A--B--C   I   <-- master
       \
        E--F--G--H   <-- sidebr
That is, we have one commit on our master that we added, and that we have not yet pushed up to origin; and four commits on our sidebr. Let's add a merge commit now, by merging sidebr into master with git checkout master; git merge sidebr:
        D   <-- origin/master
       / \
A--B--C   I--------J   <-- master
       \          /
        E--F--G--H   <-- sidebr
This creates a new merge commit J. The thing that makes it a merge commit (aside from using git merge to make it, that is) is that it has two parents instead of just one. The first parent of J is I, in the usual way, but the second parent of J is H: the tip of sidebr.
This new merge commit J makes the name master "reach" all of the sidebr commits, through that second parent. When we go to color in reachable and exclude commits, we must follow all the parents of a merge, simultaneously (or as close as we can to that), until the branches re-join wherever they forked earlier.
(As an aside, the merge base of I and H is commit C. Commit C is where the branches first re-join. That's really easy to see in the diagrams from before we added origin/master. Adding a label to the graph never alters the graph itself, and the merge base of any two commits is determined by the graph. When you are merging, the merge base is critical to how the merge will be done, so it's important to draw the graph by starting from the two tip commits and working back to where they first come together. That's what Git will do: find the merge base, then diff the merge base against each tip commit.)
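As a rough illustration of "where they first come together", the sketch below finds the merge base of I and H on the toy graph by brute force: take the commits reachable from both tips, then keep only those that are not an ancestor of another common commit. This is a naive model for small drawings, not Git's actual algorithm, and `merge_bases` is a name invented here.

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "I": ["D"],
    "E": ["C"], "F": ["E"], "G": ["F"], "H": ["G"],
}


def reachable(tip):
    """Return every commit found by walking backwards from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def merge_bases(x, y):
    """Common ancestors of x and y that no other common ancestor can reach."""
    common = reachable(x) & reachable(y)
    return {
        c for c in common
        if not any(c in reachable(other) for other in common if other != c)
    }


print(merge_bases("I", "H"))  # {'C'}
```

A and B are also common ancestors, but both are reachable from C, so only C survives.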
So, consider what origin/master..master meant both before and after the merge. Before the merge, we would paint some commits red, based on what we can reach from origin/master. Those are commits A-B-C-D. We'd paint other commits green, based on what we can reach from master: that's A-B-C-D-I. Since red wins, only I stays green. The git rev-list command normally prints out the ID of each found commit, but with --count, it just counts them up. So:
git rev-list --count origin/master..master
will count just one "green commit", I, and print 1.
After the merge, though, the name master names commits A-B-C-D-E-F-G-H-I-J. We only added one commit, J, but J brought in commits E-F-G-H. We color them red and green as before (try it out on a whiteboard, or on paper, or at least in your head) and count the commits, and instead of one, we get six: E-F-G-H plus I and J. So:
git rev-list --count origin/master..master
will now print 6.
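The jump from 1 to 6 can be checked with a small Python model of the two graphs. As before, this is a sketch over single-letter commits; `reachable` and `count` are illustrative helpers, and the tips are passed as commit letters rather than branch names.

```python
def reachable(parents, tip):
    """Return every commit found by walking backwards from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])   # a merge contributes both of its parents
    return seen


def count(parents, exclude, include):
    """Model of: git rev-list --count exclude..include"""
    return len(reachable(parents, include) - reachable(parents, exclude))


before = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "I": ["D"],
    "E": ["C"], "F": ["E"], "G": ["F"], "H": ["G"],
}
# origin/master is D; master is I before the merge.
print(count(before, "D", "I"))  # 1

# The merge adds one commit, J, with first parent I and second parent H.
after = dict(before, J=["I", "H"])
print(count(after, "D", "J"))   # 6
```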
How you can be both ahead and behind
We just did this with a case where master winds up only ahead of origin/master, but let's "erase the merge" for a moment, and look at master vs sidebr, without bothering with origin/master:
        D--------I   <-- master
       /
A--B--C
       \
        E--F--G--H   <-- sidebr
If we count commits on master, excluding commits on sidebr, with sidebr..master, we will count commits D and I, which gets us the number 2. If we count commits on sidebr while excluding master, i.e., master..sidebr, we count E-F-G-H and get 4. That's why we want to do both, to see if there are commits on one branch that aren't on the other, in both directions.
Git can do this for us all at once using the three-dot syntax, master...sidebr or sidebr...master. This means "commits that are reachable from either tip, but not reachable from both." That is, we want commits D and I to be colored green, and E-F-G-H to be colored green, but we want A-B-C, which are on both branches, to be red / stopped-out. The three-dot syntax does that; if we just count the resulting set, it has six commits (D-I plus E-F-G-H). Adding --left-right tells Git to remember for us which side they came in from, and adding --count counts the two sides, so for master...sidebr this prints both 2 and 4. The number on the left is the number of commits reachable only from the left name, and the number on the right is the number of commits reachable only from the right name. (Note that swapping the names swaps the counts.)
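In set terms, the three-dot syntax is a symmetric difference, and --left-right keeps the two halves apart. A minimal Python sketch on the same toy graph (the helper `left_right_count` is a made-up name, and tips are given as commit letters):

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "I": ["D"],
    "E": ["C"], "F": ["E"], "G": ["F"], "H": ["G"],
}


def reachable(tip):
    """Return every commit found by walking backwards from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def left_right_count(left, right):
    """Model of: git rev-list --left-right --count left...right"""
    l, r = reachable(left), reachable(right)
    return len(l - r), len(r - l)


master, sidebr = "I", "H"
print(left_right_count(master, sidebr))  # (2, 4)
print(left_right_count(sidebr, master))  # (4, 2)
print(len(reachable(master) ^ reachable(sidebr)))  # 6
```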
We can do the same thing with master and origin/master:
git rev-list --left-right --count origin/master...master
The left hand count will be commits that are on origin/master that are not on master, and the right hand count will be commits on master that are not on origin/master. Or, swap the names around and we get swapped counts.
What it means to be ahead and/or behind
Let's return to the "after the merge" graph:
        D   <-- origin/master
       / \
A--B--C   I--------J   <-- master
       \          /
        E--F--G--H   <-- sidebr
If we compare origin/master and master, we already noted that master was 6 ahead of origin/master. That happened because master was 1 ahead, then we added merge commit J, which brought in four more commits: 1 + 1 + 4 = 6 ahead.
If we now run git fetch origin to pick up any new commits they have, we might find that origin got a new commit on its master. Let's call that commit K, and draw it in:
        D----------K   <-- origin/master
       / \
A--B--C   I--------J   <-- master
       \          /
        E--F--G--H   <-- sidebr
Now let's count both sides of master...origin/master. We get the same six on the left side (six commits reachable from master that are not reachable from origin/master) but this time, we get one commit, K, on the right side. So our master is now 6 ahead, 1 behind of our origin/master, which is our Git's new memory of where master is on origin.
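The effect of the fetch on the two counts can be reproduced with the same kind of toy model. This sketch assumes the post-merge graph above; `ahead_behind` is an illustrative helper taking commit letters, not a Git command.

```python
def reachable(parents, tip):
    """Return every commit found by walking backwards from tip."""
    seen, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents[c])
    return seen


def ahead_behind(parents, ours, theirs):
    """Model of: git rev-list --left-right --count ours...theirs"""
    mine, other = reachable(parents, ours), reachable(parents, theirs)
    return len(mine - other), len(other - mine)


graph = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": ["C"], "I": ["D"],
    "E": ["C"], "F": ["E"], "G": ["F"], "H": ["G"],
    "J": ["I", "H"],   # the merge commit on master
}

# Before the fetch: master is J, origin/master is D.
print(ahead_behind(graph, "J", "D"))  # (6, 0)

# The fetch brings in K, whose parent is D, and moves origin/master to K.
graph["K"] = ["D"]
print(ahead_behind(graph, "J", "K"))  # (6, 1)
```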
The <upstream> in HEAD...<upstream>
For Git to count these commits for you, Git needs to know what names to use in your repository when doing these git rev-list --count operations.
The current branch is easy, because the file HEAD records the name of the current branch. (Try it yourself:
$ cat .git/HEAD
ref: refs/heads/master
This, in fact, is exactly what it means to be on branch master, as git status will say.) But what should the other half of master...something or something...master be?
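The contents of HEAD shown above are simple enough to interpret by hand. The Python sketch below does so for a string passed in directly; `current_branch` is a hypothetical helper, and it assumes the only other case is a detached HEAD, where the file holds a raw commit ID instead of a ref: line.

```python
def current_branch(head_text):
    """Return the branch name recorded in HEAD, or None if HEAD is detached."""
    head_text = head_text.strip()
    prefix = "ref: refs/heads/"
    if head_text.startswith(prefix):
        return head_text[len(prefix):]
    return None   # no "ref:" line, so HEAD names a commit directly


print(current_branch("ref: refs/heads/master\n"))  # master
print(current_branch("3bc53220cb2dcf709f7a027a3f526befd021d858\n"))  # None
```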
For this, Git uses the upstream of the branch. This upstream is created automatically when you do:
git checkout somebranch
and there's no local somebranch yet, but there is an origin/somebranch in your repository. In this case, Git creates a new local somebranch pointing to the same commit as your origin/somebranch, which is a remote-tracking branch name. A remote-tracking branch is one whose name is like origin/master, i.e., the name of a remote, origin, plus a slash, plus a branch name retrieved from the remote. Your Git uses these names to remember what you last got from, or gave to, that remote. Running git fetch origin updates all your origin/* names (and is otherwise totally safe: you can git fetch at any time without disrupting things).
In some cases, like when you create your own branch sidebr that doesn't have an origin/sidebr yet, you will want to use:
git push -u origin sidebr
or:
git push --set-upstream origin sidebr
This tells your Git that once origin's Git has its own sidebr, so that your Git remembers it as origin/sidebr, your Git should set origin/sidebr as the upstream for your sidebr.
At any time, you can run:
git branch --set-upstream-to origin/whatever
to change your current branch's upstream setting to origin/whatever. But there is only one upstream per branch, so you probably want to leave it set to origin/same-name.
To see your current branch's upstream:
git rev-parse --abbrev-ref @{u}
or:
git rev-parse --symbolic-full-name @{u}
(you may need to quote the @{u} part to protect it from your shell: sh and bash are OK with this, but csh/tcsh try to do brace expansion, and zsh might as well). Try them out to see the difference between --abbrev-ref and --symbolic-full-name. Note that the current branch must have an upstream for @{u} to work. (While there's only one upstream, you can have no upstream instead.)
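Putting the pieces together, a script could ask Git for both counts against the upstream and split the result. The sketch below is one possible way to do that from Python, assuming git is on the PATH and the current branch has an upstream; `parse_counts` and `ahead_behind` are names invented here. Passing the arguments as a list avoids the shell-quoting concern for @{u} mentioned above.

```python
import subprocess


def parse_counts(output):
    """Split the two whitespace-separated numbers printed by --left-right --count."""
    left, right = output.split()
    return int(left), int(right)


def ahead_behind(upstream="@{u}"):
    """Run git in the current directory; fails if there is no upstream."""
    result = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"HEAD...{upstream}"],
        capture_output=True, text=True, check=True,
    )
    # With HEAD on the left, the left count is "ahead" and the right is "behind".
    return parse_counts(result.stdout)


# Parsing sample output for the "6 ahead, 1 behind" case, without running git:
print(parse_counts("6\t1\n"))  # (6, 1)
```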