
I am new to Git and I have tried playing around with a few features.

What does

git diff HEAD...origin master

vs.

git diff origin master

do?

They seem to give me entirely different results.

Perhaps it's good to note that I do have an origin/master that is different from origin master.

Shouldn't it all mean the same thing?

Peter Mortensen
aceminer

2 Answers


You're mixing together several different Git concepts. Admittedly, these Git concepts all have very similar names: remotes, branches, and remote-tracking branches. (Git's nomenclature gets even worse, as there's the concept of tracking, which is not the same as a remote-tracking branch, and when you have one branch tracking another, the tracked branch is called the upstream, which is not the same as using a remote as an upstream, or using a remote named upstream. If you're not confused, you're doing way better than I was, or most people. :-) )

So, let's back this thing up a bit and define all these.

Definition: branch

First, we have branches. The word "branch" in Git is actually ambiguous: it can refer to a branch name, like master, or it can refer to a series of commits, starting from the tip-most commit on a branch, and working backwards through time. In other words, if you casually say "blah mumble branch master yada yada", it's not immediately clear whether you mean the branch name master, or the series of commits formed by starting with the commit that master names and working backwards through history.

It's usually clear from context though, and if not, you can use "branch name" and "branch structure" to distinguish them. A branch name is just a word like master, except that to be a current and valid branch name, it has to be a name Git knows about, that Git will show if you run git branch. See also What exactly do we mean by "branch"?

Note that a branch name can translate directly into the tip-most commit of the branch structure. The git diff command uses this quite a bit, as we'll see in a moment. To see how a branch name turns into a commit ID, use git rev-parse. This command simply looks at things, so it's safe to use any time. Try it out now:

$ git rev-parse master

and:

$ git rev-parse HEAD

and if you have some other branches, try passing their names to git rev-parse. (Then try git branch -vv and compare the abbreviated commit IDs you see in its output to what you got from git rev-parse.)
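If you would rather not poke at a real repository, here is a minimal sketch that does the same thing in a throwaway one. The directory name demo, the file name, and the "Demo User" identity are just assumptions for illustration.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory, nothing real is touched
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Demo User"           # hypothetical identity for the demo commit
git config user.email "demo@example.com"
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# Both names resolve to the same commit ID while you are on master.
master_id=$(git rev-parse master)
head_id=$(git rev-parse HEAD)
echo "master -> $master_id"
echo "HEAD   -> $head_id"

# The abbreviated ID shown here is a prefix of the IDs printed above.
git branch -vv
```

The two echo lines should print the same hash, because HEAD is currently just another way of saying master.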

Definition: remote

A remote is just a name, like origin. In this respect it is much like a branch name. The difference is that a remote name is stored in a different place from branch names, and Git will show you your remote names if you run git remote. Besides this, a remote name gives you two things: the ability to run git fetch and git push without writing out a big long URL—Git keeps the big long URL under the name of the remote—and the ability to have remote-tracking branches.
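As a small sketch of the "name that stands for a URL" idea, the following adds a remote to an empty scratch repository and reads the URL back. The URL is a made-up placeholder, and nothing here contacts it.

```shell
set -e
cd "$(mktemp -d)"                  # scratch directory
git init -q demo
cd demo

# Hypothetical URL, used only so that the remote has something to store.
git remote add origin https://example.com/some/big/long/url.git

remotes=$(git remote)                          # lists remote names
url=$(git config --get remote.origin.url)      # the URL stored under the name

echo "remote names: $remotes"
echo "origin means: $url"
```

Reading the value through git config shows where the name lives: it is an entry in the repository's configuration, not a branch.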

Definition: remote-tracking branch

A remote-tracking branch is (yet again!) just a name, but it starts out with the name of a remote, like origin, then has a slash, and then has the name of a branch as seen on the remote.1 Thus, you'll see names like origin/master, which are typical remote-tracking branch names.

There is one key difference between your (regular, local) branch names and remote-tracking branches: Your Git updates your branches as you work with them: you check them out, use git commit to add commits to them, use git merge to add merge commits to them, and so on. You can git checkout a branch, and then git status will say that you are "on" the branch, e.g., on branch master.

Your Git does not update your remote-tracking branches this way. In fact, you can't get "on" them at all. Instead, when you run git fetch origin—here's where you use the remote name, origin—your Git looks up the URL from the remote, calls up another Git using that URL, and has a little conversation with it. Your Git gets, from their Git, a list of all their branches—branch names, I should say. Then your Git gets from their Git any of the commits that they have, that you don't: their branch structure.

Once your Git has their branch structure, it sets your remote-tracking branch (a name) to point to the tip-most commit, the same as their branch name does in their Git. Your Git does this for each of their branches. This way, after git fetch origin, your remote-tracking branches now keep track, for you, of where their branches were, the last time your Git caught up with their Git.

Your Git constructs your remote-tracking branch names by sticking your remote name (origin) in front of their branch names (master). That's why your remote-tracking branch is origin/master: their branch name is master.
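To watch this happen, here is a sketch with two scratch repositories on the local disk. The names theirs and mine are assumptions standing in for "their Git" and "your Git"; a local path plays the role of the URL.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory

# "theirs" plays the part of the other Git repository.
git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
cd ..

# Cloning creates the remote name origin and the name origin/master.
git clone -q theirs mine

# They add a commit after we cloned.
cd theirs
echo two >> file.txt
git commit -q -a -m "second commit"
their_tip=$(git rev-parse master)
cd ../mine

before=$(git rev-parse origin/master)
git fetch -q origin                        # our Git talks to their Git
after=$(git rev-parse origin/master)
mine_tip=$(git rev-parse master)

echo "origin/master before fetch: $before"
echo "origin/master after fetch:  $after"
echo "their master:               $their_tip"
echo "my own master:              $mine_tip"
git branch -r
```

After the fetch, origin/master matches their master, while your own master has not moved at all.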

Definition: HEAD

The name HEAD, in Git, is pretty special. (In fact, it's so special that if you manage to remove the file .git/HEAD somehow, Git will stop believing that your Git repository is a Git repository!) However, normally HEAD really just contains the name of a branch. For instance, if you're on branch master, the special HEAD file just contains the string: ref: refs/heads/master. (The string refs/heads/master is in fact the full name of branch-name master, though normally you don't need to worry about this: Git hides the refs/heads/ prefix, just like it hides the refs/remotes/ prefix when you are using the remote-tracking branch origin/master.)

When HEAD contains a branch name—which, as we just said, is the usual case—the name HEAD is mainly just shorthand for writing the current branch name. So if you're on master, HEAD is just another way to say master. It's not really all that short, but it has the advantage that it works even if you're on branch llanfairpwllgwyngyll. More importantly, it means that programs like git log don't need to know what branch you're on—or equivalently, programs like git status can find out which branch you're on. In fact, that's precisely how git status finds out.
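Here is a sketch that peeks at HEAD directly in a scratch repository. It assumes the default on-disk reference storage, where HEAD is a plain text file; the repository name and identity are made up.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# With the default ref storage this prints the "ref: ..." line described above.
cat .git/HEAD

on_first=$(git symbolic-ref --short HEAD)  # the branch name HEAD contains

# Switch to a new branch with a long name; HEAD follows along.
git checkout -q -b llanfairpwllgwyngyll
on_second=$(git symbolic-ref --short HEAD)
head_id=$(git rev-parse HEAD)
branch_id=$(git rev-parse llanfairpwllgwyngyll)

echo "first on:  $on_first"
echo "then on:   $on_second"
```

Whichever branch you are on, HEAD and that branch's name resolve to the same commit.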

Quick review

  • git branch lists your branch names, such as master.
  • git remote lists your remote names, such as origin.
  • git branch -r lists your remote-tracking branches, such as origin/master.

What all this means for the various git diff commands

The git diff command is itself rather unusual. Most Git commands treat branch name and revision-list arguments in the way described in the gitrevisions documentation. In git diff, though, both the two-dot and three-dot notations branch1..branch2 and branch1...branch2 are given new, different meanings.

(Besides this, git diff has a whole bunch of sub-modes, which you can invoke with git diff-index, git diff-files, and git diff-tree. But let's not worry about that here.)

You ran:

git diff HEAD...origin master

There are two extra difficulties here, and I'll completely ignore one of them for a while. The other problem is that this uses the three-dot notation, with git diff's special interpretation of it, which requires understanding the git merge-base command.

Let's simplify this second problem away for a moment by pretending that, instead, you wrote:

git diff HEAD..origin master

The special git diff interpretation of the two-dot syntax is a lot simpler: git diff pretends you didn't use the two dots at all, and instead just wrote the two names as two separate arguments. So this particular form means exactly the same thing as:

git diff HEAD origin master

There's a bit of a problem here, because we've just named three things: the special HEAD name, a name that looks like (and in fact is) a remote, rather than a branch or remote-tracking branch, and finally a branch name. The git diff command wants two things here: it wants two branch names, or at least, two arguments that it can resolve to specific commits.2

Of course, HEAD works great: it names the current branch, which names the tip-most commit on the branch. If the current branch is master and master resolves to commit 24377c8..., then HEAD also resolves to 24377c8... and Git will use 24377c8... as the first commit in the diff.

But what about origin? This is where the gitrevisions documentation comes in. It's hard to see at first, but in fact, what happens is that origin gets treated as if it reads origin/HEAD, and origin/HEAD usually3 maps to origin/master, so usually this means "whatever commit git rev-parse origin/master comes up with". It definitely always means "whatever commit git rev-parse origin comes up with."
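You can check this chain of names yourself. The sketch below clones a scratch repository (the names theirs and mine are assumptions) and compares what each spelling resolves to in the fresh clone.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory

git init -q theirs
cd theirs
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"
echo one > file.txt
git add file.txt
git commit -q -m "first commit"
cd ..

git clone -q theirs mine
cd mine

# origin/HEAD is a symbolic name; this shows what it currently points to.
target=$(git symbolic-ref refs/remotes/origin/HEAD)

origin_id=$(git rev-parse origin)
origin_head_id=$(git rev-parse origin/HEAD)
origin_master_id=$(git rev-parse origin/master)

echo "origin/HEAD points to: $target"
echo "origin        -> $origin_id"
echo "origin/HEAD   -> $origin_head_id"
echo "origin/master -> $origin_master_id"
```

In this clone all three spellings give the same commit, because their HEAD named master when the clone was made.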

Just for concreteness, let's say that your HEAD is your master which is commit 24377c8..., and that origin is their master which is commit b240a77.... Then you could have just typed in this:

git diff 24377c8 b240a77 master

That is, the two commits git diff will compare are these two hashes—we're using these shortened ones here because the full 40 characters is just too much—but what about that extra master?

This gets us into the other extra difficulty I mentioned earlier: git diff can take more than two commits, and if it gets three or more commits, it will often4 produce a "combined diff". If the word master were not a branch name, so that git rev-parse complained about it, git diff would have treated it as a path name, which would restrict the diff output to particular paths. But of course master is a valid branch name, so it may get parsed as a revision, and may lead to hard-to-describe behavior. (In Git version 2.8.1, where I tried it out, it acts particularly weird.)

Bottom line: don't do that

If you do want to use the three-dot form, stick with just one three-dot argument with two branch names. In this case, Git will use git merge-base to find the merge base of the two revisions. (See Drew Beres' much shorter answer to this question for details.5)

In the absence of particularly tricky forms, you can simply run git rev-parse on the names you are going to feed to git diff to see what commits it will use:

$ git rev-parse HEAD origin

This will show you two commit IDs, and those are the two commits that git diff HEAD origin or git diff HEAD..origin will compare. When using the three-dot syntax, you can run git merge-base --all with the same two names (for example, git merge-base --all HEAD origin) to see which commit Git will choose to compare to the right-hand side of the three-dot version. If that prints just one revision, that's the revision git diff will compare to the right-hand side.
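As a sketch of that inspection step, the following builds a tiny history that forks, then asks Git which commits it would use. The local branches topic and master are stand-ins (an assumption) for whatever two names you plan to hand to git diff.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base commit"
fork=$(git rev-parse HEAD)                 # remember where the branches split

git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m "work on topic"

git checkout -q master
echo master > master.txt
git add master.txt
git commit -q -m "work on master"

# The two commits a plain or two-dot diff would compare:
git rev-parse topic master

# The left-hand commit a three-dot diff would use instead:
bases=$(git merge-base --all topic master)
echo "merge base: $bases"
```

Here there is exactly one merge base, and it is the commit where topic split off from master.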

(And, remember that all of this behavior is specific to git diff: other commands like git log treat the two-dot and three-dot syntaxes differently.)


1You can make remote-tracking branch names that do not start with remote names. You can also make local branch names that do start with remote names. Doing either of these is a bad idea as it will confuse humans. Git will keep them straight (internally, Git uses the refs/heads/ and refs/remotes/ prefixes to know that they are local and remote-tracking branches), but it's just impossible to work with; don't do it.

2More precisely, git diff wants to resolve the two arguments to two trees. A commit ID always works, though, and branch names resolve to commit IDs, so it probably makes more sense, at least initially, to just concern yourself with finding commits.

3When you first git clone a URL, Git sets up the remote origin to hold the URL, and also finds out—if it can—which branch HEAD names in the other Git repository. It then sets up your remote-tracking name origin/HEAD to map to your remote-tracking name for that branch. Since that branch, in that other Git repository, is usually their master, your origin/HEAD is usually a symbolic reference to your origin/master.

If their Git repository has a different branch checked-out, though, your origin/HEAD will point to some other origin/whatever name. (And, in what is probably a bug in Git, git fetch origin never actually updates your origin/HEAD, even though it probably should update it if they change their current branch.)

4It gets really bad from here because of the way git diff handles the three-dot notation. However, this depends on your specific version of Git: older versions of Git detected this with literal string tests on the arguments you passed in, and newer ones detect it by looking at flags left behind by the revision-parsing code. Without building older versions of Git, all I can say is that I am pretty sure that they behave differently from the version I tested.

5I started this answer hours ago, was interrupted several times, and found myself unable to explain why some git diff commands with three or more branch names and two and three dot syntax produced combined diffs and some produced ordinary diffs, so had to go look at the builtin/diff.c source. It's basically somewhat buggy. After poking at it for a few hours, I believe I have a fix, although since the Git maintainers have completely ignored my git stash fix, I am not hoping for much here.

torek

As per the git-diff Documentation:

Comparing branches

$ git diff topic master    (1)
$ git diff topic..master   (2)
$ git diff topic...master  (3)
  1. Changes between the tips of the topic and the master branches.

  2. Same as above.

  3. Changes that occurred on the master branch since when the topic branch was started off it.

On the triple-period <commit>...<commit> git-diff range form specifically:

This form is to view the changes on the branch containing and up to the second <commit>, starting at a common ancestor of both <commit>. "git diff A...B" is equivalent to "git diff $(git-merge-base A B) B". You can omit any one of <commit>, which has the same effect as using HEAD instead.
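A quick way to convince yourself of these equivalences is to capture the output of each form and compare. This is a sketch in a scratch repository; the branch and file names are assumptions chosen to mirror the documentation example.

```shell
set -e
cd "$(mktemp -d)"                          # scratch directory
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base commit"

git checkout -q -b topic                   # topic starts off master here
echo topic > topic.txt
git add topic.txt
git commit -q -m "work on topic"

git checkout -q master
echo master > master.txt
git add master.txt
git commit -q -m "work on master"

two_args=$(git diff topic master)          # form (1)
two_dots=$(git diff topic..master)         # form (2)
three_dots=$(git diff topic...master)      # form (3)
via_base=$(git diff "$(git merge-base topic master)" master)

# Only the file changed on master since the fork shows up in form (3).
three_dot_files=$(git diff --name-only topic...master)
echo "files in three-dot diff: $three_dot_files"
```

Forms (1) and (2) also report topic.txt, since they compare the two tips directly, which is why the two-dot and three-dot results differ once both branches have moved.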

srage