This is not exactly a real answer, but I need access to formatting, and a lot of space. I'll try to describe the theory behind what I consider the two best answers: the accepted one and the (at least currently) top-ranked one. But in fact, they answer different questions.
Commits in Git are very often "on" more than one branch at a time. Indeed, that's much of what the question is about. Given:

    ...--F--G--H   <-- master
             \
              I--J   <-- develop

where the uppercase letters stand in for actual Git hash IDs, we're often looking for only commit `H` or only commits `I-J` in our `git log` output. Commits up through `G` are on both branches, so we'd like to exclude them.
(Note that in graphs drawn like this, newer commits are towards the right. The names select the single right-most commit on that line. Each of those commits has a parent commit, which is the commit to its left: the parent of `H` is `G`, and the parent of `J` is `I`. The parent of `I` is `G` again. The parent of `G` is `F`, and `F` has a parent that simply isn't shown here: it's part of the `...` section.)
For this particularly simple case, we can use:

    git log master..develop    # note: two dots

to view `I-J`, or:

    git log develop..master    # note: two dots

to view `H` only. The right-side name, after the two dots, tells Git: yes, these commits. The left-side name, before the two dots, tells Git: no, not these commits. Git starts at the end, at commit `H` or commit `J`, and works backwards. For (much) more about this, see Think Like (a) Git.
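To try the two-dot form without touching a real repository, here is a minimal sketch that rebuilds the small graph above in a temporary directory. The `commit` helper, the `demo` identity, and the fixed timestamps are assumptions made only to keep the run reproducible; commit subjects stand in for the letters in the drawing.

```shell
# Build the F--G--H / I--J graph in a throwaway repository (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the text
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

n=0
commit() {   # empty commit whose subject is $1, with a strictly increasing timestamp
  n=$((n + 1))
  export GIT_AUTHOR_DATE="$((1600000000 + n)) +0000"
  export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
  git commit -q --allow-empty -m "$1"
}

commit F; commit G
git branch develop            # develop starts at G
commit H                      # master is now F--G--H
git checkout -q develop
commit I; commit J            # develop is now F--G--I--J

only_develop=$(git log --format=%s master..develop)   # reachable from develop, not master
only_master=$(git log --format=%s develop..master)    # reachable from master, not develop
printf 'master..develop:\n%s\n' "$only_develop"
printf 'develop..master:\n%s\n' "$only_master"
```

The first query should list `J` and then `I` (newest first), and the second only `H`; commits `F` and `G` appear in neither because both names reach them.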
The way the original question is phrased, the desire is to find commits that are reachable from one particular name, but not from any other name in that same general category. That is, if we have a more complex graph:

                   O--P   <-- name5
                  /
                 N   <-- name4
                /
    ...--F--G--H--I---M   <-- name1
             \       /
              J-----K   <-- name2
               \
                L   <-- name3

we could pick out one of these names, such as `name4` or `name3`, and ask: which commits can be found by that name, but not by any of the other names? If we pick `name3` the answer is commit `L`. If we pick `name4`, the answer is no commits at all: the commit that `name4` names is commit `N`, but commit `N` can be found by starting at `name5` and working backwards.
The accepted answer works with remote-tracking names, rather than branch names, and allows you to designate one (the one spelled `origin/merge-only`) as the selected name and look at all other names in that namespace. It also avoids showing merges, via `--no-merges`: if we pick `name1` as the "interesting name", and say show me commits that are reachable from `name1` but not any other name, the walk finds merge commit `M` as well as regular commit `I`, and `--no-merges` leaves only `I` in the output.
The most popular answer is quite different. It's all about traversing the commit graph without following both legs of a merge, and without showing any of the commits that are merges. If we start with `name1`, for instance, we won't show `M` (it's a merge), but assuming the first parent of merge `M` is commit `I`, we won't even look at commits `J` and `K`. We'll end up showing commit `I`, and also commits `H`, `G`, `F`, and so on: none of these are merge commits and all are reachable by starting at `M` and working backwards, visiting only the first parent of each merge commit.
The most-popular answer is pretty well suited to, for instance, looking at `master` when `master` is intended to be a merge-only branch. If all "real work" was done on side branches which were subsequently merged into `master`, we will have a pattern like this:

    I---------M---------N   <-- master
     \       / \       /
      o--o--o   o--o--o

where all the un-letter-named `o` commits are ordinary (non-merge) commits and `M` and `N` are merge commits. Commit `I` is the initial commit: the very first commit ever made, and the only one that should be on `master` that isn't a merge commit. If `git log --first-parent --no-merges master` shows any commit other than `I`, we have a situation like this:

    I---------M----*----N   <-- master
     \       / \       /
      o--o--o   o--o--o

where we want to see commit `*` that was made directly on `master`, not by merging some feature branch.
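The second drawing can be reproduced with a short script. This is only a sketch under assumptions: the branch names `feature1` and `feature2`, the subject `direct-on-master` (standing in for `*`), and the `tick`/`commit` helpers are all made up for the illustration.

```shell
# A merge-only master with one stray direct commit (names are illustrative).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

n=0
tick() {     # advance the fake clock so every commit gets a distinct timestamp
  n=$((n + 1))
  export GIT_AUTHOR_DATE="$((1600000000 + n)) +0000"
  export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
}
commit() { tick; git commit -q --allow-empty -m "$1"; }

commit I                                   # the initial commit
git checkout -q -b feature1
commit a1; commit a2                       # ordinary work on a side branch
git checkout -q master
tick; git merge -q --no-ff -m M feature1   # merge commit M
git branch feature2                        # second side branch starts at M
commit direct-on-master                    # the '*' commit made directly on master
git checkout -q feature2
commit b1
git checkout -q master
tick; git merge -q --no-ff -m N feature2   # merge commit N

# Follow only first parents from master and hide the merges themselves.
suspects=$(git log --first-parent --no-merges --format=%s master)
echo "$suspects"
```

Here the output should be `direct-on-master` followed by `I`: the side-branch commits are never visited, the merges `M` and `N` are visited but not printed, and anything left besides `I` was committed straight onto `master`.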
In short, the popular answer is great for looking at `master` when `master` is meant to be merge-only, but is not as great for other situations. The accepted answer works for these other situations.
Are remote-tracking names like `origin/master` branch names?

Some parts of Git say they're not:

    git checkout master
    ...
    git status

says `On branch master`, but:

    git checkout origin/master
    ...
    git status

says `HEAD detached at origin/master`. I prefer to agree with `git checkout` / `git switch`: `origin/master` is not a branch name because you cannot get "on" it.
The accepted answer uses remote-tracking names `origin/*` as "branch names":

    git log --no-merges origin/merge-only \
        --not $(git for-each-ref --format="%(refname)" refs/remotes/origin |
        grep -Fv refs/remotes/origin/merge-only)

The middle line, which invokes `git for-each-ref`, iterates over the remote-tracking names for the remote named `origin`.
The reason this is a good solution to the original problem is that we're interested here in someone else's branch names, rather than our branch names. But that means we've defined branch as something other than our branch names. That's fine: just be aware that you're doing this, when you do it.
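The same idea can be tried against the `name1` through `name5` graph drawn earlier. The sketch below makes two assumptions that differ from the accepted answer: it scans local branches under `refs/heads` rather than `refs/remotes/origin` (so no remote is needed), and it adds `-x` to `grep` so that only an exact ref name is filtered out. The `exclusive` function and the helpers are hypothetical names.

```shell
# Rebuild the name1..name5 graph and ask which commits only one name reaches.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/name1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

n=0
tick() {
  n=$((n + 1))
  export GIT_AUTHOR_DATE="$((1600000000 + n)) +0000"
  export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
}
commit() { tick; git commit -q --allow-empty -m "$1"; }

commit F; commit G; commit H
git branch name4                     # name4 starts at H
commit I                             # name1: F--G--H--I
git checkout -q -b name2 name1~2     # name2 starts at G
commit J
git branch name3                     # name3 starts at J
commit K                             # name2: F--G--J--K
git checkout -q name3; commit L
git checkout -q name4; commit N
git checkout -q -b name5; commit O; commit P
git checkout -q name1
tick; git merge -q -m M name2        # M has parents I (first) and K

exclusive() {   # non-merge commits reachable from branch $1 but from no other local branch
  git log --no-merges --format=%s "refs/heads/$1" \
    --not $(git for-each-ref --format='%(refname)' refs/heads |
            grep -Fvx "refs/heads/$1")
}

only3=$(exclusive name3)
only4=$(exclusive name4)
only1=$(exclusive name1)
printf 'name3: %s\nname4: %s\nname1: %s\n' "$only3" "$only4" "$only1"
```

This should report `L` for `name3`, nothing for `name4` (commit `N` is also reachable from `name5`), and `I` for `name1`, since `M` is found by the walk but hidden by `--no-merges`.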
`git log` traverses some part(s) of the commit graph
What we're really searching for here are series of what I have called daglets: see What exactly do we mean by "branch"? That is, we're looking for fragments within some subset of the overall commit graph.
Whenever we have Git look at a branch name like `master`, a tag name like `v2.1`, or a remote-tracking name like `origin/master`, we tend to want to have Git tell us about that commit and every commit that we can get to from that commit: starting there, and working backwards.
In mathematics, this is referred to as walking a graph. Git's commit graph is a Directed Acyclic Graph or DAG, and this kind of graph is particularly suited for walking. When walking such a graph, one will visit each graph vertex that is reachable via the path being used. The vertices in the Git graph are the commits, with the edges being arcs—one-way links—going from each child to each parent. (This is where Think Like (a) Git comes in. The one-way nature of arcs means that Git must work backwards, from child to parent.)
The two main Git commands for graph-walking are `git log` and `git rev-list`. These commands are extremely similar (in fact they're mostly built from the same source files) but their output is different: `git log` produces output for humans to read, while `git rev-list` produces output meant for other Git programs to read.[1] Both commands do this kind of graph-walk.
The graph walk they do is specifically: given some set of starting point commits (perhaps just one commit, perhaps a bunch of hash IDs, perhaps a bunch of names that resolve to hash IDs), walk the graph, visiting commits. Particular directives, such as `--not` or a prefix `^`, or `--ancestry-path`, or `--first-parent`, modify the graph walk in some way.
As they do the graph walk, they visit each commit. But they only print some selected subset of the walked commits. Directives such as `--no-merges` or `--before <date>` tell the graph-walking code which commits to print.
In order to do this visiting, one commit at a time, these two commands use a priority queue. You run `git log` or `git rev-list` and give it some starting point commits. They put those commits into the priority queue. For instance, a simple:

    git log master

turns the name `master` into a raw hash ID and puts that one hash ID into the queue. Or:

    git log master develop

turns both names into hash IDs and, assuming these are two different hash IDs, puts both into the queue.
The priority of the commits in this queue is determined by still more arguments. For instance, the argument `--author-date-order` tells `git log` or `git rev-list` to use the author timestamp, rather than the committer timestamp. The default is to use the committer timestamp and pick the newest-by-date commit: the one with the highest numerical date. So with `master develop`, assuming these resolve to two different commits, Git will show whichever one came later first, because that will be at the front of the queue.
In any case, the revision walking code now runs in a loop:

- While there are commits in the queue:
  - Remove the first queue entry.
  - Decide whether to print this commit at all. For instance, `--no-merges`: print nothing if it is a merge commit; `--before`: print nothing if its date does not come before the designated time. If printing isn't suppressed, print the commit: for `git log`, show its log; for `git rev-list`, print its hash ID.
  - Put some or all of this commit's parent commits into the queue (as long as each isn't there now, and has not been visited already[2]). The normal default is to put in all parents. Using `--first-parent` suppresses all but the first parent of each merge.

(Both `git log` and `git rev-list` can do history simplification with or without parent rewriting at this point as well, but we'll skip over that here.)
For a simple chain, like start at `HEAD` and work backwards when there are no merge commits, the queue always has one commit in it at the top of the loop. There's one commit, so we pop it off and print it and put its (single) parent into the queue and go around again, and we follow the chain backwards until we reach the very first commit, or the user gets tired of `git log` output and quits the program. In this case, none of the ordering options matter: there is only ever one commit to show.
When there are merges and we follow both parents (both "legs" of the merge) or when you give `git log` or `git rev-list` more than one starting commit, the sorting options matter.
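The effect of the queue ordering is easiest to see with two starting points whose commits interleave in time. The following sketch assumes made-up, strictly increasing committer timestamps (set by the hypothetical `commit` helper) so that the default newest-first behaviour is predictable.

```shell
# Two branches whose commits alternate in time (names and dates are illustrative).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

n=0
commit() {   # each call is one "second" later than the previous one
  n=$((n + 1))
  export GIT_AUTHOR_DATE="$((1600000000 + n)) +0000"
  export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
  git commit -q --allow-empty -m "$1"
}

commit F; commit G                       # times 1 and 2, shared history
git branch develop
git checkout -q develop; commit I        # time 3, on develop
git checkout -q master;  commit H        # time 4, on master
git checkout -q develop; commit J        # time 5, on develop

# Both tips go into the queue; the newest committer date is always taken first.
order=$(git log --format=%s master develop)
echo "$order"
```

With these timestamps the walk should print `J`, `H`, `I`, `G`, `F`: it hops between the two branches because the queue is ordered by committer date, not by branch.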
Last, consider the effect of `--not` or `^` in front of a commit specifier. There are several ways to write these:

    git log master --not develop

or:

    git log ^develop master

or:

    git log develop..master

all mean the same thing. The `--not` is like the prefix `^` except that it applies to more than one name:

    git log ^branch1 ^branch2 branch3

means not branch1, not branch2, yes branch3; but:

    git log --not branch1 branch2 branch3

means not branch1, not branch2, not branch3, and you have to use a second `--not` to turn it off:

    git log --not branch1 branch2 --not branch3

which is a bit awkward. The two "not" directives are combined via XOR, so if you really want to obfuscate, you can write:

    git log --not branch1 branch2 ^branch3

to mean not branch1, not branch2, yes branch3.
These all work by affecting the graph walk. As `git log` or `git rev-list` walks the graph, it makes sure not to put into the priority queue any commit that is reachable from any of the negated references. (In fact, they affect the starting setup too: negated commits can't go into the priority queue right from the command line, so `git log master ^master` shows nothing, for instance.)
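The different spellings of negation can be compared directly with `git rev-list`, which prints bare hash IDs. This sketch reuses the small `master`/`develop` graph from the start of the answer; the helper and the variable names are assumptions for the illustration.

```shell
# Check that the various "not" spellings select the same commits.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

n=0
commit() {
  n=$((n + 1))
  export GIT_AUTHOR_DATE="$((1600000000 + n)) +0000"
  export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
  git commit -q --allow-empty -m "$1"
}

commit F; commit G
git branch develop
commit H                                   # only on master
git checkout -q develop
commit I; commit J                         # only on develop

h=$(git rev-parse master)                  # hash ID of commit H
a=$(git rev-list master --not develop)
b=$(git rev-list ^develop master)
c=$(git rev-list develop..master)
d=$(git rev-list --not develop --not master)   # the second --not turns negation back off
e=$(git rev-list --not develop ^master)        # --not XOR ^ makes master positive again
none=$(git rev-list master ^master)            # a commit negated against itself

for v in "$a" "$b" "$c" "$d" "$e"; do
  [ "$v" = "$h" ] && echo same || echo different
done
```

All five forms should print `same`, each selecting just commit `H`, and `none` should be empty.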
All of the fancy syntax described in the gitrevisions documentation makes use of this, and you can expose this with a simple call to `git rev-parse`. For instance:

    $ git rev-parse origin/pu...origin/master    # note: three dots
    b34789c0b0d3b137f0bb516b417bd8d75e0cb306
    fc307aa3771ece59e174157510c6db6f0d4b40ec
    ^b34789c0b0d3b137f0bb516b417bd8d75e0cb306

The three-dot syntax means commits reachable from either left or right side, but excluding commits reachable from both. Git achieves this by putting in two positive references (the two non-negated hash IDs) and one negative one, marked by the `^` prefix, for their merge base. In this case the `origin/master` commit, `b34789c0b`, is itself reachable from `origin/pu` (`fc307aa37...`), so it is its own merge base with `origin/pu`, and the `origin/master` hash appears twice, once with a negation.
Similarly:

    $ git rev-parse master^^@
    2c42fb76531f4565b5434e46102e6d85a0861738
    2f0a093dd640e0dad0b261dae2427f2541b5426c

The `^@` syntax means all the parents of the given commit, and `master^` itself (the first parent of the commit selected by branch-name `master`) is a merge commit, so it has two parents. These are the two parents. And:

    $ git rev-parse master^^!
    0b07eecf6ed9334f09d6624732a4af2da03e38eb
    ^2c42fb76531f4565b5434e46102e6d85a0861738
    ^2f0a093dd640e0dad0b261dae2427f2541b5426c

The `^!` suffix means the commit itself, but none of its parents. In this case, `master^` is `0b07eecf6...`. We already saw both parents with the `^@` suffix; here they are again, but this time, negated.
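These expansions can be checked in a scratch repository as well. In the sketch below the merge commit is the tip of `master` itself, so the suffixes are applied to `master` rather than to `master^` as in the transcripts above; the helpers and variable names are assumptions.

```shell
# Inspect how rev-parse expands A...B, rev^@ and rev^! (names are illustrative).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

n=0
tick() {
  n=$((n + 1))
  export GIT_AUTHOR_DATE="$((1600000000 + n)) +0000"
  export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
}
commit() { tick; git commit -q --allow-empty -m "$1"; }

commit F; commit G
git branch develop
commit H                                   # master: F--G--H
git checkout -q develop; commit I          # develop: F--G--I
git checkout -q master

three_dot=$(git rev-parse develop...master)   # two positive hashes, then the negated merge base
base=$(git merge-base develop master)         # commit G

tick; git merge -q -m M develop               # master now points at merge commit M
parents=$(git rev-parse 'master^@')           # every parent of M, in order
only_m=$(git rev-parse 'master^!')            # M itself, then each parent negated

printf 'three-dot:\n%s\n\n^@:\n%s\n\n^!:\n%s\n' "$three_dot" "$parents" "$only_m"
```

The last line of the three-dot output should be `^` followed by the hash of `G`, and the `^!` output should repeat the `^@` hashes with a `^` in front of each.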
[1] Many Git programs literally run `git rev-list` with various options, and read its output, to know what commits and/or other Git objects to use.

[2] Because the graph is acyclic, it's possible to guarantee that none have been visited already, if we add the constraint never show a parent before showing all of its children to the priority. `--date-order`, `--author-date-order`, and `--topo-order` add this constraint. The default sort order, which has no name, doesn't. If the commit timestamps are screwy (if for instance some commits were made "in the future" by a computer whose clock was off) this could in some cases lead to odd looking output.
If you made it this far, you now know a lot about `git log`

Summary:

`git log` is about showing some selected commits while walking some or all of some part of the graph.

- The `--no-merges` argument, found in both the accepted and the currently-top-ranked answers, suppresses showing some commits that are walked.
- The `--first-parent` argument, from the currently-top-ranked answer, suppresses walking some parts of the graph, during the graph-walk itself.
- The `--not` prefix to command line arguments, as used in the accepted answer, suppresses ever visiting some parts of the graph at all, right from the start.

We get the answers we like, to two different questions, using these features.