It may help to re-draw this:
    feature/X    C--E--F
                /       \
    master -A--B--D---G--H--I--J->
as this:
            C--E--F   <-- feature/X
           /       \
    <--A--B--D---G--H--I--J   <-- master
The reason is that the arrows really do point backwards, with feature/X pointing to the tip commit of branch feature/X, i.e., to commit F, and master pointing to the tip commit of master (which I've assumed is J here, though maybe there are more given your original drawing).
As you've noted, feature/X ^master (which can also be spelled master..feature/X) fails because commit F is reachable from master by starting at the commit to which master points (J) and working backwards. When we hit commit H we work backwards through both parents simultaneously, so the request to eliminate all commits reachable from master also eliminates the C--E--F sequence.
To stop that from happening, we must eliminate commits starting from some point before H, i.e., a point before the first merge that brings the tip of feature/X into master. Any of commits G, D, or B will suffice. That is, if we had the hash of any one of these commits, then:
git rev-list feature/X ^$hash
would do the trick.
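To check this for yourself, here is a minimal sketch that rebuilds the lettered graph in a throwaway repository and compares the two exclusions. It assumes git is installed. The empty commits, the per-letter tags, the `commit` and `subjects` helpers, and the temporary directory are all scaffolding I made up for the demonstration. The helper uses git log only to print subjects; it accepts the same revision arguments as git rev-list.

```shell
#!/bin/sh
set -e
# Scaffolding (assumed): scratch repo, empty commits, one tag per letter.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # independent of init.defaultBranch
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit A; commit B
git checkout -q -b feature/X
commit C; commit E; commit F
git checkout -q master
commit D; commit G
git merge -q --no-ff -m H feature/X && git tag H
commit I; commit J

# Print the subjects (letters) of the selected commits, sorted, on one line.
subjects() { git log --format=%s "$@" | sort | xargs; }

excluding_master=$(subjects feature/X ^master)  # F is reachable from J via H
excluding_G=$(subjects feature/X ^G)            # G sits before the merge

echo "feature/X ^master: [$excluding_master]"
echo "feature/X ^G:      [$excluding_G]"
```

The first list comes out empty, and the second contains C, E, and F.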
qzb's method finds commit D and then uses a suffix ^ to identify its first and only parent. It works by listing every commit reachable from J (the tip of master) that is not also reachable from F (the tip of feature/X). There is a caveat: git rev-list may sort commits, so that D may not actually be listed last, but the | tail -1 assumes that the listing ends with commit D's hash.
This therefore depends on the date-stamps stored in the commits. If the commits were made in order (so that the dates all increase as the commits move forward in time), that's not a problem, and usually they are. But sometimes commits end up in the "wrong" date order, due to clocks being set incorrectly, or commits being made on different computers that disagree as to what time it is, or whatever.
We can fix the date assumption by telling git rev-list to use --topo-order, which forces it to list commits in graph order (using a partial order from the graph topology). So when using this method, add --topo-order.
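Put together, the first method with --topo-order added might look like the sketch below. It runs against the same made-up scratch repository (empty commits tagged with their letters, an assumption of mine, as are the variable names), so the only lines that matter are the last few.

```shell
#!/bin/sh
set -e
# Scaffolding (assumed): scratch repo, empty commits, one tag per letter.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit A; commit B
git checkout -q -b feature/X
commit C; commit E; commit F
git checkout -q master
commit D; commit G
git merge -q --no-ff -m H feature/X && git tag H
commit I; commit J

# List master-only commits in graph order; the last one is the oldest (D).
D=$(git rev-list --topo-order feature/X..master | tail -n 1)
found=$(git log -1 --format=%s "$D")

# Exclude D's parent (B) and everything before it.
result=$(git log --format=%s feature/X "^$D^" | sort | xargs)

echo "oldest master-only commit: $found"
echo "commits on feature/X:      $result"
```

Here the range feature/X..master is a single chain (J, I, H, G, D), so graph order necessarily ends at D.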
Noufal Ibrahim's method works by finding commit H instead, using git log. It's a bit better to use git rev-list, which takes the same options as git log but just prints the hash (which is all we want):
H=$(git rev-list --merges -1 master)
# H stands for Hash, and also for "commit H" :-)
(Note that with git rev-list we must specify a starting point for the graph walk, whereas git log defaults to starting from HEAD.) Obtaining the hash for commit H is not quite sufficient, since we must then climb one parent back. Since H has two parents, we must carefully climb from H to G (not to F).
Fortunately, whenever we merge with git merge, Git makes sure that the first parent of the new merge commit is the commit that was on the current branch. That is, when we made commit H by running git merge feature/X, we were on branch master and the name master meant commit G. So the first parent of H is G, hence $H^1, or just $H^, identifies commit G:
H=$(git rev-list --merges -1 master)
git rev-list feature/X ^${H}^
The curly braces around H are not technically necessary, just meant for clarity: we expand $H and then put ^ after the expansion (to identify commit G), and another ^ in front of the expansion (to tell git rev-list that we're using this as an exclusion specifier).
Since $yes ^$no can be written as $no..$yes instead, we can also write this as:
H=$(git rev-list --merges -1 master)
git rev-list ${H}^..feature/X
This method is a bit more efficient (we enumerate just the one commit H, rather than using tail -1 to get the last commit of some potentially long chain) and does not suffer from date-order issues (but we saw above how to fix those with --topo-order).
Incidentally, this too really should use --topo-order when finding commit H, for the same reason: we don't want Git to sort and put some other merge (something before A, for instance) in front of H.
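For completeness, here is the merge-based method with --topo-order added, as a sketch against the same made-up scratch repository (empty commits tagged with their letters; the tags and variable names are my own scaffolding). It also checks that the two spellings of the exclusion select the same commits.

```shell
#!/bin/sh
set -e
# Scaffolding (assumed): scratch repo, empty commits, one tag per letter.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit A; commit B
git checkout -q -b feature/X
commit C; commit E; commit F
git checkout -q master
commit D; commit G
git merge -q --no-ff -m H feature/X && git tag H
commit I; commit J

# Most recent merge on master, in graph order.
H=$(git rev-list --topo-order --merges -1 master)
merge_subject=$(git log -1 --format=%s "$H")
parent_subject=$(git log -1 --format=%s "$H^")   # first parent of the merge

# Two equivalent spellings of "reachable from feature/X but not from $H^".
caret_form=$(git log --format=%s feature/X "^$H^" | sort | xargs)
dotdot_form=$(git log --format=%s "$H^..feature/X" | sort | xargs)

echo "merge: $merge_subject, first parent: $parent_subject"
echo "caret form:   $caret_form"
echo "dot-dot form: $dotdot_form"
```

In this graph the merge found is H, its first parent is G, and both forms list C, E, and F.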
The remaining flaws
qzb noted one of them: while feature/X points to commit F, if there are more merges in the past, we don't necessarily "know where to stop". That is:
              o--o---o--o--o   <-- feature
             /    \ /       \
    ...--o--o--o---o--o--o---o   <-- master
By drawing this particular graph in this particular way, it's clear to us that all the commits along the "top line" are those that were done on feature, and that feature was merged into master twice, while master was merged back into feature once. (Incidentally, this sort of "cross merging" can get you into trouble. It's not wrong, but in general you should be careful about merging A into B and B into A. In some cases this produces multiple merge bases for merges, which can be tricky.) But it's not clear to Git, and there are other ways to draw the graph that will obscure it from our own eyes as well. (Moreover, if you ever allow "fast forward" merges, rather than forcing a real merge commit each time, untangling branch history becomes impossible in general. Again, it's not wrong; you just need to be prepared to deal with it.)
A more important issue occurs with both methods if there is a merge on master past commit H. That is, suppose that the lettered graph we've been drawing so far is still a bit misleading, and in fact it should look like this:
            C--E--F   <-- feature/X
           /       \
    <--A--B--D---G--H--I--J   <-- master
                      /
             <-o--o--o   <-- feature/Y
Now if we do:
H=$(git rev-list --topo-order --merges -1 master)
we will wind up setting $H to point to commit I, rather than commit H. The reason is simple: we asked for the most recent (topologically) commit starting from master and working backwards, that is also a merge commit. That's commit I. But I^ is commit H, and excluding H will make the subsequent git rev-list exclude commits C--E--F.
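The failure is easy to reproduce. The sketch below builds the extended graph in a scratch repository, again with empty commits tagged by letter. I have assumed that feature/Y forks from A and has three commits, which I named Y1 through Y3; none of that comes from the question.

```shell
#!/bin/sh
set -e
# Scaffolding (assumed): scratch repo, empty commits, one tag per letter.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit A
git checkout -q -b feature/Y            # assumed fork point: A
commit Y1; commit Y2; commit Y3
git checkout -q master
commit B
git checkout -q -b feature/X
commit C; commit E; commit F
git checkout -q master
commit D; commit G
git merge -q --no-ff -m H feature/X && git tag H
git merge -q --no-ff -m I feature/Y && git tag I   # the extra merge
commit J

# The "most recent merge" is now I, not H.
H=$(git rev-list --topo-order --merges -1 master)
picked=$(git log -1 --format=%s "$H")

# Its first parent is commit H, which already contains C--E--F.
leftover=$(git log --format=%s feature/X "^$H^" | sort | xargs)

echo "merge picked: $picked"
echo "commits left: [$leftover]"
```

The merge picked is I, and nothing is left of feature/X.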
That seems to doom this approach; can we go back to locating commit D? No, because qzb's trick:
$(git rev-list feature/X..master | tail -n 1)
stops working when Git races down the second parent of I, i.e., through feature/Y, and begins listing all those commits. Without --topo-order, we get the oldest commit. With --topo-order we are still not told which chain (I^1 vs I^2) is handled first. If that chain connects back at commit A or earlier, we may get the hash of the oldest commit on that chain (one whose parent is commit A or earlier), instead of that for commit D.
We could fix that by noting the additional merge I that brings in feature/Y, and excluding feature/Y so that Git does not race down that chain. But this begins to get complicated. What we really need, then, is not the most recent merge, but rather the merge that brings in commit F (i.e., "find me commit H"). Is there a way to get that? As it turns out, there is. What we want here is --ancestry-path.
The --ancestry-path option strips out commits that are not descendants of an excluded commit. Since feature/X is merged into master, we know for certain that there is some commit (actually H, of course) after F that is a descendant of F (in this case a direct child: F is one of its parents) and also is an ancestor of master. So:
git rev-list --ancestry-path --topo-order ^feature/X master
tells Git to list out commits H, I, and J, and no other commits. That is, we won't go racing down the other commits brought in by merge I: those commits will get pruned.
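To see the pruning, compare the same selection with and without the option. This is a sketch against a scratch repository with empty commits tagged by letter. The fork point of feature/Y (commit A) and its commit names Y1 through Y3 are assumptions of mine.

```shell
#!/bin/sh
set -e
# Scaffolding (assumed): scratch repo, empty commits, one tag per letter.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit A
git checkout -q -b feature/Y            # assumed fork point: A
commit Y1; commit Y2; commit Y3
git checkout -q master
commit B
git checkout -q -b feature/X
commit C; commit E; commit F
git checkout -q master
commit D; commit G
git merge -q --no-ff -m H feature/X && git tag H
git merge -q --no-ff -m I feature/Y && git tag I
commit J

# Everything on master that is not on feature/X, sorted by subject.
plain=$(git log --format=%s ^feature/X master | sort | xargs)

# Same selection, restricted to descendants of the excluded tip F.
pruned=$(git log --format=%s --ancestry-path ^feature/X master | sort | xargs)

echo "without --ancestry-path: $plain"
echo "with --ancestry-path:    $pruned"
```

Without the option the list includes D, G, and the feature/Y commits; with it, only H, I, and J remain.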
If we then discard all but the last commit (with tail -1 again), and optionally speed things up a bit with --merges to discard any non-merges even before using tail, that will let us locate commit H even if I or J is a merge:
H=$(git rev-list --ancestry-path --topo-order \
--merges ^feature/X master | tail -1)
git rev-list $H^..feature/X
This is a hybrid of the two methods: we use --ancestry-path to find commits starting from H, and tail -1 to drop all but commit H, then use $H^ as the exclusion point, which excludes commit G and everything earlier.
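If you want this as something reusable, one possible wrapper is sketched below. The function name only_on_branch and its argument order are my own invention, and the scratch repository (empty commits tagged by letter, with feature/Y assumed to fork from A) exists only to exercise it. It inherits the caveats above: it assumes the branch was merged with a real merge commit whose first parent is on the target branch.

```shell
#!/bin/sh
set -e
# Hypothetical helper: list commits made on branch $1 before it was merged
# into branch $2. Fails if no merge commit bringing $1 into $2 is found.
only_on_branch() {
    merge=$(git rev-list --ancestry-path --topo-order --merges \
        "^$1" "$2" | tail -n 1)
    [ -n "$merge" ] || { echo "no merge of $1 into $2 found" >&2; return 1; }
    git rev-list "$merge^..$1"
}

# Scaffolding (assumed): scratch repo, empty commits, one tag per letter.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
commit() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

commit A
git checkout -q -b feature/Y            # assumed fork point: A
commit Y1; commit Y2; commit Y3
git checkout -q master
commit B
git checkout -q -b feature/X
commit C; commit E; commit F
git checkout -q master
commit D; commit G
git merge -q --no-ff -m H feature/X && git tag H
git merge -q --no-ff -m I feature/Y && git tag I
commit J

# Translate the hashes back into subjects for display.
found=$(only_on_branch feature/X master |
    while read -r hash; do git log -1 --format=%s "$hash"; done |
    sort | xargs)

echo "commits made on feature/X: $found"
```

Even with the extra merge I on master, this lists C, E, and F.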