10
           :
           A
T         / \
i        B   C
m        :   :
e        D   E
          \ /
|          F
V          :

git merge-base B E allows to find where a the common ancestor A of the two commits. Is there a way to find the commit F where the two branches are merged again?

svick
  • 236,525
  • 50
  • 385
  • 514
bara
  • 2,964
  • 2
  • 26
  • 24
  • thats an awesome graph, how did u make that? – Alex Gordon Jun 06 '12 at 21:05
  • 1
    What if there is more than one “first” common child? For example, there could be a commit that is a merge of B and E. – svick Jun 06 '12 at 21:31
  • 1
    You could hack up the `git rev-list --children` output something like this guy did: http://stackoverflow.com/questions/1761825/referencing-the-child-of-a-commit-in-git. – ellotheth Jun 06 '12 at 22:27
  • i've made the graph by hand using vim. @svick So there can be multiple common children with no defenitive order. – bara Jun 07 '12 at 13:04
  • @bara: Ah, so the trick is to use vim. – HelloGoodbye Jul 02 '15 at 09:28
  • @HelloGoodbye you should be able to do it by hand with any editor where you can set a monospace font. For complex branches it is more efficient to use `git log --graph [--oneline --decorate...]` or external tools. – Narcolessico Jul 22 '21 at 16:58

3 Answers3

2

There isn't necessarily a unique answer to this problem, so you have to decide on a few constraints and/or heuristics, or accept the possibility more than one "downstream" merge. The heart of the problem is the same as the problem of multiple merge base candidates—use git merge-base --all to list them all, otherwise Git just picks whichever one pops up first in its algorithm. We can do the same, or find all best merge candidates.

You've drawn what I usually prefer to render sideways as, e.g.:

  B--...--D
 /         \
A           F--G--H   <-- branch1
 \         /
  C--...--E   <-- branch2

but we might have this:

  B--C---D--E--...   <-- branch1
 /    \ /
A      X
 \    / \
  F--G---H--I--...   <-- branch2

In this case both merges D and H are equally good candidates for "the place where the branches re-merge" if you allow both branch1 and branch2 to be considered. Even if you don't, if branch2 merges back into branch1 later:

  B--C---D--E---J--...   <-- branch1
 /    \ /      /
A      X      /
 \    / \    /
  F--G---H--I--...   <-- branch2

then just starting from (or ending at) branch1, both D and H are equally good candidates.

In any case, what we need here is to enumerate commits that end in one or all of the branches you want to consider. To do that, we can use, e.g.:

git rev-list --ancestry-path ^B ^E branch1 branch2

This finds commits that are ancestors of branch1 or branch2, and are also descendants of commit B or of commit E.

To really get the right answer, we want to add --children. That way we'll get the hash ID of each commit, along with the children of that commit that go in this same direction. Git achieves the --children by reversing the backwards connections from the children to the parents as it traverses the links, which is good enough; but we won't see commits B or E. This is kind of a problem. To get them shown, we can add --boundary. This is not ideal: --boundary sometimes includes some commits we don't want. Fortunately, they're all marked with - so we can exclude extra boundary commits by knocking out ones that aren't the commits we care about.

I'm not going to show any of that, but if you did that, you would now have a list, one entry per line, of each node (vertex) and its edges that connect to its children. You can now ask What is the LCA of the DAG formed by these (V,E) sets?

It would be nice if we could just use Git's LCA algorithm, but Git does not have a way to invoke it on arbitrary graphs—we can only invoke it on commits, and the actual commits have parents, not children. So you will have to write your own. See Algorithm to find lowest common ancestor in directed acyclic graph? (which, unfortunately, has no accepted answer). This algorithm looks correct at first blush; it has one of the two standard definitions for LCA in a graph.

If we're willing to settle for a not-nearly-as-good answer, though, we can get something that's probably sufficient in most cases by adding --topo-order (to make sure parents come out after all their children) and --merges (to omit everything that's not a merge commit). This will get a list of all merges.

I have made here a test repository with a simple case:

$ git log --all --decorate --oneline --graph
* 91fcef6 (HEAD -> master) J
* d1e5905 I
*   5bf18a0 merge
|\  
| * 49b2ba7 (sidebr) D
| * 725e5ea C
| * 36b830d (tag: B) B
* | 198a982 (tag: G) G
* | 216bc01 F
* | e905e59 E
|/  
* 5df9428 initial

So I can now name commits B and G using B and G, and the branch I want for a "move in this direction" is just master. So:

$ git rev-list --topo-order --merges --ancestry-path ^B ^G master
5bf18a0797dfd78107928a9a4095f357cfabe914

The last line here is the merge that's "closest" to the two commits. In this case, that's also the only line, and that's the merge we want.

The flaw here is clear enough once we draw it. Suppose I had a more complex graph, such as:

      I--J
     /    \
    H      M--N
   / \    /    \
  /   K--L      \
 /               \
A                 P--Q  <-- master
 \               /
  \   C--D      /
   \ /    \    /
    B      G--O
     \    /
      E--F

If I now run git rev-list --topo-order --merges --ancestry-path ^B ^H master, I'll enumerate commit P, then both G and M in some order. So the last line will either be commit G or commit M, and while both of these are merges, they don't meet the right criterion: they don't merge B and H. Only commit P does that.

Hence, to check whether you have a right answer—without handling the multiple LCA issue—you should take each of the output lines from this git rev-list command, probably in reverse order (consider adding --reverse), and see if both commits are ancestors of each. "Internal" merges like G and M will have only one commit as an ancestor. To do the is-ancestor test, use git merge-base --is-ancestor:

if git merge-base --is-ancestor $commit1 $mergecommit &&
       git merge-base --is-ancestor $commit2 $mergecommit; then
    ... we've found a correct candidate
else
    ... move on to another candidate
fi
torek
  • 448,244
  • 59
  • 642
  • 775
  • I've used your code here: https://gist.github.com/CMCDragonkai/bfb4529f2384948c45f8b1f97b237f69. But I've found that when using `--reverse`, it's only the last line that ended up succeeding. All the previous possible merge commits was not a descendant of both commits. And I had 3000 possible merge commits. – CMCDragonkai Apr 09 '20 at 07:30
  • Just a note, this only finds common descendants that are merge commits. If we want to find the common descendant even in the case where they are not merge commits so it can work arbitrarily for any 2 commits. Then I suppose we just remove the `--merges` flag? – CMCDragonkai Apr 09 '20 at 08:01
  • @CMCDragonkai: it's not clear to me what you mean by "common child" if the two commits you're starting from are not themselves on different "lines", as it were. For instance, are you asking about the "first common child" of commits `K-L`? That would be commit `L` itself, as `L` is a descendant of `K`. If you wish to allow this sort of thing, just check whether `K` and/or `L` are ancestors of the other of the pair; if so, whichever one is *not* an ancestor of the other, is the common child. – torek Apr 09 '20 at 16:44
  • If the commits *are* on different lines, and you drop the `--merges` from the cheap-and-sleazy `git rev-list ... --merges` option, you'll pick one of the descendants of the two commits. For instance, given the above graph, and commits `K` and `I`, you'd find commit `J` (child of `I`) or `L` (child of `K`), which do not satisfy the "can reach both commits" test. However, adding the `git merge-base --is-ancestor` test would knock that out, and then also the other one, and would therefore find commit `M` after all. – torek Apr 09 '20 at 16:51
  • The problem with just leaving out `--merges` from the `git rev-list` is that with `K` and `L` as inputs it would not list `L` at all as we knock them out. To fix *that*, we'd need to avoid knocking out the two commits in question, and knock out only their *parents*. That is easy enough: use `^{hash}^@`; see the gitrevisions documentation. – torek Apr 09 '20 at 16:53
  • It's probably better to just test the two commits up front though. There are likely far fewer merges in the graph than there are overall commits, so given arbitrary commits C1 and C2, we test to see if C1 and/or C2 are directly related by ancestry (two tests), then if not, list all merges find-able in the direction of some branch tip(s) that are descendants of both C1 and C2; then test only those merges (say, 5 commits out of 10,000 commits). – torek Apr 09 '20 at 16:55
1

Oops. Didn't read that carefully enough.

The only information in a commit is the id of its parent (or parents). You cannot get to a child from a parent commit (this is the directed part of the repository being a DAG).

Looking at this more - it looks like the --ancestry-path option for git log can do this. For instance given:

* 85d26ab When compiling vim, also compile & install gvim
*   3146e5d Merge remote-tracking branch 'origin/devel' into deve
|\
| * 28d08e5 rebasing-merge: specify all commits explicitly
* | 006d11d Help 'file' find its magic file
|/
* e68531d (tag: Git-1.7.6-preview20110720) Update submodules

we can get the all children of these two commits using

git log --oneline --ancestry-path B..E

if you then reverse this and pick off the first one -- that is F.

git rev-list --reverse --ancestry-path 28d08e5..006d11d | head -1

in my case that returns 3146e5d.

patthoyts
  • 32,320
  • 3
  • 62
  • 93
  • 1
    This seems unlikely to work: `--ancestry-path` limits the output to commits that are both ancestors of the right side hash (`006d11d`) and descendants of the left-side hash (`28d08e5`), so since `3146e5d` is *not* an ancestor of `006d11d` it should not be printed. It's true that `git rev-list --ancestry-path` can be told to reverse the directions of the parent<-child links it traverses, but that's not sufficient. – torek Apr 28 '19 at 00:16
0

Adapt all.awk from this answer to also carry the line number for each ref, then when you've encountered both parents look at the refs they have in common.

Community
  • 1
  • 1
jthill
  • 55,082
  • 5
  • 77
  • 137