Git merge-base like for a single file

Question

Is there any command to find a common ancestor of a file in two branches?

Say there is a file that was modified independently in two branches. I want to find the last version of that file common to both branches. I believe this boils down to finding the single parent commit for the file in both branches.

However, merge-base only finds a parent commit for commits, not for files. I tried passing it the last two commits that modified the file on their respective branches, but the commit I got back was not in the history of changes to that file on either branch, probably because a commit usually contains changes to more than one file.

"merge-base only allows to find a parent commit of two commits"... you can specify more than 2 commits in the merge-base command — Robson, Nov 03 '15 at 11:40

score 5 · Accepted Answer · edited Nov 03 '15 at 14:46

> Is there any command to find a common ancestor of a file in two branches?

No, or yes, or maybe: it depends on what you mean.

> Say there is a file that was modified independently in two branches. I want to find the last version of that file common to both branches. I believe this boils down to finding the single parent commit for the file in both branches.

Files don't have parent commits. Only commits have parent commits.

Worse yet, every commit stores every file (every file that was in the staging area when the commit was made, that is). So, in some sense, the answer is either "every commit" or the regular ordinary merge base. Clearly that's not what you mean, so let's see what else we can say here.

Let's try a thought experiment. Suppose you have two branch-tips br1 and br2 that eventually have a common ancestor commit:

       o--o--o--Y   <-- br1
      /
...--X
      \
       o--o--o--Z   <-- br2

Consider also a somewhat more complex graph that still has a common ancestor and two branch-tips:

         o
        / \
       o   o--o--Y   <-- br1
      / \ /
...--X   o
      \
       o--o--o--Z   <-- br2

Given this graph and the way git merge works, a "regular" merge (or git merge-base br1 br2) will find merge base X. At that point I think most people will agree that any file that was in X and was propagated (perhaps with renames) to both Y and Z has its common ancestor in X. That ancestor may appear under a different path name in Y or Z (or even in both), but it's still the common ancestor, and hence it gets used as the merge-base version.

There is a problem here though: git does not record renames. Instead, it "discovers" them every time it makes a diff. In order to discover that file generic/b.c in X is now specific/b.c in Y, git has to diff the entire tree under X against the entire tree under Y. That means it has to find commit X.

This is not too hard for a regular merge, since it uses the commit graph: it starts at both commits Y and Z and traverses history backwards to find the nearest common commit (which is of course X here). Once we know (or git knows) to use X, it makes two diffs, X-vs-Y and X-vs-Z, and then it can work on merging the changes to the contents of the common file, regardless of what path it has in Y and Z.

(There's a secondary problem with crisscross merges, where there may be multiple nearest-common-commits, but we can ignore that for now.)
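To watch this happen on a toy repository, here is a minimal sketch, assuming a bash- or dash-style shell run inside a repository that has branches br1 and br2 as in the diagrams. The helper names merge_bases and merge_side_diffs are made up for illustration. The first prints every nearest common ancestor (git merge-base --all, which can print more than one line in the crisscross case); the second makes the same two base-vs-tip diffs a merge starts from, with rename detection turned on.

```shell
# Hypothetical helpers; run them inside a git repository.

# Print every nearest common ancestor of two commits. Usually this is one
# line (X in the diagrams); a crisscross history can produce several.
merge_bases() {
    git merge-base --all "$1" "$2"
}

# Show the two diffs a merge works from: base-vs-first-tip and
# base-vs-second-tip. --find-renames turns on rename detection, so a file
# moved from generic/b.c to specific/b.c is listed as an R (rename) entry.
# With several merge bases this just uses the one git merge-base prints;
# git merge itself does something more elaborate in that case.
merge_side_diffs() {
    local base
    base=$(git merge-base "$1" "$2") || return 1
    echo "== $base vs $1"
    git diff --find-renames --name-status "$base" "$1"
    echo "== $base vs $2"
    git diff --find-renames --name-status "$base" "$2"
}
```

For instance, if br1 moved generic/b.c to specific/b.c, merge_side_diffs br1 br2 lists an R line pairing the two names, which is how the merge knows to treat them as one file.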

If we (at least temporarily) discard the idea of finding renames, though, we can, given some path p, use a different method, which I think is what you are asking about:

- For each commit cy between X and Y (including X, and working backwards from Y), and each commit cz between X and Z (likewise working backwards from Z), compare cy:p and cz:p.
- When these two paths' contents are equal, declare the commits to be equal.

Note that this will compare X's version of path p against X's version (which is of course the same), and also against every version along either chain of commits, while also comparing every version against every other version.
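Purely as an illustration of this thought experiment (it makes a number of rev-parse calls proportional to the product of the two chain lengths, so it's not something to run on a big history), a brute-force version might look like the following sketch. It assumes the path keeps one name throughout, ignores renames just as the text does, and compares contents by blob ID; same_content_pairs and blob_of are hypothetical helper names.

```shell
# Hypothetical helpers for the brute-force thought experiment.
# No rename detection: the path is assumed to have one name throughout.

# Print the blob ID stored under path $2 in commit $1, or "-" if absent.
blob_of() {
    git rev-parse --verify -q "$1:$2" 2>/dev/null || echo "-"
}

# same_content_pairs BASE TIP1 TIP2 PATH
# For each cy in BASE plus BASE..TIP1 and each cz in BASE plus BASE..TIP2
# (each range newest first), print "cy cz" when PATH has identical contents
# (the same blob ID) in both commits.
same_content_pairs() {
    local base=$1 tip1=$2 tip2=$3 p=$4 cy cz by
    for cy in "$base" $(git rev-list "$base..$tip1"); do
        by=$(blob_of "$cy" "$p")
        [ "$by" = "-" ] && continue   # PATH missing in cy: nothing to match
        for cz in "$base" $(git rev-list "$base..$tip2"); do
            if [ "$by" = "$(blob_of "$cz" "$p")" ]; then
                echo "$cy $cz"
            fi
        done
    done
}
```

Running same_content_pairs "$(git merge-base br1 br2)" br1 br2 dir/file prints one "cy cz" line per matching pair, including X paired with itself whenever the path exists in X.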

Having made this complete matrix (which we can optimize later), we can now find numerous "interesting" commits:

1. the last commit cy in the X-to-Y chain where p has the same contents it has in X (this is the newest commit in that chain that has p unchanged)
2. the last commit cz in the X-to-Z chain where p has the same contents it has in X (newest unchanged in the other chain)
3. the earliest cy where p has the same contents it has in commit Y (this is the last time path p was modified in the X-to-Y chain)
4. the earliest cz where p has the same contents it has in commit Z
5. any commits in either chain that have the same contents for p as any commits in the other chain.
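If each chain is linear (no merges between X and the tip) and p is never changed and then changed back, items 1 through 4 can be read off path-limited git rev-list output (the command discussed further down) instead of the full matrix. A sketch under exactly those assumptions, with hypothetical function names:

```shell
# Hypothetical helpers. Assumes the BASE..TIP chain is linear (no merges)
# and that PATH is never changed and later changed back.

# last_unchanged BASE TIP PATH: newest commit in the chain whose PATH still
# matches BASE (item 1 for the X-to-Y chain, item 2 for X-to-Z).
last_unchanged() {
    local first
    # Oldest commit in BASE..TIP that modified PATH.
    first=$(git rev-list --reverse "$1..$2" -- "$3" | head -n 1)
    if [ -z "$first" ]; then
        git rev-parse "$2"       # PATH never modified: even the tip matches BASE
    else
        git rev-parse "$first^"  # the parent of the first modifying commit
    fi
}

# last_modified BASE TIP PATH: earliest commit in the chain whose PATH
# matches TIP, i.e. the last commit that modified PATH (item 3 or 4).
last_modified() {
    local last
    last=$(git rev-list -1 "$1..$2" -- "$3")
    if [ -z "$last" ]; then
        git rev-parse "$1"       # PATH never modified: BASE already matches TIP
    else
        echo "$last"
    fi
}
```

Note that --reverse is applied after the path limiting, which is why the sketch takes the first line with head rather than using -n 1 (rev-list applies -n before reversing).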

I suspect you're thinking of finding items 1 and 2 here. It's not clear why, though. If you only care about the contents stored under path p, we've already established (above) that these two commits store the same content under p as you find in X. So X:p is "just as good" at identifying those contents, and you might as well use commit X.

If you're talking about finding items 3 and 4, again it's not really clear why, because we've established that these have the same contents for p as their tip-most commits, so Y:p and Z:p are just as useful for identifying those contents.

But maybe you're working with item 5: commits on the two chains where the contents under path p are the same (as the other commit on the other chain), but not necessarily the same as the contents in the tip-most commits.

There can be many such pairs. For instance, suppose that in X (the definitely-common ancestor that git merge-base finds), path p has five lines. Then, going towards Y, the first commit along that chain deletes the last line. Meanwhile, in the X-to-Z sequence, several commits keep all 5 lines, then one deletes the last line. Now this version of p is the same in both lines of development, until the next commit that modifies p. Let's say that's in the X-to-Z sequence, where another line is deleted. Then in the X-to-Y sequence, that same line gets deleted; then later, both lines of development delete more lines, until finally the file is completely empty at one or both branch tips.

There's also another problem with defining "nearest". Let's look at the more complex X-to-Y graph fragment again, but put in a few more distinguishing letters:

         R
        / \
       P   T--o--Y   <-- br1
      / \ /
...--X   S

Suppose that path p has the same contents in commits R and S, but different contents in P and in T. R and S are the same graph distance from X, and also the same distance from Y. As long as you only care about path p, this is probably irrelevant, but it does show that there's not necessarily a unique "nearest" commit.

That's a lot of verbiage before I get down to a few commands you would want to use, in order to solve whatever it is you're trying to solve.

The command that will get you closer to a solution (maybe even all the way there, depending on what it is you want, although it seems likely you'll need to use additional commands, some not even git commands) is git rev-list. This can find commits in which particular paths were modified (as compared to those commits' parent(s); note that merges have to be handled specially, in general, since they have multiple parent commits). If you do use one or more paths to limit the revisions listed by git rev-list, note that it will perform "history simplification" so as to omit some commits from its output. Depending on how you want DAG-level branches (like those in the more complex X-to-Y chain) handled, this may be what you want anyway.

Basically, git rev-list X..Y -- path will find commits reachable from Y, excluding those reachable from X, that modify path, where "modify" means "a diff against the parent shows a change to that path". (For how this handles merges, well, see the documentation.) The order in which commits are listed depends on the sorting you choose (with or without topological constraints; see the "Commit Ordering" section).

If you repeat this with X..Z, you can find which commits modified the path there.
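As a small sketch (commits_touching is a made-up name), the two listings could be wrapped like this; --topo-order is one of the options from that "Commit Ordering" section and keeps every parent after all of its children:

```shell
# Hypothetical helper: commits_touching BASE TIP PATH lists the commits
# reachable from TIP but not from BASE whose diff against their parent
# changes PATH. Default history simplification applies, and --topo-order
# ensures no parent is listed before all of its children.
commits_touching() {
    git rev-list --topo-order "$1..$2" -- "$3"
}
```

For example: X=$(git merge-base br1 br2), then commits_touching "$X" br1 dir/file and commits_touching "$X" br2 dir/file.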

These two git rev-lists are essentially walking the entire revision chain from X to the two branch-tips, but because they let you limit their output to "commits that modify some path(s)", they can optimize the process I outlined in the thought-experiment.

You might want to include commit X here. By default, rev-list won't. You can either start one commit earlier (at a parent of X), though this could misfire if X itself is a merge, or use --boundary, which directs rev-list to also print commit X's SHA-1 (prefixed by -).
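Here is a sketch of picking out the boundary commit, run on an unlimited range (no path) so that the boundary is simply whatever sits just outside X..Y. boundary_of is a hypothetical helper; if merges inside the range bring in other older commits, it may print more than one line.

```shell
# Hypothetical helper: boundary_of BASE TIP prints the excluded commits that
# border BASE..TIP. git rev-list --boundary marks them with a leading "-";
# sed keeps only those lines and strips the marker. For a plain X..Y range
# with no merges inside it, this is X itself.
boundary_of() {
    git rev-list --boundary "$1..$2" | sed -n 's/^-//p'
}
```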

To find out whether the contents stored under a particular path are the same in two different commits—obviously the contents are the same if you use the same commit ID twice here, but it will still work—you can compare the stored blob's SHA-1 ID:

path=dir/file
...
rev_a=...   # something from git rev-list, for instance
rev_b=...
# Quote each $(...) so an empty result (e.g. a missing path) can't break the test.
if [ "$(git rev-parse "${rev_a}:${path}")" = "$(git rev-parse "${rev_b}:${path}")" ]; then
    : # ... the contents match ...
else
    : # ... the contents differ (at least slightly) ...
fi

None of these will detect renames; for that, you must use a full-blown git diff (with rename-detection turned on).
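For example, to learn what name a path from X has at a branch tip, you can parse the rename-detecting name-status diff. This is a sketch, assuming paths contain no tabs, newlines, or other characters git would quote in this output; renamed_path is a made-up helper name.

```shell
# Hypothetical helper: renamed_path BASE TIP PATH prints the name that PATH
# (as named in BASE) has in TIP, according to git diff's rename detection.
# If no rename is detected (including when PATH was simply deleted), it
# prints PATH unchanged.
renamed_path() {
    git diff --find-renames --name-status "$1" "$2" |
        awk -F '\t' -v p="$3" '
            $1 ~ /^R/ && $2 == p { print $3; found = 1 }
            END { if (!found) print p }'
}
```

With the name in hand, the blob-ID comparison above works across the rename, e.g. comparing git rev-parse "$X:generic/b.c" with git rev-parse "br1:$(renamed_path "$X" br1 generic/b.c)". For one file's own history, git log --follow -- path also follows renames.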

Thank you for the very comprehensive response! To answer your question, I was doing a merge and git showed a conflict in a file because some parts of that file were modified in both branches. However, git blame didn't show those parts as modified since the merge-base commit between those branches. So I wanted to find the commit X (in which the file was the same in both branches) in order to review all commits modifying that file between X and the latest commit in each of those two branches, and then re-apply those changes manually to resolve the conflict. — Greg, Nov 03 '15 at 16:33
In that case, you might want to look into setting `merge.conflictstyle = diff3`. See http://stackoverflow.com/questions/27417656/should-diff3-be-default-conflictstyle-on-git but also http://stackoverflow.com/questions/16990657/git-merge-diff3-style-need-explanation — torek, Nov 03 '15 at 17:43
Thanks for the hint, very useful. I didn't know about diff3 until now :) — Greg, Nov 03 '15 at 21:06
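For the workflow described in these comments, here is a sketch of the relevant commands wrapped in hypothetical helper functions, assuming a conflicted merge is in progress (so MERGE_HEAD exists): turn on diff3, regenerate an already-conflicted file's markers in diff3 style, and list each side's commits that touched the file since the merge base.

```shell
# Hypothetical helpers for resolving a conflicted file.

# Use the diff3 conflict style in this repository from now on: conflict
# hunks then include the merge-base version between ||||||| and =======.
enable_diff3() {
    git config merge.conflictStyle diff3
}

# During a conflicted merge, rewrite FILE with diff3-style conflict markers.
# Caution: this throws away any edits already made to FILE.
redo_conflict_diff3() {
    git checkout --conflict=diff3 -- "$1"
}

# During a merge, show (with patches) the commits on each side that changed
# FILE since the merge base of HEAD and MERGE_HEAD.
review_sides() {
    local base
    base=$(git merge-base HEAD MERGE_HEAD) || return 1
    git log -p "$base..HEAD" -- "$1"
    git log -p "$base..MERGE_HEAD" -- "$1"
}
```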
