Git Merge Erroneously Identifies Conflicts in Blocks

Question

I have in my repository a single file, data.csv, which represents a CSV-formatted database. For the sake of example, let's suppose the contents of data.csv are

1,2,3
2,3,4
4,5,6

Originally, I only have the master branch, and I create two branches, A and B, in which I modify data.csv independently. I've noticed that sometimes the 3-way diff algorithm identifies conflicts that, in my eyes, shouldn't be conflicts at all. For example, suppose A modifies the file to be

1,4,5
2,3,4
4,5,6

and B modifies the file to be

1,2,3
2,6,7
4,5,6

When I run git merge A from branch B, Git does not auto-merge these versions; instead, it reports the following conflict:

<<<<<<< HEAD
1,2,3
2,6,7
=======
1,4,5
2,3,4
>>>>>>> A
4,5,6

But it seems to me that these versions should be auto-mergeable with 3-way diff logic applied line by line, since A modifies only the first line and B modifies only the second.

My Questions: Why does this happen? And is there a way to force Git to do a more fine-grained diff (e.g. line-by-line)? (Or alternatively, are there any ways to force Git to realize that these changes are actually auto-mergeable?)

Does this answer your question? [Why do changing adjacent lines but modifying independently cause a git merge conflict?](https://stackoverflow.com/questions/55275340/why-do-changing-adjacent-lines-but-modifying-independently-cause-a-git-merge-con) — matt, Dec 21 '20 at 06:47
@matt Thanks for the link. It answers my question of why this happens, but I'm wondering if there might be a way to force git to change its algorithm to examine these changes line-by-line. For example, I noticed that one can specify the `diff-algorithm` parameter of `git merge`. Could this help? — paulinho, Dec 21 '20 at 15:44
While the diff algorithm defines the *range* of each change, it's the *merge strategy* that chooses how to combine these ranges. It's theoretically possible to write a new merge strategy, but this is very difficult: Git is getting a new merge strategy now (this year or next year, probably) for the first time in almost 20 years. It's also possible to write a *merge driver* to use with the existing default merge strategy, which is a lot more realistic. That would be the way to go for your particular case. — torek, Dec 21 '20 at 15:57
@paulinho I wonder if the [new Git 2.30 Q1 2021 ORT merge strategy](https://stackoverflow.com/a/64950077/6309) would change anything here ("Ostensibly Recursive's Twin"). — VonC, Dec 21 '20 at 15:58

score 2 · Accepted Answer · answered Dec 21 '20 at 16:16

As I mentioned in a comment, the way you could handle this today is to write a merge driver. Writing a good merge driver is not trivial, but you will be able to experiment with it, and apply it only to specific files.

If you don't define a merge driver yourself, Git uses its own built-in one. This built-in driver is mostly identical to the git merge-file command. (It might be exactly identical, since both are built from various shared source files in Git. Note that the built-in "low level" merge driver in ll-merge.c is where the choice between running a configured merge driver and using the built-in code actually happens.)

Note that your merge driver needs, at a minimum, three inputs (you can give it up to five inputs):

a path name in which the driver can find the merge base version of the file;
a path name in which the driver can find the current (--ours) version of the file, and to which the driver must write the final, merged version of the file; and
a path name in which the driver can find the other (--theirs) version of the file.

The driver's job is to read the three input versions, however it chooses, and then to write the correct merge result, obtained however it likes, to the middle one of these three path names. The path names will be the names of temporary files: do not assume that any of these three file names makes any sense or has any relationship to the historical names of the files being merged.

The extra data you can pass to your own program include the user's desired conflict marker size (default 7) and the path name to which the merge result will eventually be copied. For example, suppose we're merging a file whose name in the merge base is orig.wrongsuffix, whose name in the --ours commit is ours.csv, and whose name in the --theirs commit is renamed-wrongly.csv. The three input files will likely have names of the form .git-tmp-1234567 or similar. With the existing recursive or resolve strategies, the driver's output will eventually wind up in a file named ours.csv. However, because there is a rename/rename conflict (we fixed the name, and they tried to fix the name), the merge will stop with a conflict even though our merge driver is able to produce a merged result.

To indicate a successful merge—i.e., that the merge does not have to stop due to conflicts found by your own merge driver—your merge driver should return a successful exit status when it terminates. In other words, from C code, call exit(0); from Python, use sys.exit(0) or equivalent; from Go, use os.Exit(0); and so on. To indicate that, despite your driver's best efforts, your code was unable to produce the correct merge result—and therefore may or may not have left merge conflict markers in its output file—supply a nonzero exit status (preferably a small nonzero value such as 1; there are a few special values around 125-127 for use in things like git bisect that might be treated specially in other parts of Git as well; for traditional Unix programming reasons, values should not exceed 127).
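
A minimal sketch of such a driver for the data.csv case, written in Python, might merge row by row instead of hunk by hunk. It is illustrative only: the script name csv_merge_driver.py and the calling convention `python3 csv_merge_driver.py %O %A %B %L` are assumptions, and it handles only files whose three versions have the same number of rows (rows edited in place, never inserted, deleted, or reordered). It reads the three inputs, writes the result to the middle path, and uses the exit status as described above: 0 for a clean merge, 1 when conflicts remain.

```python
#!/usr/bin/env python3
"""Row-by-row merge driver sketch for CSV files such as data.csv.

Assumed invocation (configured in .git/config; names are hypothetical):
    python3 csv_merge_driver.py %O %A %B %L
%O = merge base, %A = ours (the merged result must be written here),
%B = theirs, %L = conflict marker size (optional, default 7).

Limitation: all three versions must have the same number of rows,
i.e. rows are edited in place, never inserted, deleted, or reordered.
"""
import sys


def merge_rows(base, ours, theirs):
    """Apply the three-way rule to each row independently.

    Returns a list whose items are either a resolved row (str) or an
    (ours_row, theirs_row) tuple for a row both sides changed differently.
    """
    merged = []
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)  # same edit on both sides, or only ours changed
        elif o == b:
            merged.append(t)  # only theirs changed this row
        else:
            merged.append((o, t))  # both sides changed it differently
    return merged


def render(merged, marker_size=7):
    """Turn merge results into output lines, wrapping conflicts in markers."""
    lines = []
    for item in merged:
        if isinstance(item, tuple):
            ours_row, theirs_row = item
            lines += ["<" * marker_size, ours_row, "=" * marker_size,
                      theirs_row, ">" * marker_size]
        else:
            lines.append(item)
    return lines


def main(args):
    """args = [base_path, ours_path, theirs_path, optional marker size].

    Returns the exit status Git expects: 0 for a clean merge, 1 otherwise.
    """
    base_path, ours_path, theirs_path = args[:3]
    marker_size = int(args[3]) if len(args) > 3 else 7

    versions = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, encoding="utf-8") as f:
            versions.append(f.read().splitlines())
    base, ours, theirs = versions

    if not (len(base) == len(ours) == len(theirs)):
        # Rows were added or removed: out of scope for this sketch, so
        # leave the ours file untouched and report failure.
        return 1

    merged = merge_rows(base, ours, theirs)
    with open(ours_path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in render(merged, marker_size))
    return 0 if all(isinstance(item, str) for item in merged) else 1


# Git always passes at least %O %A %B; do nothing when imported or run bare.
if __name__ == "__main__" and len(sys.argv) >= 4:
    sys.exit(main(sys.argv[1:]))
```

To try it by hand, save the question's three versions as base.csv (master), ours.csv (branch B), and theirs.csv (branch A), then run `python3 csv_merge_driver.py base.csv ours.csv theirs.csv`. The script should rewrite ours.csv as 1,4,5 / 2,6,7 / 4,5,6 and exit with status 0. Merging row by row is reasonable for a CSV file whose rows are independent records, but it is exactly the kind of adjacent-change auto-merge that can go wrong for source code, so it belongs only on files where that assumption holds.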

To tell Git to use your merge driver, you need to do two things:

create a .git/config or $HOME/.gitconfig or other entry that defines the driver, telling Git how to run it;
create a .gitattributes entry (creating the file first if needed) telling Git to use your driver on this particular .csv file, for instance.

The instructions for defining these are in the gitattributes documentation.
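
As a rough illustration, the two entries might look like the following. The driver name csvrows and the script path /path/to/csv_merge_driver.py are hypothetical; `%O`, `%A`, `%B`, and `%L` are the placeholders the gitattributes documentation describes for the base, ours, and theirs files and the conflict marker size (`%P` supplies the eventual path name).

```ini
# In .git/config (or ~/.gitconfig): define the driver.
[merge "csvrows"]
	name = row-by-row CSV merge
	driver = python3 /path/to/csv_merge_driver.py %O %A %B %L

# In .gitattributes: use that driver for data.csv only.
data.csv merge=csvrows
```

The same definition can be created from the command line with `git config merge.csvrows.driver "python3 /path/to/csv_merge_driver.py %O %A %B %L"`. Since .gitattributes is usually committed but .git/config never is, every clone has to add the driver definition itself.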

score 0 · Answer 2 · answered Dec 21 '20 at 16:46

The overlapping-or-abutting rule is there for a reason. You can find cases where it isn't needed, but (yay for DVCSs) if you pull, say, the Linux history and rerun all the merges of the last fifteen years with a rule that auto-merges abutting changes, you'll find it produces very bad results in a lot of cases. No rule can be perfect; you have to draw the line somewhere, and overlapping-or-abutting is the one that kicks up the minimum unnecessary fuss while coming close enough to never making blameworthy mistakes in practice.
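
To see why the question's example trips this rule, here is a simplified model (not Git's actual xdiff code) that uses Python's difflib to find which base lines each side changed, then checks the two sets of ranges with and without the abutting condition. The function names are made up for illustration, and the model ignores special cases such as two insertions at the same point.

```python
import difflib


def changed_ranges(base, side):
    """Base-line ranges [start, end) that `side` modified, according to difflib."""
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in matcher.get_opcodes() if tag != "equal"]


def conflicts(ranges_a, ranges_b, abutting_counts=True):
    """Pairs of changed ranges that clash under the chosen rule.

    abutting_counts=True models the overlapping-or-abutting rule: ranges that
    merely touch (one ends where the other starts) still conflict.
    abutting_counts=False models a stricter overlap-only rule.
    """
    hits = []
    for a_start, a_end in ranges_a:
        for b_start, b_end in ranges_b:
            if abutting_counts:
                clash = a_start <= b_end and b_start <= a_end
            else:
                clash = a_start < b_end and b_start < a_end
            if clash:
                hits.append(((a_start, a_end), (b_start, b_end)))
    return hits


base = ["1,2,3", "2,3,4", "4,5,6"]
branch_a = ["1,4,5", "2,3,4", "4,5,6"]
branch_b = ["1,2,3", "2,6,7", "4,5,6"]

ra, rb = changed_ranges(base, branch_a), changed_ranges(base, branch_b)
print(ra, rb)                                   # [(0, 1)] [(1, 2)]
print(conflicts(ra, rb))                        # [((0, 1), (1, 2))]
print(conflicts(ra, rb, abutting_counts=False)) # []
```

In this model, A's change to base line 0 and B's change to base line 1 touch at the boundary, so the overlapping-or-abutting rule reports them as a single conflict, which matches the conflict block in the question; only an overlap-only rule would let them merge cleanly.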

score -2 · Answer 3 · answered Dec 21 '20 at 09:04


A merge conflict always happens when you merge two branches that have modified the same file. In the example, you got a merge conflict because branch A modified data.csv and branch B also modified data.csv. To resolve this conflict, you have to decide which lines between <<<<<<< HEAD and >>>>>>> A you want to keep and which ones to delete. You also have to delete the <<<<<<< HEAD, =======, and >>>>>>> A marker lines.
After that, run git add data.csv to mark the conflict as resolved, and then run git commit to conclude the merge.

Mohit Natani

Hi @MohitNatani, my understanding is that it's not always the case that modifying the same file will cause a merge conflict. The files have to be modified on the same line in different ways as well. See my comment on the original post to see what I'm still wondering, and if you know the answer to those questions, I would appreciate it if you could update your answer! – paulinho Dec 21 '20 at 15:54

