Git merge internals

Question

This is probably going to end up being a long question, so please bear with me.

I came across an incredible explanation for git merge decisions here: How does git merge work. I am trying to build on this explanation and see if there are any holes in depicting git merge this way. Essentially, the decision of whether or not a line shows up in the merged file can be depicted by a truth table:

W: original file, A: Alice's branch, B: Bob's branch

truth-table

Based on this truth table, it is straightforward to think up a line based algorithm to construct D: Construct D line-by-line by looking at corresponding lines from A and B and making a decision based on truth-table.

My first question is the case (0, 0, 1) which according to the link I posted above, seems to suggest that while that case is actually a conflict, git usually handles it by deleting the line anyway. Can this case actually ever lead to a conflict?

My second question is about deletion cases— (0, 1, 1) and (1, 0, 1). Intuitively, I feel the way these cases are handled might lead to a problem. Let’s say there a was function foo() in W. This function was never actually called in any piece of code. Let’s say in branch A, Alice finally decided to remove foo(). However, in branch B, Bob finally decided on a use for foo() and wrote another function bar() that called foo(). Just intuitively, based on the truth-table, it seems like the merged file will end up deleting the foo() function and adding bar() and Bob would be left wondering why foo() doesn’t work anymore! Which probably leads me to think that the truth-table model I derived for the 3 way merge, is probably not complete and missing something?

D is a merged commit, right? – Nick Volynkin May 23 '15 at 11:48 — Nick Volynkin, May 23 '15 at 11:48

score 6 · Accepted Answer · answered May 23 '15 at 08:16

My first question is the case (0, 0, 1)

Some version control systems like darcs consider that doing the same change (in your case, deletion) in two branches and merging them should lead to a conflict. The typical example is when you have twice

-#define NUMBER_OF_WHATEVER 42
+#define NUMBER_OF_WHATEVER 43

The merge algorithm cannot know for you whether you want the merge to yield 43 (because this is the value both versions agree on) or 44 (because 42 should be incremented twice).

However, considering this case as a conflict causes a lot of spurious conflicts. For example, if one cherry-picks a merge from the master branch to a maintainance branch and later merges the maintainance branch into master, then each line modified by the cherry-pick would lead to a conflict. And the conflict markers would be weird, because they would show the same content on both sides of the conflict marker, like

<<<<<<< HEAD
Hello world
=======
Hello world
>>>>>>> 77976da35a11db4580b80ae27e8d65caf5208086

So, most version-control systems, including Git, chose to consider no conflict when both sides of the merge introduce the same change.

My second question is about deletion cases— (0, 1, 1) and (1, 0, 1).

What you are describing is semantic conflicts. They do exist in theory, and you can even find corner-cases where the merge is compilable but has different semantics compared to the branches being merged. There's no magic, no textual merge algorithm can detect or resolve semantic conflicts. You have to live with them, or work alone.

In practice, they are rare enough. There are probably millions of people using a version control system daily and living with it. The majority probably never thought the problem could exist.

Still, a good organization considerably reduces the risk of semantic conflicts. If you check that your code still compiles after merges, you avoid ~90% of semantic conflicts, and if you have an automatic testsuite, then you'd have to find a semantic conflicts that creates a bug not covered by your testsuite for it to be problematic.

And actually, semantic conflicts are not specific to version-control systems. Another scenario not using merge is

I read the code and see a function f()
My coworker removes function f()
Working on the latest version, which doesn't have f() anymore, I still remember that there's a function f() and I try to use it.

In short, don't be afraid of semantic conflicts.

Hi Matthieu-- thanks for your time and great response. So, summarizing your response to the first question-- (0, 0, 1) can never lead to a conflict when merging two branches, while (1, 1, 0) may or may not lead to a conflict depending on whether the exact same line was added to both sides of the branch(which mostly never happens). In short any syntactic conflicts (as opposed to semantic conflicts which you describe), boil down to the (1, 1, 0) case. Am I right in saying that? — Sandman, May 24 '15 at 03:33
With regards to your answer to the second question, instead of deleting the missing line for cases (1, 0, 1) and (0, 1, 1), why don't version control systems just choose to leave those in. So in essence the output of the truth table for (1, 0, 1) and (0, 1, 1) would be 1 as opposed to 0. Won't this avoid the particular case that I sketched out of calling functions that have been deleted in one branch but not another? — Sandman, May 24 '15 at 03:36
Because with your proposal, you can't delete a piece of code. Delete it, merge, it comes back. No one wants that. — Matthieu Moy, May 24 '15 at 07:55

score 2 · Answer 2 · answered May 23 '15 at 12:18

2

Here's a sample repository for you to test different merge behaviour yourself. It has a plenty of branches with various changes to merge to each other.

Feel free to fork or clone it:

git clone https://github.com/NickVolynkin/GitMergeResearch.git

I will post my results soon.

enter image description here

answered May 23 '15 at 12:18

Nick Volynkin

14,023
6
43
67

Hi Nick-- thanks for taking the time and making the repository. I did work on a local repository too before making the truth table above. Did you find anything that contradicts the table above? – Sandman May 24 '15 at 03:38

Git merge internals

2 Answers2

Linked

Related