Easy auto merge goes wrong

Question

I merged two branches:

Branch master added lines to file config_dev.yml
Branch report didn't touch it (so no conflict for this file).

I merged report into master and the result is: the added lines have disappeared.

           1  2  3  4  5  6  7  8  9  10
master: ---O--O--O-----O--O-----O--O--O----
report: -------\----O--------O-------/ 
    changes on master__^              ^__merge that silently removes changes

Why? Can I trust git merge?

Additional info that may be related:

There was also a third branch (which didn't touch that file) that starts from commit 3 on master and was kind of "merged" (actually a blank merge, "mine only") into report at commit 7.

Well, I don't think I can as I don't know the causes of the problem. — theredled, Jul 20 '19 at 14:58
I know this isn't quite your issue, but this [question](https://stackoverflow.com/q/1407638/10576762) may be helpful to you. — AbsoluteSpace, Jul 20 '19 at 14:59
We don't know the causes either. We can't even see your repository. So ... *shrug*? — melpomene, Jul 20 '19 at 15:10
Since merge looks only at the two final commit states being merged and their merge base, I find a three-way diff between those three commits often answers such questions. Here you want to diff 9, the head of report before 10, and their merge-base, 2. Look at `git diff 2 10` and `git diff 2 `. Or, since Git doesn't provide that three-way diff directly, these might help: https://stackoverflow.com/a/55831128/3216427 for a one-file three-way diff, or https://stackoverflow.com/a/56917121/3216427 for a complete three-way diff. — joanis, Jul 20 '19 at 17:08
Did you run the merge yourself? Does it reproduce if you run it again? Does it reproduce if you remove all other files from the history with `git filter-branch`? — max630, Jul 21 '19 at 04:35
Try to repeat the merge with `-Xno-renames` and see if it still happens — max630, Jul 21 '19 at 04:38

score 2 · Answer 1 · answered Jul 20 '19 at 19:14

This isn't really an answer (and should be a comment), but it needs formatting and won't fit into the comment space. Instead, these are instructions on how to find the answer, or at least come up with the right inputs that will lead to the answer.

When you run:

git checkout master
git merge report

Git first finds a merge base commit. Rarely—probably not the case here—Git finds more than one merge base commit, but we'll make sure that isn't the case.

Let's say this merge is going to go wrong, but—importantly—hasn't been done yet. If it has been done already, we need a trick, or a separate repository in which it hasn't been done: either will suffice. The trick is to create a new branch, not named master, that points to the commit that master would have pointed-to before the merge:

git checkout -b test-the-merge <hash-ID>

(We can throw away this branch later. Or, we can even not create it at all, using another trick, but for simplicity it's easier to use the test branch.)

Now that we are in the state where the merge hasn't been done yet, we first run:

git merge-base --all HEAD report

Ideally, this produces one hash ID. If it produces more than one hash ID, we have that rare situation where there is more than one merge base, and we need a process that I'm going to omit because it's long and mostly boring. :-)
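To make the "one hash ID" check concrete, here is a minimal sketch on a throwaway repository. The repository layout, file contents, and identity settings are invented for illustration; only the `git merge-base --all HEAD report` call comes from the steps above.

```shell
#!/bin/sh
# Toy repository (hypothetical contents): master and report diverge from one shared commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of local defaults
git config user.name "Example"
git config user.email "example@example.invalid"

echo "base" > config_dev.yml
git add config_dev.yml
git commit -q -m "shared starting point"
fork_point=$(git rev-parse HEAD)

git checkout -q -b report
echo "report work" > report.txt
git add report.txt
git commit -q -m "work on report"

git checkout -q master
echo "extra line" >> config_dev.yml
git commit -q -am "add lines on master"

# One line of output per merge base; a single line means the simple case applies.
bases=$(git merge-base --all HEAD report)
base_count=$(printf '%s\n' "$bases" | wc -l | tr -d ' ')
echo "merge bases: $base_count"
echo "$bases"
```

In a real repository you would run only the last few lines, from inside the checkout where the merge has not been done yet.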

Now that we know the one hash ID that is the merge base, here's how we see what Git will see, when Git does the merge:

git diff --find-renames <hash> HEAD     # what does Git think *we* changed?
git diff --find-renames <hash> report   # what does Git think *they* changed?

You might want to send these two git diffs to files, so that you can peruse them at leisure and/or in parallel. You can also restrict the output to particular files, if there are a lot of diffs that merge correctly and you just want to focus on the particular files that don't merge correctly.
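As a sketch of saving the two diffs to files and limiting them to one path, the script below uses a toy repository with invented contents. The output names `ours.diff` and `theirs.diff` are arbitrary choices, not anything Git requires.

```shell
#!/bin/sh
# Toy repository (hypothetical contents): only master touches config_dev.yml.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'host: localhost\n' > config_dev.yml
git add config_dev.yml
git commit -q -m "shared starting point"

git checkout -q -b report
echo "report work" > report.txt
git add report.txt
git commit -q -m "work on report"

git checkout -q master
printf 'debug: true\n' >> config_dev.yml
git commit -q -am "add lines on master"

base=$(git merge-base HEAD report)

# Save both sides of the comparison, limited to the one file of interest.
git diff --find-renames "$base" HEAD   -- config_dev.yml > ours.diff
git diff --find-renames "$base" report -- config_dev.yml > theirs.diff

ours_added=$(grep -c '^+debug: true' ours.diff)
theirs_size=$(wc -c < theirs.diff | tr -d ' ')
echo "lines we added: $ours_added, bytes in their diff: $theirs_size"
```

If the "theirs" diff for the file is empty, as it is in this toy case, Git sees no change on their side and the merge should keep your version. A non-empty "theirs" diff that removes your lines is the thing to look for when lines vanish.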

Note that what git diff finds is not necessarily the change that some person made. What git diff finds is a minimal set of instructions that produces the same effect. This is usually good enough. For instance, suppose that between the left-side commit (the merge base, specified by hash ID) and the right-side commit, someone deleted the second line, which holds the first occurrence of the word "the", from a redundant passage:

Paris in
the
the
spring

and Git chooses to instead produce the instructions: delete the second "the" (the third line).
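You can try this yourself. The sketch below uses the plain `diff` utility as a stand-in for git diff (an assumption made for simplicity; the two tools may not pick the same line). Whichever line the tool reports, the edit script cannot tell you which "the" a person actually deleted.

```shell
#!/bin/sh
# Left side has the redundant passage, right side has one "the" removed.
tmp=$(mktemp -d)
printf 'Paris in\nthe\nthe\nspring\n' > "$tmp/left"
printf 'Paris in\nthe\nspring\n' > "$tmp/right"

# diff exits with status 1 when the files differ, so do not treat that as failure.
edit_script=$(diff "$tmp/left" "$tmp/right" || true)
echo "$edit_script"

# Count the deleted lines in the edit script (lines starting with "<").
removed=$(printf '%s\n' "$edit_script" | grep -c '^<')
echo "lines deleted according to diff: $removed"
```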

Does it matter? Probably not—but sometimes, it does. What if the left and right sides read:

some vaguely C like code {
    with redundant stuff
}
more vaguely C like code {
    with redundant stuff
}

and:

some vaguely C like code {
    with redundant stuff
}
still vaguely C like code {
    with redundant stuff
}
more vaguely C like code {
    with redundant stuff
}

and Git synchronizes, mistakenly, on close-braces and produces a syntactically-invalid diff? Well, even that usually still works, but sometimes it produces inappropriate conflicts. In extremely rare cases (which nonetheless do occur but are very hard to illustrate), you may miss a conflict that should have occurred, due to Git coming up with a minimal edit that is syntactically correct but semantically wrong.

In any case, what Git now does, having produced the two diff listings, is to come up with the merged set of source files. To do that, Git starts with the merge base version of all the files. Then:

- For each file that you, they, or both changed:
  - Did you change the file? Did they not change the file? If so, take your file.
  - Did they change the file, and you not change it? If so, take their file.
  - Otherwise, you both changed the file. Attempt to combine the changes, going line by line through the diffs:
    - Wherever you added some lines, take your added lines.
    - Wherever they added some, take their added lines.
    - Wherever you or they deleted some lines, take the deletion.
    - If you both made the exact same change to the exact same lines, take one copy of the change.
    - If you made different changes to the same lines, or to lines that abut (touch each other) or hit the end of file, declare a merge conflict.
- For files you or they deleted entirely, or created as totally-new, or renamed, do the appropriate thing. (What exactly appropriate means gets complicated and does not appear to be your case here, so I won't go into it.)
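The file-level part of these rules can be sketched as a small shell function. `pick_version` is a hypothetical helper written for this illustration, not a Git command, and it ignores deletions, new files, and renames entirely.

```shell
#!/bin/sh
# Hypothetical helper: decide, for one file, which version a three-way merge takes.
# Arguments: paths to the merge-base, "ours", and "theirs" copies of the file.
pick_version() {
    base=$1
    ours=$2
    theirs=$3
    if cmp -s "$ours" "$theirs"; then
        echo "same"       # both sides agree (identical change, or no change at all)
    elif cmp -s "$base" "$theirs"; then
        echo "ours"       # only we changed the file
    elif cmp -s "$base" "$ours"; then
        echo "theirs"     # only they changed the file
    else
        echo "combine"    # both changed it differently: line-by-line merge needed
    fi
}

# The situation from the question: master added lines, report left the file alone.
tmp=$(mktemp -d)
printf 'host: localhost\n' > "$tmp/base"
printf 'host: localhost\ndebug: true\n' > "$tmp/ours"
printf 'host: localhost\n' > "$tmp/theirs"

result=$(pick_version "$tmp/base" "$tmp/ours" "$tmp/theirs")
echo "$result"
```

Note that everything depends on which commit supplies the base copy. If the merge base already contains the added lines and "theirs" lacks them, the same rules pick "theirs".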

Having combined everything to the best of its ability, Git now:

- stops with the merge conflict, or
- stops because you told it to (--no-commit, for instance), or
- makes a merge commit because everything seems to have gone swimmingly.

If Git makes the merge commit (or if it stops and lets you make it, and you do), the merge commit becomes the new tip of your current branch. Its first parent is your branch's old tip commit, and its second parent is the other branch's tip commit. That is, after this merge on master or on test-the-merge (whichever branch we're on), we have:

git rev-parse <branch>^1   => hash ID of the previous branch tip
git rev-parse <branch>^2   => same hash ID as git rev-parse report
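Here is a minimal sketch that checks both parents on a toy repository. The history and file contents are invented, and `--no-ff` is there only to guarantee that a real merge commit is created.

```shell
#!/bin/sh
# Toy repository (hypothetical contents): merge report into master, then inspect parents.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

echo "base" > config_dev.yml
git add config_dev.yml
git commit -q -m "shared starting point"

git checkout -q -b report
echo "report work" > report.txt
git add report.txt
git commit -q -m "work on report"

git checkout -q master
echo "extra line" >> config_dev.yml
git commit -q -am "add lines on master"
old_tip=$(git rev-parse HEAD)

git merge -q --no-ff --no-edit report

first_parent=$(git rev-parse master^1)    # tip of master before the merge
second_parent=$(git rev-parse master^2)   # tip of the branch that was merged in
report_tip=$(git rev-parse report)
echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```

When the merge has already been done, the first parent is also the hash ID to hand to `git checkout -b test-the-merge` in order to get back to the pre-merge state.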

You can look at the two inputs, from the two diffs made from the three commits—merge base, HEAD, and theirs / tip-of-report—to see what Git sees, which will explain why Git made the merge that it did.

In any case, this is what git merge does: it finds a merge base, makes two diffs, and then combines the diffs and applies the combined diffs to the merge base to come up with the merge result. (This of course ignores all the special cases, such as not-actually-merging for various reasons: -s ours, or fast-forward, or unrelated histories, or merge conflicts, etc.) The diffs are inherently line-oriented and cannot reconstruct what someone really did: they only produce a set of instructions that will come up with the same final text. Combining such instructions tends to work with most programming languages, but is definitely not perfect.
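One of the special cases named above, `-s ours`, sounds like the "mine only" merge mentioned in the question. The sketch below is a guess at that history, built only from the question's description, so treat it as something to compare against your real repository and not as a diagnosis. All file contents are invented. It shows how an earlier `-s ours` merge can move the merge base past the commit that added the lines, so that the later merge sees the other side as having deleted them.

```shell
#!/bin/sh
# Hypothetical reconstruction: a "mine only" merge on report changes the merge base.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'host: localhost\n' > config_dev.yml
git add config_dev.yml
git commit -q -m "commit 2: shared starting point"

# report forks before the lines are added on master.
git branch report

printf 'debug: true\n' >> config_dev.yml
git commit -q -am "commit 3: add lines on master"
commit3=$(git rev-parse HEAD)

# A third branch forks from commit 3 and does unrelated work.
git checkout -q -b third
echo "third work" > third.txt
git add third.txt
git commit -q -m "work on third"

# report does its own work, then records a "mine only" merge of third.
git checkout -q report
echo "report work" > report.txt
git add report.txt
git commit -q -m "work on report"
git merge -q -s ours --no-edit third

# master moves on, then merges report.
git checkout -q master
echo "more master work" > master.txt
git add master.txt
git commit -q -m "more work on master"
base=$(git merge-base HEAD report)
git merge -q --no-edit report

# Count how many times the added line survives in the merge result.
remaining=$(grep -c 'debug: true' config_dev.yml || true)
echo "merge base is commit 3: $( [ "$base" = "$commit3" ] && echo yes || echo no )"
echo "copies of the added line after the merge: $remaining"
```

In this toy history the merge base is commit 3, which already has the new line. From that base, master's diff for config_dev.yml is empty and report's diff deletes the line, so Git takes the deletion without any conflict.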

If that process works with your files, you can trust it. If that process doesn't work with your files, you cannot trust it, and you must carefully inspect—and correct if needed—each merge.
