
A colleague had been working in his local repository on master and after a couple of weeks, did a git pull. In the meantime there was a bunch of activity on master including at least 10 merge commits (we have been using a merge workflow so far).

After the pull there were a handful of files that had conflicts. But the unexpected thing was that there were hundreds of modifications in his staging area to files that he did not work on, and each of those modifications was reverting a change on origin/master that was made since the common ancestor commit. For example:

Changes since the common ancestor

His branch (master)

  • A: add 3 lines
  • B: delete 2 lines
  • C: remove 1 line
  • G1..G99: new files

origin/master

  • C: add 2 lines
  • D: delete 10 lines
  • E: change 20 lines
  • ...: many more files changed
  • N1..N3: new files
  • D1..D5: deleted files

After git pull

Modifications in the staging area

  • A: delete 3 lines
  • B: add 2 lines
  • G1..G99: new files
  • All other changes from origin/master are gone: N1..N3 are missing and D1..D5 are back.

In conflict

  • C

I've replicated the problem to verify that he didn't do anything special besides a plain git pull.

Unfortunately he committed all of these changes and I ended up needing to revert that commit. But I want to understand why this would happen so we can avoid it in the future.

I've read some related questions, but none of them answer the question: Why would Git revert the changes from master when there were no changes in the corresponding files on his branch? I understand why there would be modifications in the staging area when there are conflicts after a pull, but I don't understand why they would be effectively reverting the work on origin/master since the common ancestor.

Our approach to the problem right now is

  • Use git pull --rebase instead of merging
  • Check the status after a pull and don't commit if there are unexplainable modifications
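
As a rough illustration of the two points above, here is a sketch that builds a throwaway remote and two clones, diverges them, then runs `git pull --rebase` and checks the status afterwards. The directory names (`seed`, `origin.git`, `colleague`, `teammate`), file names and identity values are all made up for the demo.

```shell
#!/bin/sh
# Sketch: pull with --rebase in a throwaway setup, then check for unexplained changes.
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Seed repository with one commit, published as a bare "origin".
git init -q seed
cd seed
echo base > base.txt
git add base.txt
git commit -q -m "base"
cd ..
git clone -q --bare seed origin.git
git clone -q origin.git colleague
git clone -q origin.git teammate

# Upstream activity while the colleague works locally.
cd teammate
echo upstream > upstream.txt
git add upstream.txt
git commit -q -m "upstream work"
git push -q origin HEAD

# The colleague's local commit, followed by a rebasing pull.
cd ../colleague
echo local > local.txt
git add local.txt
git commit -q -m "local work"
git pull -q --rebase

# After a clean pull there should be nothing staged or modified,
# and no merge commit should have been created.
unexplained=$(git status --porcelain)
merges=$(git rev-list --merges --count HEAD)
echo "unexplained changes: '${unexplained}' merge commits: ${merges}"
```

If `git status --porcelain` prints anything here that you cannot account for, that is the moment to stop rather than commit.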
sourcedelica
  • One thing I should also mention - the `origin/master` side of the merge had a rat's nest of branches because people have been mostly working on `master` using a merge workflow. There were branches 7 levels deep at some points. I can see how Git could get confused. – sourcedelica Mar 17 '17 at 20:02

1 Answer


Edit (based on question-edit for more details): it seems possible that this is a case where rename detection has gone wrong, and possibly there's a criss-cross merge in the history as well, so that Git is forming a "virtual merge base".

To find out, first run:

git merge-base --all <hash1> <hash2>

on the two hash IDs that the branch names master and origin/master pointed to at the time the mis-merge happened. We need this to test both whether rename detection is a problem and whether there are multiple merge bases.

If this prints more than one hash ID, the recursive strategy first merges the merge bases with each other and then uses the resulting merge as a new merge base. This can (in rare cases) produce very confusing results. If so, switching to the -s resolve strategy may help. This will pick one of the merge bases and stick with it. (But you have no control over which merge base is used.) Note that setting merge.conflictStyle to diff3 will also sometimes show you the effect of a virtual merge base (but only sometimes).
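
To make the "more than one hash ID" case concrete, here is a sketch that builds a small criss-cross history in a throwaway repository and counts the merge bases. The branch name `side`, the file names and the identity values are made up for the demo.

```shell
#!/bin/sh
# Sketch: build a criss-cross history and count its merge bases.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo base > base.txt
git add base.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)   # initial branch name depends on configuration

git checkout -q -b side
echo side > side.txt
git add side.txt
git commit -q -m "side work"
s1=$(git rev-parse HEAD)

git checkout -q "$main"
echo main > main.txt
git add main.txt
git commit -q -m "main work"
m1=$(git rev-parse HEAD)

# Each branch merges the other branch's pre-merge commit: a criss-cross.
git merge -q --no-edit "$s1"
git checkout -q side
git merge -q --no-edit "$m1"
git checkout -q "$main"

# One hash per line; more than one line means a virtual merge base would be built.
base_count=$(git merge-base --all HEAD side | wc -l | tr -d ' ')
echo "merge bases: $base_count"   # prints "merge bases: 2"
```

Both `s1` and `m1` are common ancestors of the two tips and neither is an ancestor of the other, which is why two hashes come back.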

Next, whether or not there are multiple merge bases, we can check to see if rename detection is causing problems. If so, there are two things that may help:

  • The sledgehammer approach: disable rename detection entirely during merge (requires Git version 2.8 or higher): add -X no-renames, which matches the spelling for git diff (though there it's --no-renames).
  • The finer-tune-able tack hammer: raise the similarity threshold for detection (the default is 50%): -X rename-threshold=100% or -X find-renames=100% requires an exact match instead of an approximate match. (Keep the percent sign: as with git diff -M, a bare number is read as a decimal fraction, so 100 without it would mean 10%.) The new spelling, -X find-renames=<n>, matches the spelling of the git diff option and is new in Git version 2.8, but the option itself is old: -X rename-threshold has been around since version 1.7.4. Other threshold values are allowed as well; 100% is notable because it restricts detection to exact matches.
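
Here is a sketch of how the two options above are passed to git merge, using a throwaway repository in which one side renames a file. The branch name `theirs` and the file names are made up. The threshold is written as `100%` because Git's documentation for the matching git diff option says a bare number is read as a decimal fraction.

```shell
#!/bin/sh
# Sketch: merge with exact-only rename detection, then with detection disabled.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'one\ntwo\nthree\n' > a.txt
git add a.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs
git mv a.txt b.txt
git commit -q -m "rename a.txt to b.txt"

git checkout -q "$main"
echo note > notes.txt
git add notes.txt
git commit -q -m "add notes"
before=$(git rev-parse HEAD)

# Tack hammer: only byte-identical files count as renames.
git merge -q --no-edit -X find-renames=100% theirs
exact_ok=$(test -f b.txt && echo yes || echo no)

# Undo that merge, then use the sledgehammer: no rename detection at all.
git reset -q --hard "$before"
git merge -q --no-edit -X no-renames theirs
norename_ok=$(test -f b.txt && echo yes || echo no)

echo "find-renames=100%: $exact_ok, no-renames: $norename_ok"
```

In this tiny history both merges give the same tree, since only one side touched the renamed file. The options matter when both sides change files that Git might pair up as renames.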

To find out if Git is detecting renames, we need the merge base, which is a bit of a problem if there are multiple merge bases: we have to merge the merge bases first, to get a real merge commit that Git will use as the new merge base. I'll just assume that this is not the case, since that process is a bit messy; so we'll go on to look at "the" merge base, using:

git diff --name-status --find-renames=50 --diff-filter=R <basehash> <hash1>
git diff --name-status --find-renames=50 --diff-filter=R <basehash> <hash2>

The <hash1> and <hash2> values are the same as before. We tell Git to give us file names and statuses, and then print only the names of files whose status is R (renamed). If Git does think some files are renamed, we will see their old and new names here. How Git combines these during a merge is a bit tricky, but the presence alone of R-status files implies that Git will be doing this sort of thing. If there are no files, then it's not rename-detection after all.
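
The check can be tried out on a throwaway repository first. This sketch (branch and file names made up) renames a file on one side only, computes the single merge base, and runs both diffs against it.

```shell
#!/bin/sh
# Sketch: list files Git considers renamed between the merge base and each tip.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

printf 'one\ntwo\nthree\n' > a.txt
git add a.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs
git mv a.txt b.txt
git commit -q -m "rename a.txt to b.txt"

git checkout -q "$main"
echo note > notes.txt
git add notes.txt
git commit -q -m "add notes"

base=$(git merge-base HEAD theirs)

# Empty output means no renames were detected on that side.
ours_renames=$(git diff --name-status --find-renames=50 --diff-filter=R "$base" HEAD)
theirs_renames=$(git diff --name-status --find-renames=50 --diff-filter=R "$base" theirs)

echo "ours:   [$ours_renames]"
echo "theirs: [$theirs_renames]"
```

The `theirs` side reports an R100 line (a 100% similar rename from a.txt to b.txt), while the other side reports nothing.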

(See this answer for a detailed description of rename detection in git diff. The merge code uses different command line options, some of which have changed relatively recently. See VonC's answer to Disable Git Rename Detection as well.)

Original answer below.


In general, when merging, Git does not choose either "side". Instead, it takes both sides. Remember that there's a third side to this whole three-way merge thing: there's "your" side (HEAD), "their" side (what you're merging), and the base. This forms a triangle:

                o    <-- HEAD
             ...
            o
         ...
...--o--B (base)
         ...
            o
             ...
                o   <-- theirs

and the merge brings them all together to make a shiny (we hope) diamond:

            o
           / \
          o   \
         /     \
...--o--B       o   result
         \     /
          o   /
           \ /
            o

See also this answer and this more technical / detailed answer.
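
To see "takes both sides" in action, here is a sketch with a throwaway repository in which each side changes a different file. The file and branch names are made up. Comparing the merge result against each parent shows the other side's work, which is expected and not something to undo.

```shell
#!/bin/sh
# Sketch: a three-way merge keeps the changes from both sides.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo a > A.txt
echo d > D.txt
git add A.txt D.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b theirs
echo "d changed" > D.txt
git commit -q -a -m "theirs: change D"

git checkout -q "$main"
echo "a changed" > A.txt
git commit -q -a -m "ours: change A"

git merge -q --no-edit theirs

# Relative to our side, the merge brings in their change (D.txt);
# relative to their side, it brings in ours (A.txt).
vs_ours=$(git diff --name-only HEAD^1 HEAD)
vs_theirs=$(git diff --name-only HEAD^2 HEAD)
echo "changed vs ours: $vs_ours"
echo "changed vs theirs: $vs_theirs"
```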

Meanwhile, it turns out that at least some GUIs show users the complete set of changes the merge brings in. Users get scared by the idea that there are many changes to many files when they changed only one file. They instruct their GUI to undo all the other changes, which means throwing away the other users' work! They then commit this, and you have to revert their merge to get the other users' work back.

(Another source of "touch every file" is when users enable end-of-line conversions in their setups. They take incoming code that uses LF-only or CRLF endings, and convert it to CRLF, or to LF-only, respectively. Then they commit all these changes, which means they have altered every line of every file.)
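
One way to look for the end-of-line problem is `git ls-files --eol`, which reports the line endings of each file in the index and in the work tree. This sketch uses a throwaway repository with one deliberately CRLF-terminated file. The file name is made up, and `core.autocrlf` is forced to `false` locally so the demo behaves the same regardless of your global settings.

```shell
#!/bin/sh
# Sketch: inspect line-ending settings and the endings Git has recorded.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
git config core.autocrlf false   # keep the demo independent of global configuration

printf 'line1\r\nline2\r\n' > crlf.txt
git add crlf.txt
git commit -q -m "file with CRLF endings"

# The setting that controls conversion.
autocrlf=$(git config --get core.autocrlf)

# i/ is the ending stored in the index, w/ the ending in the work tree.
eol_info=$(git ls-files --eol crlf.txt)
echo "core.autocrlf=$autocrlf"
echo "$eol_info"
```

In a real repository, a diff that touches every line of a file together with a change in the `i/` column is a hint that line endings, not content, were modified.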

torek
  • I've added clarifications to my post. – sourcedelica Mar 17 '17 at 00:35
  • Aha, I wonder if it's rename-detection-gone-wrong now. I'll add that and another link... – torek Mar 17 '17 at 02:42
  • It looks like there is one merge base: `git merge-base --all a6e112cec eeedb3cdb` `76ef56fff60b2cbef3f8c787a8d15c522868a914` – sourcedelica Mar 17 '17 at 19:55
  • `git diff --name-status --find-renames=50 --diff-filter=R 76ef56fff60b2cbef3f8c787a8d15c522868a914 a6e112cec` and `git diff --name-status --find-renames=50 --diff-filter=R 76ef56fff60b2cbef3f8c787a8d15c522868a914 eeedb3cdb` both return no output. I also tried 25/75/90 values for `--find-renames`. – sourcedelica Mar 17 '17 at 19:56
  • Well ... that's fascinating, it's an ordinary merge and there are no renames, and yet the merge goes terribly wrong. At this point I'm out of "off the top of my head" type ideas; I'd have to have a closer look at the repository itself (and any special configurations you might be using). – torek Mar 17 '17 at 20:38
  • Ok - thanks for helping out! I've learned a lot from your answer. – sourcedelica Mar 17 '17 at 21:29
  • BTW it may be interesting to look at the output from both `git diff --name-status` results *without* the `--diff-filter=R`, to see what files *Git* thinks changed from base to each branch-tip. Or even remove the `--name-status` part, to see the changes themselves. – torek Mar 17 '17 at 21:56