33

I've created a simple git repo to illustrate my question, available on GitHub here: https://github.com/smileyborg/EvilMerge

Here's an illustration of the repo history:

master          A---B---D---E-----G-----I
                 \     /     \         /
another_branch    ----C       \       /
                               \     /
another_branch2                 F---H

(In the actual repo on GitHub, D is 4a48c9, and I is 48349d.)

D is a "simple" evil merge, where the merge commit "correctly" resolves a merge conflict, but also makes an unrelated "evil" change that did not exist in either parent. It is possible to discover the "evil" part of this merge by using git show -c on this commit, as the output includes ++ and -- (as opposed to single + and -) to indicate the changes that did not exist in either parent (see this answer for context).

I is a different kind of evil merge, where the merge commit "correctly" resolves a merge conflict (caused by changes from F to file.txt that conflict with changes from G), but also "evilly" discards the changes made to a completely different file file2.txt (effectively undoing the changes from H).

How can you know that I is an evil merge? In other words, what command(s) can you use to discover that I not only manually resolves a conflict, but also fails to merge changes that it should have?

Edit/Update: What is an evil merge?

As pointed out by René Link below, it is hard (perhaps impossible) to define a generic set of criteria to identify an "evil merge". However, much like Supreme Court Justice Stewart said about pornography, evil merges are something you know when you see.

So perhaps a better question to ask is this: what git command(s) can you use on a merge commit to get a diff output of all novel changes introduced solely in the merge commit itself. This diff should include:

  • all merge conflict resolutions (at least, if the resolution involved anything more complex than choosing one parent's changes over the other's)
  • all additions or removals that did not exist in either parent (as seen in D)
  • all changes that did exist in one of the parents but that the merge commit discards (as seen in I)

The goal here is to be able to have a human look at this output and know whether the merge was successful or (accidentally or maliciously) "evil" without having to re-review all the previously-reviewed changes (e.g. F and H) that are being integrated in the merge.

Community
  • 1
  • 1
smileyborg
  • 30,197
  • 11
  • 60
  • 73
  • [This question](http://stackoverflow.com/questions/27183534/how-to-verify-a-git-merge-contains-no-extra-changes) is related, but I wasn't able to use the suggested diff commands to identify the evil portion of `I`. – smileyborg Dec 29 '14 at 03:37
  • This isn't directly related to evil merges and how to detect them, but please read the top of [this answer of mine](http://stackoverflow.com/questions/25488138/move-initial-commits-off-master-to-another-branch-in-git/25490288#25490288) anyway. – jub0bs Dec 29 '14 at 10:40
  • Thomas Rast wrote [`evilmergediff`](https://github.com/trast/evilmergediff), a Python tool for detecting evil merges; I haven't tested it, though. – jub0bs Dec 29 '14 at 10:49
  • @Jubobs I am quite familiar with what a branch is in Git. The reason I chose to illustrate the history as above (including preserving the chronological order of commits through their alphabetical lettering) is to make it much clearer what has happened to get into the interesting state. And since a branch is simply a pointer to a commit, this simple visualization is in fact quite accurate -- every commit reachable from the rightmost commit in each row represents the history of that row's branch: `I` is the tip of master, `C` the tip of another_branch, and `H` the tip of another_branch2. – smileyborg Dec 29 '14 at 15:52
  • @Jubobs Anyways, `evilmergediff` looks interesting. I wish there was more documentation on the heuristics and strategies it is using though. I will take a closer look for sure. – smileyborg Dec 29 '14 at 16:17
  • I'm not sure this is possible, or even sensible. If you want to avoid conflicts, force everybody to rebase all of their changes all the time, and only use fast-forward merges. Otherwise, the merge of two features with any amount of overlap is going to introduce "novel" changes in the merge commit. – user229044 Dec 31 '14 at 03:48
  • @meagar What do you think is impossible? My goal is not to avoid conflicts or the introduction of novel changes in a merge commit. As explained in the update to the above question, the goal is to get a meaningful diff of the changes actually introduced in a merge commit. Diff'ing a merge commit to either of its two parents results in a very noisy output. I am almost certain that it is possible to use some specific commands to strip away the noise and leave simply the changes that were introduced in the merge commit itself. Maybe it requires using Git's plumbing, but it should be possible. – smileyborg Dec 31 '14 at 04:45

7 Answers7

8

The easiest thing to do would be to diff the results of your conflict resolution with a merge that auto-resolves conflicts without human intervention. Any automatic resolutions will be ignored, since they will be resolved in exactly the same way.

I see two ways of visualizing the possible "evil" resolutions. If you are making this into a script add &> /dev/null to the end of all lines that you do not care to see output.

1) Use two separate diffs, one that favors the first parent, and a second that favors the second parent.

MERGE_COMMIT=<Merge Commit>
git checkout $MERGE_COMMIT~
git merge --no-ff --no-edit -s recursive -Xours $MERGE_COMMIT^2
echo "Favor ours"
git diff HEAD..$MERGE_COMMIT
git checkout $MERGE_COMMIT~
git merge --no-ff --no-edit -s recursive -Xtheirs $MERGE_COMMIT^2
echo "Favor theirs"
git diff HEAD..$MERGE_COMMIT

2) Diff against the results of the conflicted merge with the conflicts still in.

MERGE_COMMIT=<Merge Commit>
git checkout $MERGE_COMMIT~
git -c merge.conflictstyle=diff3 merge --no-ff $MERGE_COMMIT^2 --no-commit
git add $(git status -s | cut -c 3-)
git commit --no-edit
git diff HEAD..$MERGE_COMMIT
smileyborg
  • 30,197
  • 11
  • 60
  • 73
Joseph K. Strauss
  • 4,683
  • 1
  • 23
  • 40
  • It sounds like your strategy is on the right track. However, I tried running both of your suggestions (#1 and #2) using the merge commit `I` in the sample repository linked in the above question, and the `git merge` commands all result in `Already up to date` and the `git diff` output is completely empty. Please try executing your suggestions using the sample repo, for commits `I` and `D`, and tell me if you are seeing something different. If you have any revised suggestions, please test using that repo to make sure they provide the desired output for the two merge commits. – smileyborg Jan 02 '15 at 23:35
  • @smileyborg You were right. I was trying to checkout the merge commit instead of checking out the merge commit's main parent. I updated the answer to reflect the correct way. – Joseph K. Strauss Jan 04 '15 at 00:59
  • Awesome! This is the simplest and cleanest method so far. Using this as a starting point, I've created two scripts, one for each of your suggested methods. Will post those as a top-level answer. – smileyborg Jan 06 '15 at 18:21
  • Here's the complete answer, with links to the scripts: http://stackoverflow.com/a/27804868/796419 – smileyborg Feb 21 '15 at 19:03
6

Disclaimer: As pointed out by @smileyborg, this solution will not detect a case where the evil merge completely reverted a change that was introduced by one of the parents. This defect occurs because according to the Git Docs for the -c option

Furthermore, it lists only files which were modified from all parents.

I recently discovered a much simpler solution to this question than any of the current answers.

Basically, the default behavior of git show for merge commits should solve your problem. In cases where the modifications from both sides of the merge do not touch, and no "evil" changes were made there will be no diff output. I had previously thought that git show never shows diffs for merge commits. However, if a merge commit involves a messy conflict or an evil merge, then a diff will be displayed in combined format.

To view the combined format when viewing a number of commit patches with log -p, simply add the parameter --cc.

In the example given from GitHub in the question the following is displayed (with my comments interspersed):

$ git show 4a48c9

(D in the example)

commit 4a48c9d0bbb4da5fb30e1d24ae4e0a4934eabb8d
Merge: 0fbc6bb 086c3e8
Author: Tyler Fox <Tyler_Fox@intuit.com>
Date:   Sun Dec 28 18:46:08 2014 -0800

    Merge branch 'another_branch'

    Conflicts:
        file.txt

diff --cc file.txt
index 8be441d,f700ccd..fe5c38a
--- a/file.txt
+++ b/file.txt
@@@ -1,9 -1,7 +1,9 @@@
  This is a file in a git repo used to demonstrate an 'evil merge'.

The following lines are not evil. Changes made by the first parent are indicated by a +/- in the left-most column; changes made by the second parent are indicated by +/- in the second column.

- int a = 0;
- int b = 1;
+ int a = 1;
+ int b = 0;
 +int c = 2;
- a = b;
+ b = a;
  a++;

Here is the evil part: ++ was change to -- from both parents. Note the leading -- and ++ indicating that these changes occur from both parents, meaning that someone introduced new changes in this commit that were not already reflected in one of the parents. Do not confuse the leading, diff-generated ++/-- with the trailing ++/-- which is part of the file contents.

--b++;
++b-- ;

End of of evilness.

 +c++;

To quickly view all merge commits that may have issues:

git log --oneline --min-parents=2 --cc -p --unified=0

All uninteresting merges will be displayed on a single line, while the messy ones - evil or otherwise - will display the combined diff.

Explanation:

  • -p - Display patch
  • --oneline - Display each commit header on a single line
  • --min-parents=2 - Only show merges.
  • --cc - Show combined diff, but only for places where changes from both parents overlap
  • --unified=0 - Display 0 lines of context; Modify the number to be more aggressive in finding evil merges.

Alternatively, add the following to eliminate all uninteresting commits:

-z --color=always | perl -pe 's/^([^\0]*\0\0)*([^\0]*\0\0)(.*)$/\n$2\n$3/'
  • -z - Display NUL instead of newline at the end of commit logs
  • --color=always - Don't turn off color when piping to perl
  • perl -pe 's/^([^\0]*\0\0)*([^\0]*\0\0) - Massage the output to hide log entries with empty diffs
Joseph K. Strauss
  • 4,683
  • 1
  • 23
  • 40
  • 1
    Interesting! +1. The `++` mentioned in http://stackoverflow.com/q/29774466/6309 (and asked about in http://stackoverflow.com/q/29976403/6309) is well used here. – VonC May 04 '15 at 16:48
  • Joseph, I recall having tried this approach before asking this question. From what I remember, this technique does **not** correctly identify commit **`I`** as an evil merge where the merge commit makes some evil changes to an unrelated file (which is a different type of situation than **`D`**). Your original answer definitely does catch both cases correctly though. Give it a try and see if you find the same thing I did. – smileyborg May 04 '15 at 17:32
  • @smileyborg Just tried it. I added a new file after doing a clean merge, and then amended the merge commit. It shows the diff exactly as you would want it. – Joseph K. Strauss May 04 '15 at 17:49
  • @JosephK.Strauss Try doing `git show 48349d` on the example repo in the question. Note that the output is only of `file.txt`, when the evil part of that merge (**`I`**) is that is discards changes in `file2.txt` that are unrelated to the actual merge conflict in `file.txt`. So, this is an evil merge that `git show` fails to identify. (Read the commit message of `48349d` to understand in more detail how that merge was evil.) – smileyborg May 04 '15 at 17:57
  • @smileyborg Interesting. I guess the combined diff only shows changes where the combined content of a file differs from both sides. I also tried modifying file2.txt at the top of the file and amending. I got the only the new results when using `--cc`, and I got all the changes when using `-c`. Additionally, switching the order of the parents did not change anything, so Git is not making a difference whether the change is in the mainline or not. – Joseph K. Strauss May 04 '15 at 18:43
  • @JosephK.Strauss Yes, all good observations. But the takeaway is that while `git show` can be helpful, it can miss some important evil merges, so you should use [one of these scripts](http://stackoverflow.com/a/27804868/796419) that leverage the technique from your [original answer](http://stackoverflow.com/a/27744011/796419) to detect any evilness that may have occurred. – smileyborg May 04 '15 at 19:51
  • @smileyborg Next time: clearly indicate in the question that you are unsatisfied with git show, and why. I still believe that for many git users, the convenience of git show (or log -p) is something that cannot be ignored. It sure makes viewing the conflict resolution easier to see when the file was definitely changed. – Joseph K. Strauss May 04 '15 at 20:35
  • @JosephK.Strauss The original question does state that `git show` successfully identifies the situation in **`D`**...I'm sorry if the question was not perfectly clear to you, but it did specifically ask how to detect the situation in **`I`** (where `git show` fails). I don't disagree that `git show` is useful and convenient, but it is important to make clear that it may not detect all evilness -- which was the actual question I asked. Anyways, your answer which received the bounty is still the best so far. – smileyborg May 04 '15 at 23:42
5

Before we can detect evil merges we must define what evil merges are.

Every merge that has conflicts must be resolved manually. In order to resolve conflicts we can

  1. take one of the changes and omit the other.
  2. eventually take both changes (in this case the order in the result might be important)
  3. take none of them and create a new change that is the consolidation of both.
  4. take none of them and omit both.

So what is an evil merge now?

According to this blog it is

a merge is considered evil if it does not faithfully integrate all changes from all parents.

So what is a "faithful integration"? I think noone can give a general answer, because it depends on the semantics of the code or text or whatever is merged.

Other say

An evil merge is a merge that introduces changes that do not appear in any parent.

With this definition all conflicts that are resolved by

  1. take one of the changes and omit the other.
  2. take none of them and create a new change that is the consolidation of both.
  3. take none of them and omit both.

are evil merges.

So we finally come to the questions.

Is it legal to

  • only take one of the changes and omit the other?
  • take both changes?
  • take none of them and create a new change that is the consolidation of both?
  • take none of them and omit both?

And things can become more complex if we think about octopus merges.

My conclusion

The only evil merge we can detect is a merge that was done without conflicts. In this case we can redo the merge and compare it with the merge that was already done. If there are differences than someone introduced more than he/she should and we can be sure that this merge is an evil merge.

At least I think we must detect evil merges manually, because it depends on the semantics of the changes and I'm not able to formulate a formal definition of what an evil merge is.

Community
  • 1
  • 1
René Link
  • 48,224
  • 13
  • 108
  • 140
  • I will agree that it may be difficult or impossible to define an "evil merge". I've updated my question to address this -- in this case, I'm assuming a human can tell the difference if he/she is able to see the right diff. So my question is really how do you generate a diff that only shows the "interesting" parts of a merge. – smileyborg Dec 29 '14 at 16:12
4

I've expanded on the answer from Joseph K. Strauss to create two complete shell scripts that can be easily used to get a meaningful diff output for a given merge commit.

The scripts are available in this GitHub Gist: https://gist.github.com/smileyborg/913fe3221edfad996f06

The first script, detect_evil_merge.sh, uses the strategy of automatically redoing the merge again without resolving any conflicts, and then diff'ing that to the actual merge.

The second script, detect_evil_merge2.sh, uses the strategy of automatically redoing the merge again twice, once resolving conflicts with the first parent's version, and second resolving conflicts using the second parent's version, and then diff'ing each of those to the actual merge.

Either script will do the job, it's just personal preference on which way you find it easier to understand how the conflicts were resolved.

Community
  • 1
  • 1
smileyborg
  • 30,197
  • 11
  • 60
  • 73
3

Preliminary note: I'm using Linus Torvalds's definition of an "Evil Merge" here, which as Junio Hamano notes can sometimes be a good thing (e.g., to resolve semantic conflicts rather than textual conflicts). Here is Linus's definition:

an "evil merge" is something that makes changes that came from neither side and aren't actually resolving a conflict [Source: LKML]

As @joseph-k-strauss noted in his answer, the problem with any evil-merge detection based solely on "-c" or "--cc" is this:

"Furthermore, it lists only files which were modified from all parents." [Source: man git-log]

And so to detect I's evil-ness, we need to find files modified by some, but not all, of its parents.

I believe clean merges have a symmetric property. Consider this diagram:

enter image description here

In a clean merge the diagonals are the same: b1 == m2 and b2 == m1. The sets of changed lines only have overlap when conflicts occur, and clean merges have no conflicts. And so the set of changes in b2 must match m1, since the whole point of b2 is to replay m1 on top of parent2, to bring parent2 in sync with parent1 (and remember --- there were no conflicts). And vice versa for m2 and b1.

Another way of thinking about this symmetry: when we rebase, we essentially throw away b1 and replace it with m2 instead.

And so if you want to detect evil merges you can use "git show -c" for files changed by both parents, and otherwise check that the symmetry holds for the four segements of the diagram using "git diff --name-only".

If we assume merge from the diagram is HEAD (e.g., let's see if the merge I just committed is evil), and we use the fancy "three dot" git diff notation which computes merge-base for you, I think you only need these four lines:

git diff --name-only HEAD^2...HEAD^1 > m1
git diff --name-only HEAD^1...HEAD^2 > b1
git diff --name-only HEAD^1..HEAD    > m2
git diff --name-only HEAD^2..HEAD    > b2

Then analyze the contents to see that m1 == b2 and b1 == m2. If they don't match, then you've got evil!

Any output from either of these indicate evil, since if we cat b1 and m2 and sort them, then every line should occur twice.

cat b1 m2 | sort | uniq -c | grep -v ' 2 '
cat b2 m1 | sort | uniq -c | grep -v ' 2 '

And for the EvilMerge example, commit I outputs the following:

cat b2 m1 | sort | uniq -c | grep -v ' 2 '
      1 file2.txt

The edit to "file2.txt" only occurred once between the b2 and m1 diagonals. The merge is not symmetric, therefore it's not a clean merge. EVIL SUCCESSFULLY DETECTED!

Community
  • 1
  • 1
G. Sylvie Davies
  • 5,049
  • 3
  • 21
  • 30
  • 3
    Interesting use of git diff here. I'll have to test it, but from what I have read here, +1. – VonC Dec 28 '16 at 10:53
1

Simplest is probably best here: diff the results of a throwaway uncorrected (and incomplete) automerge, with no conflicts resolved if there are any, with the actual merge results.

Ordinary ours/theirs resolutions will show up as all 3 (4 for a 3diff) conflict marker lines deleted,and one side or another of the change hunks also deleted, that'll be easy to eyeball.

Any alterations of either branch's changes will show up as an odd-looking mix, for instance any gratuitously added or removed hunks will show up outside conflict markers.

In the example repo, after

git clone https://github.com/smileyborg/EvilMerge

git checkout master^
git merge --no-commit master^2   # --no-commit so  w/ or w/o conflict work the same

running the suggested diff gets

$ git diff -R master    # -R so anything master adds shows up as an add
diff --git b/file.txt a/file.txt
index 3835aac..9851407 100644
--- b/file.txt
+++ a/file.txt
@@ -1,12 +1,6 @@
 This is a file in a git repo used to demonstrate an 'evil merge'.

-<<<<<<< HEAD
-int a = 3;
-||||||| merged common ancestors
-int a = 1;
-=======
-int d = 1;
->>>>>>> master^2
+int d = 3;
 int b = 0;
 int c = 2;
 b = a;
diff --git b/file2.txt a/file2.txt
index d187a25..538e79f 100644
--- b/file2.txt
+++ a/file2.txt
@@ -4,6 +4,6 @@ int x = 0;
 int y = 1;
 int z = 2;
 x = y;
-x--;
-y--;
-z--;
+x++;
+y++;
+z++;

and it's instantly clear something's fishy: in file.txt the changes on both branches were discarded and a line from nowhere inserted., while in file2.txt there never was a conflict, and the merge just gratuitously changes the code. A little digging shows it's a commit reversion here, but that's immaterial, the point is the usual changes follow readily-recognizable patterns and anything unusual is easily detectable and worth checking.

Similarly, after

git branch -f wip 4a48
git checkout wip^
git merge --no-commit wip^2

running the suggested diff gets

$ git diff -R wip
diff --git b/file.txt a/file.txt
index 3e0e047..fe5c38a 100644
--- b/file.txt
+++ a/file.txt
@@ -1,19 +1,9 @@
 This is a file in a git repo used to demonstrate an 'evil merge'.

-<<<<<<< HEAD
-int a = 0;
-int b = 1;
-int c = 2;
-a = b;
-||||||| merged common ancestors
-int a = 0;
-int b = 1;
-a = b;
-=======
 int a = 1;
 int b = 0;
+int c = 2;
 b = a;
->>>>>>> wip^2
 a++;
-b++;
+b--;
 c++;

and again the oddity jumps out: wip added a int c = 2 to the wip^2 branch's changes, and it toggled b-- to b++ out of nowhere.

You could from here get cute and automate some of the predictable stuff to make bulk vetting go faster, but that's really a separate question.

jthill
  • 55,082
  • 5
  • 77
  • 137
0

What about redoing the merge 'virtually' and comparing the result? In other words

pseudo code:

  1. starting by I
  2. get the 2 parents: G, H
  3. git checkout E
  4. git merge H
  5. now you have new-I.
  6. compare I and new-I, either by using git diff or by comparing the output of git show I and git show new-I

Especially the last step will be hard work, if you want to do it completely automatically, at least if there were conflicts in the commit

Daniel Alder
  • 5,031
  • 2
  • 45
  • 55
  • This would technically work, but in order for Step 6 to yield the desired result, you'd need to do a full-effort Step 4. In other words, you'd have to re-do the entire merge effort, including conflict resolution, to then see if it turned out the same as the original. That's a lot of tedious work, and I'm pretty certain you can get the desired diff output without having to redo the entire merge. – smileyborg Jan 06 '15 at 03:08
  • I don't think so. Conflicts are conflicts because they cannot be resolved automatically... – Daniel Alder Jan 06 '15 at 09:24
  • that's right, but the goal isn't to have Git or some diff command tell me if the conflicts were resolved correctly, the goal is to just see what the changes made during the merge are. Then a human can interpret what happened. This shouldn't require redoing the merge. – smileyborg Jan 06 '15 at 15:26