
In my text files, I often move large sections around. In other words, I take a section that's anywhere from 3 to 50 lines long, cut it, and paste it unchanged somewhere else in the file.

Under "uncommitted changes," Git (I use GitHub for OS X) displays those lines as "deleted" in one part of the file and "added" in another.

Given my workflow, it would be much more helpful if Git's diff display did not highlight sections that I've merely moved from one place to another. Instead, I want Git to highlight only lines that are entirely new, lines that I've deleted from the file completely, and lines that I've partially changed.

Is there a way to instruct Git's diff display to ignore "deleted" sections of 3+ lines if it finds identical "added" sections elsewhere in the file?

Currently I use wdiff = diff-highlight.

UPDATE: It looks like specifying an external git diff is straightforward:

gitconfig
[diff]
    external = ~/prose-diffs.py 

Does anyone have an external git diff that compares "added" sections to "deleted" sections (ignoring line breaks at the beginning and the end), and automatically hides any sections where the added lines match the deleted lines?

incandescentman
  • Not with the built-in diff, but you can have git use an "external diff". That still leaves the problem of finding (or writing) such a diff (which is why this is a comment, not an answer). – torek Nov 08 '15 at 10:01
  • One trick I use in such situations is to make the text files "canonical" before comparing. Usually, I sort them and compare the sorted files. In some cases removing indents or some punctuation is needed. – Roman Susi Nov 08 '15 at 10:20
  • @torek How hard would it be for a programmer to write such a diff? Once written, would GUI clients like GitHub Desktop for OS X or Sourcetree be able to use the external diff? – incandescentman Nov 12 '15 at 20:54
  • "How hard" depends on how good a job you want: the string-to-string minimal edit distance problem rapidly grows in complexity if you add "move" operations (search for "edit distance"), but a cheesy hack that looks for matching add/delete sections using any existing diff would be easy. As for GUI clients, I haven't used them and don't know. – torek Nov 12 '15 at 20:59
  • Yes, the latter (looking for `deleted` sections that match `added` sections, and hiding any sections that match) would be totally sufficient. – incandescentman Nov 12 '15 at 21:15
  • Read this post for more information: http://stackoverflow.com/questions/12590947/using-git-diff-to-detect-code-movement-how-to-use-diff-options – pjbrito Nov 20 '15 at 21:45

1 Answer


An outline of the steps

  1. A diff program - one that understands line reordering
  2. Using the new diff program - set your git config

A diff program:

Git calls your external diff program with the following seven arguments:

> my_diff_tool <filename> <old_location> <old_hash> <old_mode> <new_location> <new_hash> <new_mode>
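To make the argument order concrete, here is a minimal sketch that maps those seven positional arguments onto named fields. The names `ExternalDiffArgs` and `parse_args` are hypothetical, chosen only for illustration; the field labels simply mirror the placeholders in the line above.

```python
#!/usr/bin/env python3
import sys
from collections import namedtuple

# Illustrative labels for the seven positional arguments git passes.
ExternalDiffArgs = namedtuple(
    "ExternalDiffArgs",
    "path old_file old_hex old_mode new_file new_hex new_mode",
)


def parse_args(argv):
    """Map argv[1:8] onto named fields; argv[0] is the script itself."""
    if len(argv) < 8:
        raise ValueError("expected 7 arguments, got %d" % (len(argv) - 1))
    return ExternalDiffArgs(*argv[1:8])


# Only print when the script is actually invoked with the full argument list.
if __name__ == "__main__" and len(sys.argv) >= 8:
    args = parse_args(sys.argv)
    for name, value in zip(args._fields, args):
        print("%-8s %s" % (name, value))
```

Pointing diff.external at a script like this and running git diff should print one labelled line per argument, which is a quick way to see which two paths hold the old and new contents.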

So here is the simplest possible diff tool that does something close to what you want. It reads the files and then prints out the lines that are new and the lines that are old using set comparisons.

#!/usr/bin/env python3
import sys

# Git passes: filename, old file, old hash, old mode, new file, new hash, new mode
with open(sys.argv[2]) as f:
    old = f.read().splitlines()
with open(sys.argv[5]) as f:
    new = f.read().splitlines()

print("-" * 80)
print("Filename: %s" % sys.argv[1])

# Simple set method: report lines that appear on only one side
for line in set(old) - set(new):
    print("- %s" % line)
for line in set(new) - set(old):
    print("+ %s" % line)
if set(new) == set(old):
    print("Text reordering only")

Here is an example of what this outputs vs what diff would output:

my_diff_tool

Filename: test.txt
- luctus pellentesque.
+ luctus pellentesque. Puric huma te.

Diff

diff --git a/test.txt b/test.txt
index 2ec8f4b..797e2ad 100644
--- a/test.txt
+++ b/test.txt
@@ -4,15 +4,15 @@ dolor quis feugiat. Nullam vel interdum leo, a placerat elit. Mauris quis
 faucibus dui.

 Nullam eu sapien quis ex convallis tincidunt. Mauris tristique mauris ac
-luctus pellentesque.
+luctus pellentesque. Puric huma te.

 Duis at imperdiet lacus. Sed malesuada elit vitae arcu semper, at fringilla
 purus rhoncus. Sed vestibulum pellentesque libero in interdum. Fusce elementum
 ornare vulputate.

+Nam sed enim at nisi tincidunt accumsan eu nec nisl. Duis suscipit hendrerit
+fermentum. Sed mattis purus congue velit aliquet, non placerat lectus varius.
+
 Donec placerat, purus ac aliquet ullamcorper, elit leo accumsan ante, sed
 lacinia leo sem sed metus.  Morbi lacinia porttitor ante, eget consequat
 libero accumsan in. Nunc sit amet lectus magna.
-
-Nam sed enim at nisi tincidunt accumsan eu nec nisl. Duis suscipit hendrerit
-fermentum. Sed mattis purus congue velit aliquet, non placerat lectus varius.

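Obviously, there is much to be improved. For instance, sets ignore duplicate lines, so adding or removing one copy of a line that still appears elsewhere in the file goes unreported. Sets are also unordered, so the output does not follow file order, which makes it harder to read when there are lots of new lines.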

These improvements are left as an exercise to the reader.
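One possible direction, as a rough sketch and not a finished tool: replace the sets with `collections.Counter` so duplicates are counted, and walk the original line lists so the output stays in file order. The function name `moved_aware_diff` is hypothetical. The sketch works line by line and does not try to match whole blocks; when a duplicated line is removed, it reports the earliest copies, which is an arbitrary choice.

```python
from collections import Counter


def moved_aware_diff(old, new):
    """Return (removed, added) lines in file order, ignoring pure moves.

    A line counts as moved when it occurs the same number of times in
    both versions, wherever it sits in the file.
    """
    old_counts, new_counts = Counter(old), Counter(new)
    # Counter subtraction keeps only positive counts, i.e. the surplus
    # of each line on one side.
    removed_budget = old_counts - new_counts
    added_budget = new_counts - old_counts

    def take(lines, budget):
        # Walk the lines in file order, emitting each one while its
        # surplus budget lasts.
        kept = []
        for line in lines:
            if budget[line] > 0:
                budget[line] -= 1
                kept.append(line)
        return kept

    return take(old, removed_budget), take(new, added_budget)
```

In the script above, the two set loops could then be replaced by a single call to `moved_aware_diff(old, new)`, printing the first list with a `-` prefix and the second with a `+` prefix. If both lists come back empty, the change was a reordering only.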

Using the new diff program

Setting your git config to use the new tool is easy. You can also modify your .gitconfig as you showed above.

> git config diff.external /location/to/your/diff/tool/my_diff_tool

You'll want to make sure that your diff tool is executable or you will get an error (fatal: cannot exec '/location/to/your/diff/tool/my_diff_tool': Permission denied) so:

> chmod +x /location/to/your/diff/tool/my_diff_tool
Liyan Chang