I am diffing files that contain code blocks with similar contents, and this can confuse the diff. I will start with an example, because this is hard to explain in words.

file1.txt:

text
(
    contents
)
block "block1"
(
    contents
)
block "block2"
(
    contents
)

file2.txt:

block "block1"
(
    contents
)
text
(
    contents
)
block "block2"
(
    contents
)

When I diff these two files, I get the following output:

-text
+block "block1"
 (
     contents
 )
-block "block1"
+text
 (
     contents
 )
 block "block2"
 (
     contents
 )

The issue is that the diff program doesn't recognize that code blocks of type "block" are entirely independent of code blocks of type "text" and should be treated as separate entities. (I'm using Perl's Text::Diff in this case, but I also have git-diff available and it does the same thing.)

How can I make a diff recognize these different types of code blocks as separate entities so a diff of these two files would produce the following results instead?

-text
-(
-    contents
-)
 block "block1"
 (
     contents
 )
+text
+(
+    contents
+)
 block "block2"
 (
     contents
 )

Note that this is a drastically simplified example compared to the code I am actually trying to diff. I understand that it is easy enough to figure out what this example is doing, but when you are dealing with hundreds of similar elements, the diff output becomes completely unreadable.

I want the diff to realize that only a "text" code block was edited in this modification and no "block" code blocks were touched.

tjwrona1992



If you're OK with using git directly, try `git diff --patience`:

$ git diff --patience
diff --git a/foo1.txt b/foo1.txt
index b474449..30a91bb 100644
--- a/foo1.txt
+++ b/foo1.txt
@@ -1,11 +1,11 @@
-text
-(
-    contents
-)
 block "block1"
 (
     contents
 )
+text
+(
+    contents
+)
 block "block2"
 (
     contents
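To see why `--patience` helps here: patience diff (Bram Cohen's algorithm, which `--patience` selects) starts by matching only lines that occur exactly once in each file, then keeps the longest set of those matches that appear in the same order in both files and uses them as anchors. In the example, the repeated `(`, `contents` and `)` lines are never anchors; only the three header lines are. The following is a minimal Perl sketch of just that anchoring step, assuming each file is already split into an array of lines. `patience_anchors` is a hypothetical helper, not git's implementation, which also diffs the gaps between anchors and differs in other details.

```perl
use strict;
use warnings;

# Hypothetical helper: the "unique lines" step of patience diff.
# Takes two array refs of lines and returns [old_index, new_index]
# pairs (0-based) for lines that occur exactly once in each input,
# keeping the longest subset whose order agrees in both files.
sub patience_anchors {
    my ($old, $new) = @_;
    my (%count_old, %count_new, %pos_new);
    $count_old{$_}++ for @$old;
    $count_new{$_}++ for @$new;
    $pos_new{ $new->[$_] } = $_ for 0 .. $#$new;

    # Candidate matches (unique in both files), in old-file order.
    my @pairs;
    for my $i (0 .. $#$old) {
        my $line = $old->[$i];
        next unless $count_old{$line} == 1 && ($count_new{$line} // 0) == 1;
        push @pairs, [ $i, $pos_new{$line} ];
    }

    # Longest increasing subsequence of the new-file positions, found
    # with patience sorting: $tops[$p] is the pair index on top of pile $p,
    # $back[$k] is the pair that precedes pair $k in the best chain.
    my (@tops, @back);
    for my $k (0 .. $#pairs) {
        my $j = $pairs[$k][1];
        my ($lo, $hi) = (0, scalar @tops);
        while ($lo < $hi) {    # leftmost pile whose top is >= $j
            my $mid = int(($lo + $hi) / 2);
            if ($pairs[ $tops[$mid] ][1] < $j) { $lo = $mid + 1 }
            else                               { $hi = $mid }
        }
        $back[$k] = $lo > 0 ? $tops[ $lo - 1 ] : undef;
        $tops[$lo] = $k;
    }

    # Walk the predecessor links back from the top of the last pile.
    my @anchors;
    my $k = @tops ? $tops[-1] : undef;
    while (defined $k) {
        unshift @anchors, $pairs[$k];
        $k = $back[$k];
    }
    return @anchors;
}
```

For file1.txt and file2.txt, the candidates are `text` (old 0, new 4), `block "block1"` (old 4, new 0) and `block "block2"` (old 8, new 8). `text` and `block "block1"` cannot both be kept because their order differs between the files. Both possible chains have length 2, and this sketch's tie-breaking keeps `block "block1"` and `block "block2"` as anchors, which agrees with the git output above: the `text` block is reported as deleted and re-added.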
oalders
  • I am okay with using git diff, but it seems to require the text to be stored in files. Is there a way to perform a git diff on two strings without writing them to files? – tjwrona1992 Oct 03 '17 at 15:30
  • I suggested this because you mentioned "I also have git-diff available". You could use temp files, depending on how performant you need this to be. You could look into a ramdisk as a means of speeding up the disk accesses. – oalders Oct 03 '17 at 20:46
  • I guess I can just use temp files, but it takes significantly more time and makes my code look rather kludgy. – tjwrona1992 Oct 03 '17 at 21:01
  • Have you tried it with ramdisk? I'd be interested to know how much slower that would be. It's more code to write for sure, but using a combination of `File::Temp` and `Git::Sub` shouldn't be that bad. – oalders Oct 04 '17 at 00:55
  • Unfortunately I can't easily try it with an external program like ramdisk because where I work they have very strict security policies and I would have to go through a security review process to request software like ramdisk that is currently not used inside the company :( – tjwrona1992 Oct 04 '17 at 12:49
  • You should be able to set up a ramdisk on a Linux machine without installing any additional software: http://www.techrepublic.com/article/how-to-use-a-ramdisk-on-linux/ You would need `sudo`, though. – oalders Oct 04 '17 at 13:54
  • Unfortunately I do not have `sudo` on our Linux systems, but this script will be primarily used on Windows anyway. – tjwrona1992 Oct 04 '17 at 14:32
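A minimal sketch of the temp-file approach from the comments, assuming `git` is on the PATH: each string is written to a `File::Temp` file and `git diff --no-index --patience` compares the two paths outside of any repository. `git_patience_diff` is a hypothetical helper name; it shells out with backticks rather than using `Git::Sub`, whose interface is not reproduced here. The double quotes around the paths are meant to cope with spaces in the Windows temp directory, but treat Windows behaviour as untested.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: diff two in-memory strings with git's patience
# algorithm by writing each one to a temporary file first.
sub git_patience_diff {
    my ($old_text, $new_text) = @_;
    my @files;
    for my $text ($old_text, $new_text) {
        # UNLINK => 1: the file is deleted when the object is destroyed.
        my $tmp = File::Temp->new(UNLINK => 1, SUFFIX => '.txt');
        binmode $tmp;    # write "\n" as-is, no CRLF translation
        print {$tmp} $text;
        close $tmp or die "close failed: $!";
        push @files, $tmp;    # keep the objects alive until git has run
    }

    # --no-index compares two paths that need not be in a repository.
    my $cmd = sprintf 'git diff --no-index --patience "%s" "%s"',
        $files[0]->filename, $files[1]->filename;
    my $output = `$cmd`;

    # git diff --no-index exits 0 for no differences, 1 for differences;
    # anything else (or failing to start at all) is treated as an error.
    die "could not run git: $!\n" if $? == -1;
    my $status = $? >> 8;
    die "git diff failed (exit $status)\n" if $status > 1;
    return $output;
}
```

The returned text includes git's `diff --git`/`---`/`+++` header lines with the temporary file names, so a script may want to strip or rewrite those before showing the result.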

The best way to handle this is to use manual diff alignment in KDiff3. I'm reusing some example images, but the result would be the same with your example files:

[Images: comparison of old and new function_A with sync1; comparison of old and new function_A with sync2]

hlovdal
  • Good answer, but I'm trying to do this from the command line as part of a script so manual alignments aren't an option for me :( – tjwrona1992 Oct 03 '17 at 21:02