Diff tool filter

Question

How can I diff two files but ignore all differences between comment strings? I would like to see the comments in the resulting diff, but not have the tool treat differences between comments as real differences.

File1.py

# File 1 code
print("code")
print("same code")
print("code") # comment 1

File2.py

# File 2 code
print("different code")
print("same code")
print("code") # comment 2

When I diff file1.py and file2.py I want to be able to ignore comments, but still print them in the diff. Perhaps some command like:

diff -y file1.py file2.py --magicRegex "#.*"

The desired output might look like:

# File 1 code                 # File 2 code
print("code")              |  print("different code")
print("same code")            print("same code")
print("code") # comment 1     print("code") # comment 2

There are several suggestions about this: https://stackoverflow.com/questions/7504059/how-can-i-perform-a-diff-that-ignores-all-comments — angry buddha, Apr 14 '21 at 04:26
@Lenna: It does not make sense to simply remove any `#.*`, because if, for instance, you have a line such as `prog "#" x y `, you certainly don't want to have anything removed, because in this case `#` does not denote a comment. Your regex would not handle this situation and simply purge everything after the `#`. — user1934428, Apr 14 '21 at 06:41
Are you open to other tools like Meld or does this have to be `diff` + other bash tools? I think doing this using `diff` + `grep` + `sed` might be possible, but difficult. I especially think it would be hard to insert the comments back into `diff -y` output using bash tools while keeping things pretty. — xdhmoore, Apr 14 '21 at 13:10
Totally open to other tools. Will change the question title to reflect that. — Lenna, Apr 14 '21 at 15:00

xdhmoore · Answer 1 · 2021-04-19T18:27:27.157

I was thinking more about this today. Ideally, there's a tool out there to do this, but if not, I think this might work, depending on how much it is worth to you to script it:

Comment-preserving diff algorithm:

1. For file1 and file2, process them and create 2 new files for each:

  i.  A version of each file with the comments removed (file1.py.nocom,
      file2.py.nocom). Lines containing only a comment would not be
      removed, just emptied, so the line numbering stays the same.

  ii. A file containing the locations for all the comments as well as the
      actual comment text. Something like:
      1,1:# File 1 code
      4,15:# comment 1 

2. Do the diff between file1.py.nocom and file2.py.nocom, but without the -y
   flag. This will be easier to parse. Even easier, use the -C flag (lines
   of context) with a really high value. Hopefully you can get the whole
   file in the diff without any missing "common" lines that way.

3. Go through the output from #2 and add back in the comments using the info
   from 1.ii. I experimented with manually editing the diff from #2 and 
   applying it with vim, but it didn't seem to like one of the "common" lines 
   having a comment change. But there may be some tool that will allow you to 
   view it. Barring that:

4. Use the commented diff output to recreate the -y style output yourself.
   I guess the tricky part will be determining the width of the left side
   and printing out the right column. If on #2 you weren't able to get all
   the common lines into the diff output using the context flag, then
   here you'll have to re-add those missing common lines.
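
Step 1 above could be sketched in Python roughly as follows. This is only an illustration, not part of the original answer: it assumes the inputs are valid Python source, and the function name `strip_comments` is made up for the example. It uses the standard `tokenize` module rather than a `#.*` regex, so a `#` inside a string literal is left alone. Docstrings are not comments to the tokenizer, so they are kept.

```python
import io
import tokenize


def strip_comments(source):
    """Return (stripped_lines, comments) for Python source text.

    stripped_lines has one entry per original line, so line numbers do
    not shift. comments is a list of (line, column, text) tuples, both
    1-based, mirroring the "4,15:# comment 1" records in step 1.ii.
    """
    lines = source.splitlines()
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # row is 1-based, col is 0-based
            comments.append((row, col + 1, tok.string))
            # A comment runs to the end of its line, so cut the line there.
            lines[row - 1] = lines[row - 1][:col].rstrip()
    return lines, comments


if __name__ == "__main__":
    sample = 'print("code") # comment 1\nprog = "#"\n'
    stripped, found = strip_comments(sample)
    print(stripped)
    print(found)
```

The stripped lines can be written to file1.py.nocom and the tuples to the comment-location file. Source that does not tokenize (for example an unterminated triple-quoted string) raises an exception, which a real script would need to handle.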

The above won't (easily) work with docstrings, and there are probably other cases I haven't thought of. I guess it might need to be tweaked if you have addition/removal of comment lines between files as well. But there's my two cents. It seems doable, but definitely a chunk of work.
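
For what it's worth, steps 2 to 4 might collapse into one pass if the comparison is done in Python instead of by parsing `diff` output. The sketch below swaps the external `diff` call for the standard library's `difflib.SequenceMatcher`, which is a different technique from the one described above. The function name `side_by_side` and the column width are assumptions for illustration. It compares the comment-free lines but prints the original lines, so comments show up without counting as changes.

```python
import difflib


def side_by_side(left, right, left_key, right_key, width=27):
    """Compare left_key/right_key, but display the lines of left/right.

    left and right hold the original lines; left_key and right_key hold
    the same lines with comments removed (same length and order).
    Rows mimic `diff -y`: "|" changed, "<" left only, ">" right only.
    Lines longer than width are not truncated, unlike `diff -y`.
    """
    rows = []
    matcher = difflib.SequenceMatcher(None, left_key, right_key, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        for k in range(max(i2 - i1, j2 - j1)):
            has_left = i1 + k < i2
            has_right = j1 + k < j2
            left_text = left[i1 + k] if has_left else ""
            right_text = right[j1 + k] if has_right else ""
            if tag == "equal":
                mark = " "
            elif has_left and has_right:
                mark = "|"
            else:
                mark = "<" if has_left else ">"
            rows.append(f"{left_text:<{width}}{mark}  {right_text}".rstrip())
    return rows


if __name__ == "__main__":
    file1 = ['# File 1 code', 'print("code")',
             'print("same code")', 'print("code") # comment 1']
    file2 = ['# File 2 code', 'print("different code")',
             'print("same code")', 'print("code") # comment 2']
    # Comment-free versions, as produced by step 1.i.
    nocom1 = ['', 'print("code")', 'print("same code")', 'print("code")']
    nocom2 = ['', 'print("different code")', 'print("same code")', 'print("code")']
    print("\n".join(side_by_side(file1, file2, nocom1, nocom2)))
```

Because the matcher only ever sees the comment-free lines, a line whose comment changed is reported as equal, and only the `print("different code")` line gets the `|` marker.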

score 0 · Answer 2 · answered Apr 14 '21 at 04:21

You could preprocess them with sed. You could make a wrapper that does something like:

sed -e 's/#.*$//' file1.py > file1.stripped
sed -e 's/#.*$//' file2.py > file2.stripped
diff -y file1.stripped file2.stripped
rm file1.stripped file2.stripped

AgentSmith
