
I have a large codebase that was forked from the original project, and I'm trying to track down all the differences from the original. A lot of the file edits consist of commented-out debugging code and other miscellaneous comments. The GUI diff/merge tool Meld under Ubuntu can ignore comments, but only single-line comments.

Is there any other convenient way of finding only the non-comment diffs, either with a GUI tool or with Linux command-line tools? In case it makes a difference, the code is a mixture of PHP and JavaScript, so I'm primarily interested in ignoring //, /* */ and #.

kaya3
Matt V.

7 Answers


For a visual diff, you can try Meld or DiffMerge.

DiffMerge

Its rulesets and options let you customize how files are compared.

GNU diffutils

On the command line, you can use diff's --ignore-matching-lines=RE option (short form -I RE), for example:

diff -d -I '^#' -I '^ #' file1 file2

Please note that the regex has to match every changed line in the hunk (both the lines deleted from the first file and the lines inserted in the second) for the hunk to be ignored; otherwise diff still shows the whole hunk.

Use single quotes to protect the pattern from shell expansion, and escape regex-reserved characters (e.g. brackets) with a backslash.

The diffutils manual says:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.
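To see the hunk rule in action, here is a small sketch. It assumes GNU diffutils and a POSIX shell; the file names (comment1, mixed1, ...) are made up for the demo. It compares a comment-only change with a comment change that sits right next to a code change:

```shell
#!/bin/sh
# Demo: which hunks does GNU diff's -I option suppress?
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Case 1: the only change is a comment line, so the whole hunk is ignored.
printf 'a\n# old comment\nb\n' > "$tmp/comment1"
printf 'a\n# new comment\nb\n' > "$tmp/comment2"

# Case 2: a code line and a comment line change next to each other, so they
# land in the same hunk. "x = 1" does not match '^#', so the hunk is printed
# in full, comment lines included.
printf 'x = 1\n# old\n' > "$tmp/mixed1"
printf 'x = 2\n# new\n' > "$tmp/mixed2"

echo "comment-only hunk:"
diff -I '^#' "$tmp/comment1" "$tmp/comment2"
echo "mixed hunk:"
diff -I '^#' "$tmp/mixed1" "$tmp/mixed2"
```

The first diff prints nothing; the second prints the whole 1,2c1,2 hunk, including both comment lines.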

This behavior is also well explained by armel here.


See also:

Alternatively, check other diff apps, for example:

Community
kenorb
diff file1 file2 | grep -v '^[<>] #'

It's far from perfect, but it gives an idea of the differences: it only hides changed lines whose first character is #, and the hunk headers are still printed.
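A slightly broader version of the same filter can be sketched as follows. It assumes GNU grep and that the comments of interest start a line (possibly indented) with #, //, /*, * or */; noncomment_diff is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run a normal-format diff and drop changed lines ("< " / "> ")
# whose content starts, after optional whitespace, with #, //, /* or *
# (the last one also covers "*/" and block-comment continuation lines).
# Heuristic only: hunk headers such as "1,2c1,2" and "---" stay in place.
noncomment_diff() {
  diff "$1" "$2" | grep -v -E '^[<>][[:space:]]*(#|//|/\*|\*)'
}
```

For example, noncomment_diff old.php new.php (hypothetical file names). Like the one-liner, this is a heuristic: a code line that happens to begin with * is hidden as well.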

Vadym Tyemirov

See our Smart Differencer line of tools, which compare source files using the language structure rather than the layout as a guide. In particular, this means they ignore comments and whitespace when comparing code.

There is a SmartDifferencer for PHP.

BenMorel
Ira Baxter
  • PS: Matt, we're in Austin, too. – Ira Baxter Sep 22 '11 at 08:01
  • @TomasTintera: Hmm, we think SmartDiff is, well, pretty smart. Could you be clearer about which language's Smart Diff you tried (PHP? Java? ...), the circumstances in which it did not produce what you expected, and exactly what you expected? (You can send an example to "support@semanticdesigns.com" and we'll look at it). – Ira Baxter Jul 18 '17 at 15:37
  • @TomasTintera: I note the OP was looking for a tool that would ignore comment (changes). SmartDiff does what OP requested. – Ira Baxter Jul 18 '17 at 17:23
  • Sure. Thank you for your reminder. Deleted my comment as it belongs to another question and answer. – Tomas Tintera Jul 20 '17 at 16:45

You can first filter both files through stripcmt, which removes C and C++ comments. To remove # comments, use sed 's/#.*//' (note that this also removes everything after a # inside a string literal).

Of course you will lose some context when removing comments first, but on the other hand, differences in comments will not cause any problems. I would do it as follows (described for a single file; automate as required):

  1. If the latest version of the original code base is A and the latest of the copied code base is B, call the versions with comments removed A' and B' (e.g. save them to temporary files while processing).
  2. Find some common origin version and strip comments from that into O' (alternatively just re-use B' for this).
  3. Perform a 3-way merge of O', A' and B' and save to C'. KDiff3 is an excellent tool for this.
  4. Now you have the code changes you want merged, but C' has no comments. To get back to the "normal" code base, do a new 3-way merge with A' as the base and A and C' as the two sides. This picks up the changes between A' and C' (which are the code changes you want) into the normal, commented code base based on version A.

Drawing the version tree on paper before you start is highly recommended to get a clear picture of which versions you want to work on. But don't be limited by what the tree shows: you can merge any version in any direction if you just figure out which versions to use.

Community
hlovdal

GNU diff supports ignoring lines which match a regular expression:

diff --ignore-matching-lines='^#' file1 file2

and for folders:

diff -rq --ignore-matching-lines='^#' folder1/ folder2/

(optionally add -b to ignore changes in whitespace and -B to ignore blank lines)

This would ignore all lines which start with a # at the line beginning.

StackUnderflow
    `This would ignore all lines which start with a # at the line beginning`. That's not true. [--ignore-matching-lines](http://www.gnu.org/software/diffutils/manual/html_node/Specified-Lines.html#Specified-Lines) behaves differently. – Avio Jul 31 '13 at 09:09

I tried diff file1 file2 and diff -d -I '^#.*' file1 file2, and the result was the same in both cases: comments were included.

However, diff -u file1 file2 | grep -v '^ \|^.#\|^.$' gives what I need: real diffs only, no comments, no empty lines (\| is a GNU grep extension). ;)

Anand Vaidya

Try:

diff -I REGEXP -I REGEXP2 file1 file2

See: Regular expression at Wikipedia

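Below are examples of regular expressions that would cause diff to ignore preprocessor directives and other lines starting with #, // line comments, and the lines of a /* */ block comment. Each line is matched on its own, so a block comment is only ignored if every one of its lines starts with /*, * or */.

For example:

-I '^[[:space:]]*#'
-I '^[[:space:]]*//'
-I '^[[:space:]]*/\*'
-I '^[[:space:]]*\*'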
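As a sketch, a small wrapper can pass one -I option per comment style. It assumes GNU diff and PHP/JavaScript-style comments, with the patterns written as basic regular expressions; diff_ignore_comment_lines is a hypothetical name, and any extra arguments such as -r or -u are passed straight through to diff:

```shell
#!/bin/sh
# One -I per comment style; [[:space:]]* allows indented comments.
# A hunk is still printed in full if any changed line is not a comment.
diff_ignore_comment_lines() {
  diff \
    -I '^[[:space:]]*#' \
    -I '^[[:space:]]*//' \
    -I '^[[:space:]]*/\*' \
    -I '^[[:space:]]*\*' \
    "$@"
}
```

For directories, something like diff_ignore_comment_lines -r folder1/ folder2/ should work, since -r is simply forwarded to diff.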
kenorb
awiebe
    No, [--ignore-matching-lines](http://www.gnu.org/software/diffutils/manual/html_node/Specified-Lines.html#Specified-Lines) doesn't completely wipe out comments. – Avio Jul 31 '13 at 09:08