6

I'm trying to find the significant differences in C/C++ source code, meaning only the changes to actual code. I know you can use `git diff -G<regex>`, but it seems very limited in the kind of regexes it can run. For example, it doesn't seem to offer a way to ignore multiline comments in C/C++.

Is there any way in git, or preferably libgit2, to ignore comments (including multiline ones), whitespace, etc. before a diff is run? Or a way of determining whether a line from the diff output is a comment or not?

onetwopunch
  • Doubtful. If you were really persistent you could pre-process the two files and then diff the output. – Andrew C Feb 16 '15 at 05:33
  • What git diff command have you tried, with what `regex`? – jub0bs Feb 16 '15 at 08:16
  • @AndrewC That's what I was afraid of. Currently we are just feeding the output of `git cat-file` for each version through a tool, but since both files are huge, we're bumping into the top of the heap for large repos. I'm trying to find some way of using libgit2 within our tool to make this more memory and time efficient. – onetwopunch Feb 16 '15 at 17:02
  • It's possible to use gitattributes to change the diff program used for specific files (usually by file extension), but there seems to be nothing stopping you from defining an "external diff driver" for `*` files. – user3710044 Feb 22 '15 at 16:47
  • Maybe [TortoiseGit](https://code.google.com/p/tortoisegit/) can help you out with a graphical user interface for *diff*? – Martin Feb 22 '15 at 17:57
  • At least ignoring whitespace is easy: `git diff` has several `--ignore...` options to ignore whitespace in different contexts. – bjhend Feb 23 '15 at 20:04

2 Answers

2

Use `git diff -w` to ignore whitespace differences.

You cannot ignore multiline comments because git is a versioning tool, not a language-dependent interpreter. It doesn't know your code is C++. It does not parse files for semantics, so it cannot tell what is a comment and what isn't. In particular, it compares text files line by line, using its built-in diff (or a configured external diff tool).

I agree with @andrew-c that what you are really asking is to compare the two pieces of code without comments. More specifically, you want to compare the code with every multiline comment turned into empty lines. Keeping the blank lines preserves the line numbers, so the diff output still maps onto an unmodified copy.

So you could manually convert the two code states to blank out multiline comments, or you might look at building your own diff wrapper that does the stripping for you. But the latter is not likely to be worth the effort.
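For illustration, the blanking step could be sketched roughly as below. This is a minimal sketch, not a full C/C++ lexer: the function name `blank_comments` is made up for this example, and it assumes no raw string literals, trigraphs, or backslash-continued `//` comments. It walks the source once, drops comment text, and keeps every newline so line numbers still match the original file.

```python
def blank_comments(source: str) -> str:
    """Drop C/C++ comment text while keeping every newline in place.

    Hypothetical helper for illustration only; raw string literals and
    backslash-continued line comments are not handled.
    """
    out = []
    i, n = 0, len(source)
    state = "code"  # one of: code, line_comment, block_comment, string, char
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt == "/":
                state = "line_comment"
                i += 2
                continue
            if ch == "/" and nxt == "*":
                state = "block_comment"
                out.append(" ")  # a comment separates tokens like one space
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "line_comment":
            if ch == "\n":
                state = "code"
                out.append(ch)
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                i += 2
                continue
            if ch == "\n":
                out.append(ch)  # keep line numbers stable
        else:  # inside a string or character literal
            if ch == "\\" and nxt:
                out.append(ch + nxt)  # copy escaped character verbatim
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
            out.append(ch)
        i += 1
    return "".join(out)
```

Running both versions of a file through such a function and then diffing the results with `-w` would hide comment-only changes, while hunks still point at the right line numbers in the originals.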

Joe Atzberger
0

You can achieve this using git attributes and diff filters, as described in "Viewing git filters output when using meld as a diff tool", to call a sed script. However, such a script is pretty complex on its own if you want it to handle all cases, such as comment delimiters inside string literals.
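As a rough sketch of such a filter, here is a small Python script used in place of sed (a substitution for readability, not the approach from the linked question). It matches string and character literals before comments, so delimiters inside literals are left alone. The script name `strip_comments.py` and the driver name `nocomment` are hypothetical; the assumed wiring is a `.gitattributes` line such as `*.c diff=nocomment` plus `git config diff.nocomment.textconv "python3 strip_comments.py"`, since git calls a textconv command with the file path as its only argument and diffs what it prints. Raw string literals are not handled.

```python
import re
import sys

# Alternation order matters: literals come first, so comment delimiters
# inside them are consumed as part of the literal and never stripped.
_TOKEN = re.compile(
    r'''
      "(?:\\.|[^"\\])*"      # string literal
    | '(?:\\.|[^'\\])*'      # character literal
    | //[^\n]*               # line comment
    | /\*.*?\*/              # block comment
    ''',
    re.VERBOSE | re.DOTALL,
)


def _replace(match):
    text = match.group(0)
    if text.startswith("/*"):
        # Keep the newlines so line numbers in the diff stay meaningful.
        return " " + "\n" * text.count("\n")
    if text.startswith("//"):
        return ""
    return text  # string or character literal: leave untouched


def strip_comments(source):
    """Return source with C/C++ comments removed and line count preserved."""
    return _TOKEN.sub(_replace, source)


def main(path):
    # Undecodable bytes are replaced rather than aborting the diff.
    with open(path, encoding="utf-8", errors="replace") as handle:
        sys.stdout.write(strip_comments(handle.read()))


# Only act when exactly one path argument is given, as a textconv call would.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Note that textconv output only affects what `git diff` and `git log -p` display; it is not meant for generating patches to apply.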

Bernhard Stadler