
My goal is to get git diff to ignore C comments. I've been using a basic regex and redirecting the diff to a file (it doesn't print anything otherwise). I've also tried the reverse, getting the diff to show only the comments (I'll later see if I can reverse-engineer it). However, it doesn't behave as it's supposed to. Here are a few examples of what I've tried:

Trying to get the diff to show only the lines that begin with /*:

git diff -w -G'^/\*' master > text.diff

Getting the diff to show lines that start with either * or / or end with the same:

git diff -w -G'(^[/*])|([/*]$)' master > text.diff

Getting the diff to show only non-comment lines (see How to make 'git diff' ignore comments):

git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])' master > text.diff

I'm running it under WSL and using git version 2.17.1 for reference.

Veronika

1 Answer


My goal is to get git diff to ignore C comments ...

This is difficult, because:

  • Git doesn't understand C code;
  • parsing C comments requires lexical analysis across multiple lines;
  • git diff breaks up the input into lines too early.
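The line-splitting problem can be made concrete with a hypothetical helper, looks_like_comment, that judges one line at a time with an extended regex, which is all a -G pattern can do. The pattern is only an illustration in the spirit of the question's attempts, and it misfires in both directions:

```shell
#!/bin/sh
# Hypothetical helper: decide from ONE line, in isolation, whether it is a
# comment. A -G pattern is in the same position: it never sees the lines
# around the one it is matching.
looks_like_comment() {
    printf '%s\n' "$1" | grep -Eq '^[[:space:]]*(/\*|\*|//)'
}

# Second line of a /* ... */ block written without a leading "*":
# the helper says "not a comment".
looks_like_comment '    second line of a long comment' || echo 'missed a comment line'

# Ordinary code that starts with "*" (a pointer dereference):
# the helper says "comment".
looks_like_comment '    *p = 0;' && echo 'mistook code for a comment'
```

No single-line pattern can fix both cases, because the right answer depends on state carried over from earlier lines.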

Your best bet is therefore not to do this directly with git diff at all. Instead:

  1. Extract the file(s) to be compared from wherever they live (two commits, one commit and a regular file, one commit and an index copy of a file, etc.).
  2. Use a C-comment-stripper that does understand how to analyze C source and detect (and remove¹) the comments.
  3. Run the output of step 2 through some diff engine (regular diff, git diff, whatever you like).
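Steps 1 to 3 can be sketched in shell as follows. This is a minimal sketch under stated assumptions: a POSIX shell with awk, diff and git; the function names strip_c_comments and diff_without_comments are hypothetical; and the awk lexer is a simplification that handles /* */ and // comments, string and character literals, and a backslash-newline after //, but not trigraphs or a comment opener split across lines by backslash-newline.

```shell
#!/bin/sh
# Step 2: strip_c_comments [-k]
# Reads C source on stdin and writes it to stdout without comments.
# Default (ANSI C): each /* */ comment becomes one space.
# With -k (K&R style, see the footnote): comments vanish entirely.
# Newlines are always kept, so line numbers still match the original.
strip_c_comments() {
    knr=0
    if [ "${1-}" = "-k" ]; then knr=1; fi
    awk -v knr="$knr" '
    BEGIN { state = 0 }  # 0 code, 1 block comment, 2 line comment, 3 string, 4 char
    {
        out = ""; n = length($0); i = 1
        while (i <= n && state != 2) {
            c = substr($0, i, 1); d = substr($0, i + 1, 1)
            if (state == 1) {                       # inside /* ... */
                if (c == "*" && d == "/") {
                    state = 0; i += 2
                    if (knr != 1) out = out " "     # ANSI: comment -> one space
                } else {
                    i++
                }
            } else if (state >= 3) {                # inside a string or char literal
                out = out c
                if (c == "\\") { out = out d; i += 2; continue }
                if ((state == 3 && c == "\"") || (state == 4 && c == "\047")) state = 0
                i++
            } else if (c == "/" && d == "*") {
                state = 1; i += 2
            } else if (c == "/" && d == "/") {
                state = 2                           # drop the rest of the line
            } else {
                if (c == "\"") state = 3
                if (c == "\047") state = 4
                out = out c; i++
            }
        }
        # A trailing backslash splices the next line onto this one; otherwise
        # a // comment (or a malformed, unclosed literal) ends here.
        if (state >= 2 && substr($0, n, 1) != "\\") state = 0
        print out
    }'
}

# Steps 1 and 3: diff_without_comments REV1 REV2 PATH
# Extracts PATH (relative to the top of the repository, as git show
# REV:PATH expects) at two revisions, strips both, and diffs the results.
diff_without_comments() {
    old=$(mktemp) new=$(mktemp)
    git show "$1:$3" | strip_c_comments > "$old"
    git show "$2:$3" | strip_c_comments > "$new"
    diff -u "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    return $status
}

# Example (hypothetical path):
#   diff_without_comments master HEAD src/foo.c
```

The -k flag is one way to provide the K&R-style option mentioned in the footnote. Because newlines are kept, line numbers in the resulting diff still correspond to the real files; passing -w to diff would also hide the extra spaces left where comments used to be.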

If you wrap all of this up as a tool that git difftool can run, you'll get something serviceable and convenient. It will require generating lots of temporary files.
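For the git difftool route, one possible wrapper is a small script on your PATH that strips both sides and diffs the results. Both names in it, nocomment-diff and strip-c-comments, are placeholders: the latter stands for whatever C-aware stripper you use in step 2, assumed to read C on stdin and write it to stdout. git difftool sets $LOCAL and $REMOTE to temporary copies of the old and new versions of each changed file.

```shell
#!/bin/sh
# nocomment-diff OLD NEW  (hypothetical wrapper for git difftool)
# Assumes a stripper installed as "strip-c-comments" (placeholder name).
old=$(mktemp) || exit 2
new=$(mktemp) || exit 2
trap 'rm -f "$old" "$new"' EXIT   # clean up the temporary files
strip-c-comments < "$1" > "$old"
strip-c-comments < "$2" > "$new"
diff -u "$old" "$new"

# One-time setup, then run it much like git diff:
#   git config difftool.nocomments.cmd 'nocomment-diff "$LOCAL" "$REMOTE"'
#   git difftool -y -t nocomments master
```

Here -y skips the per-file prompt and -t picks the tool by the name registered under difftool.<tool>.cmd.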

(Note that your attempt to use -G here is ultimately doomed. The -G expression will look for a comment within the changed lines, rather than whether the changed line is or is not in the middle of a long comment. Languages that have only comment-to-end-of-line, such as sh/bash, are more tractable than C. Backslash-newline sequences will still foil things though. See also Erik Aronesty's answer to the linked question.)
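A minimal sketch of the -G point, assuming git and a POSIX shell; it builds a throwaway repository under mktemp -d, and the file name demo.c is made up. It also shows something that probably explains what the question observed: -G does not filter lines out of a patch. As far as I know it selects which files to show, and a selected file's patch is printed in full, including changed lines that do not match.

```shell
#!/bin/sh
# Throwaway repository with one C file whose comment and code both change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '%s\n' '/*' '   old wording' '*/' 'int x = 1;' > demo.c
git add demo.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Change one line inside the comment and one line of code.
printf '%s\n' '/*' '   new wording' '*/' 'int x = 2;' > demo.c

# No added/removed line starts with "/*", so -G selects nothing, even
# though one of the changed lines sits inside a comment.
git diff -G'^/\*'

# "wording" matches a changed line, so the file is selected and its whole
# patch is printed, including the unrelated change to "int x".
git diff -G'wording'
```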


¹ Remember that in ANSI C, comments always separate tokens, so for ANSI C, replace comments with white-space, but in many traditional K&R compilers, comments simply vanish. Relying on that vanishing (writing a/**/b to get the single token ab) is used in place of the new-in-1989 token-pasting operator (##) in some very old C code. You might want to support this mode by making step 2 have an option to leave out the white-space.
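To see why that option matters, here is a toy illustration of the two policies applied to the old token-pasting idiom. The function names are hypothetical and the sed patterns are deliberately simplistic (a one-line comment with no * inside), so this is not a comment stripper, only a picture of what each policy produces.

```shell
#!/bin/sh
# Toy only: handles a single-line /* */ comment containing no "*".
ansi_style() { sed 's|/\*[^*]*\*/| |g'; }   # comment becomes one space
knr_style()  { sed 's|/\*[^*]*\*/||g'; }    # comment disappears

# The comment glues foo_ and bar into one token only if it vanishes.
printf 'foo_/**/bar\n' | ansi_style   # prints: foo_ bar  (two tokens)
printf 'foo_/**/bar\n' | knr_style    # prints: foo_bar   (one token)
```

A stripper meant for old code that uses this trick should therefore offer the second behavior, while for ANSI code the first is the faithful one.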

torek
  • You can set up a textconv to automatically run the filter on the text before diff (or, notably, blame) looks at it, no need to wrangle the filtering manually. – jthill Jul 07 '21 at 14:29
  • @jthill: true—but that wrecks the diffs where you *do* want to see the comment differences. I wouldn't want that myself. Making it a diff tool seems wiser. I can put that note in the answer itself if you like, though. – torek Jul 07 '21 at 14:56
  • mmm, yes, you could use a scratch no-checkout clone to set up a local .git/config and .git/info/attributes, `git clone -ns . ../scratch; git ls-files -oc | cpio -pdl ../scratch; cp .git/index ../scratch/.git/index; cd ../scratch`, that'd run real quick and you could use templates for the clone with the various differs preconfigured. kludgy but fast, I'm probably too fond of solutions like that. – jthill Jul 07 '21 at 15:36
  • Clever indeed. I'm not sure whether I'm impressed or appalled. :-) Seriously, sometimes having the implementation-guts exposed is a good thing, but sometimes it's not, and things like this are good inputs for these arguments. – torek Jul 07 '21 at 15:47
  • Wait, am I in "forms some sort of argument in that debate" territory? I am soooo proud :-) – jthill Jul 07 '21 at 15:49
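A sketch of the textconv variant from the comments, run inside the repository. It assumes a hypothetical script strip-c-comments-file that takes a file name and prints the stripped source; textconv passes the file name as the command's argument and uses its stdout. Putting the attribute in .git/info/attributes keeps the setting local to one clone, and git diff --no-textconv still gives the raw diff when you do want to see comment changes.

```shell
#!/bin/sh
# Bind *.c files to a diff driver named "nocomments" (local, not committed).
echo '*.c diff=nocomments' >> .git/info/attributes
# The driver's textconv command: receives a file name, writes text to stdout.
git config diff.nocomments.textconv strip-c-comments-file

git diff master                 # compares the stripped text
git diff --no-textconv master   # raw text, comments included
```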