0

I'm working on a git diff parser. The main task is to find the signatures of all changed functions. Sometimes the chunk line with @@@ .... @@@ contains this information, but sometimes not. Most recently I changed the cout message in greet(). The first image shows that line as changed, which is correct, but the @@@... line above it shows "void functOne() {", which was not changed. The second image shows the dummy C++ source code I use to test git diff.

The main questions are:
How can I list the signatures of all changed functions?
Why does an unchanged function name sometimes appear?
Why does the line with @@@.... sometimes contain no function name/signature at all?

[Image 1: git diff output for the change in greet()]

[Image 2: dummy C++ source code used to test git diff]

Cof
  • 2
    You do notice that you cannot tell with 100% certainty without parsing the whole file? The "function" may just be something that's commented out in a multi-line comment or a string literal spanning multiple lines... That is not to mention the possibility of the function signature being produced by a preprocessor macro (which may not even be defined in the file you're analyzing)... – fabian Feb 13 '22 at 12:23
  • Also note that the code shown appears to have semicolons missing. The parsing process certainly isn't going to be any easier if the code being parsed is syntactically invalid. To do this properly you probably need a tool based on something like clang's libTooling that can parse the versions of the file being compared and generate output in a suitable format. – G.M. Feb 13 '22 at 12:29
  • 2
    Don't post images of text, least of all code! Copy-paste text *as text* into your questions. – Some programmer dude Feb 13 '22 at 14:42
  • I considered some parsing scenarios, but the main questions are those three, and they are all about git diff, not about parsing. – Cof Feb 13 '22 at 16:49

2 Answers

1

The git diff command doesn't care about any functions. git repositories can contain any kind of text files (binary files too, but that's immaterial here), not just C++ source.

The diff command doesn't attempt to interpret the file in any way. Only a C++ compiler can fully understand a C++ file and process all function declarations.

The diff command only looks for discrete lines of text that changed and shows them together with a few unchanged lines that precede and follow them.

If the changed lines happen to be at the beginning of a function declaration, then this would include the function declaration. If they are in the middle of a long function, you only see the few preceding lines, that's it.

There are git diff options that control how many unchanged lines are shown (-U<n> or --unified=<n>; check git's documentation). Specifying a million lines, for example, results in the entire file getting shown, with all the changed lines marked up.
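For a parser that drives git itself, the context width can be passed explicitly. The following is only a sketch: it assumes git is on the PATH and the script runs inside a work tree, and the helper names diff_command and run_diff are made up for illustration. --unified=<n> is the long form of -U<n>.

```python
import subprocess


def diff_command(context_lines, *paths):
    """Build the argument list for `git diff` with a given context width."""
    command = ["git", "diff", f"--unified={context_lines}"]
    if paths:
        # "--" separates options from path arguments.
        command.append("--")
        command.extend(paths)
    return command


def run_diff(context_lines, *paths):
    """Run git diff and return its output as text.

    Assumes git is installed and the current directory is inside a
    repository; raises CalledProcessError if git reports an error.
    """
    result = subprocess.run(
        diff_command(context_lines, *paths),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_diff(0) gives hunks with no context lines at all, while a very large value such as run_diff(1000000, "main.cpp") shows the whole file (the file name here is just an example).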

You can do that if you wish, then try to figure out the names of all the changed functions yourself, but until you write a complete C++ compiler yourself, your heuristic parsing attempts won't be 100% correct. You might've noticed, tucked away in git diff's output, an indication of which function git guessed was changed. But since git is also not a C++ compiler, that guess is also wrong, occasionally.

Sam Varshavchik
  • Thanks for your answer, but you are wrong: when I deleted a line in the middle of a longer (60-line) function, the line with @@@... contained the signature. If I'm right, the default context in git diff output is +-3 lines. The originally posted code shows the same thing, because the line "void functionOne() {" is further than three lines away from the changed code. And if I use the -U0 switch, the context is 0 lines, but the line with @@@.... still contains the signature. – Cof Feb 13 '22 at 13:11
  • Where exactly did I claim that git always gets it wrong? Which part of your observation contradicts the statement "that's also wrong, occasionally"? – Sam Varshavchik Feb 13 '22 at 18:54
  • Sorry, but I don't quite understand your last comment. – Cof Feb 13 '22 at 19:12
  • 1
    My answer said exactly one thing: git is not a C++ compiler, and only a C++ compiler can fully understand a C++ program, and git only compares individual lines of text, and it only makes a best guess at the name of the function with the changes, which sometimes gets it wrong. That, pretty much, was the only thing my answer said. Therefore, "you are wrong" could only possibly mean that just because after tweaking a few things git showed the right function name, I must've been wrong when I wrote "also wrong, occasionally". Which, of course, isn't true. One does not preclude the other. – Sam Varshavchik Feb 13 '22 at 19:29
1

Sometimes in the chunk line with @@@ .... @@@

Git calls this a hunk header (after other diff software that also calls it that).

... contains [the function name] but sometimes not.

What Git puts in the function section of a diff hunk header is produced by matching earlier lines against a particular regular expression, as described in the gitattributes documentation under xfuncname (search for that string). But note that this is a regular expression, and regular expressions are inherently less capable than parsers; whatever regular expression you write, there will always exist valid C++ constructs that a parser can handle but the expression does not recognize.
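For the parser side, here is a minimal sketch of pulling that function section out of a hunk header. It assumes ordinary two-way diff output, where the header looks like "@@ -a,b +c,d @@ text" (the triple @@@ form belongs to combined diffs of merges and is not handled). The names HunkHeader and parse_hunk_header are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# "@@ -old_start[,old_count] +new_start[,new_count] @@[ section text]"
HUNK_HEADER = re.compile(
    r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(?: (.*))?$"
)


class HunkHeader(NamedTuple):
    old_start: int
    old_count: int
    new_start: int
    new_count: int
    section: Optional[str]  # the function-like text, or None if absent


def parse_hunk_header(line):
    """Parse one two-way hunk header line; return None for other lines."""
    match = HUNK_HEADER.match(line.rstrip("\n"))
    if match is None:
        return None
    old_start, old_count, new_start, new_count, section = match.groups()
    return HunkHeader(
        int(old_start),
        # An omitted count means the range covers exactly one line.
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
        section or None,
    )
```

The section field is only whatever text Git chose to print, so it should be treated as a hint about the enclosing function, not as proof that this function's signature changed.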

If Git's built-in C++ xfuncname pattern is not adequate for your use, you can write your own pattern. But it's always going to be limited because regular expressions can only recognize regular languages (these are CS-theoretical or informatics terms, not to be interpreted as ordinary English language; for more, see, e.g., Regular vs Context Free Grammars).
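To see why an unchanged function can show up in the header, or why the header can be empty, it helps to model the lookup as a backward scan. This is a simplified sketch, not Git's actual code. It assumes the scan starts at the line just before the first line shown in the hunk (context lines included), and it uses a stand-in for the rule Git applies when no diff driver is configured: a line that begins with a letter, an underscore, or a dollar sign. The function hunk_header_text and the sample source are hypothetical.

```python
import re

# Stand-in for the default rule: line starts with a letter, "_" or "$".
DEFAULT_FUNCNAME = re.compile(r"^[A-Za-z_$]")


def hunk_header_text(lines, hunk_start, pattern=DEFAULT_FUNCNAME):
    """Return the line a hunk header would show, or None.

    lines      -- old file content, one string per line, no newlines
    hunk_start -- 1-based number of the first line shown in the hunk,
                  counting leading context lines
    """
    # Scan backwards, beginning with the line just before the hunk.
    for index in range(hunk_start - 2, -1, -1):
        if pattern.search(lines[index]):
            return lines[index].rstrip()
    return None


source = [
    "#include <iostream>",          # line 1
    "",                             # line 2
    "void functOne() {",            # line 3
    "    int x = 1;",               # line 4
    "}",                            # line 5
    "",                             # line 6
    "void greet() {",               # line 7
    '    std::cout << "hi";',       # line 8
    "}",                            # line 9
]

# Change on line 8 with 3 context lines: the hunk starts at line 5.
print(hunk_header_text(source, 5))  # void functOne() {
# Same change with -U0: the hunk starts at line 8 itself.
print(hunk_header_text(source, 8))  # void greet() {
# Hunk starting at line 2: nothing matching precedes it.
print(hunk_header_text(source, 2))  # None
```

Under this model, with three lines of context the signature of greet() is itself part of the hunk, so the scan skips past it and lands on the previous function. With no matching line before the hunk, the header carries no text at all. A custom xfuncname pattern changes which lines match, not where the scan starts.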

torek