Is there a way to print step by step, what the C preprocessor is doing as it expands a macro?

For example, I would give it some C language text (ex: .h file(s)) to preprocess. For the sake of demonstration, here's a simple example:

// somefile.h
#define q r
#define bar(x,z) x ## z
#define baz(y) qux ## y
#define foo(x,y) bar(x, baz(y))

So far, that's just to build a table of definitions.

Next comes the text to expand in detail. For this demonstration, I'm expecting the workflow/process/output to be something like this:

$ magical_cpp_revealer  somefile.h

Please enter some preprocessor text to analyse:
> foo(baz(p),q)

Here are the resulting preprocessor calculations:
,----.----.---------------------------.-----------------------------------------
|Step|Exp#|  Expression               |  Reason
|====|====|===========================|=========================================
| 00 | 00 |  foo(baz(p),q)            |  Original tokens.
| 01 |    |                           |  Definition found for 'foo': `foo(x,y)` = "bar(x, baz(y))"
| 02 | 01 |  bar(x, baz(y))           |  'foo' begins expansion. Original tokens shown.
| 03 |    |                           |  'foo' Stage 1: Raw parameter replacements elided: no # or ## operators present.
| 04 |    |                           |  'foo' Stage 2: Stringification elided: no # operators present.
| 05 |    |                           |  'foo' Stage 3: Concatenation elided: no ## operators present.
| 06 |    |                           |  'foo' Stage 4: Argument scan begins.
| 07 |    |                           |    Argument for parameter 'x' is "baz(p)"
| 08 | 02 |    baz(p)                 |    Scanning "baz(p)" for macros to expand.
| 09 |    |                           |    Definition found for 'baz': `baz(y)` = "qux ## y"
| 10 | 03 |    qux ## y               |    'baz' begins expansion. Original tokens shown.
| 11 | 04 |    qux ## p               |      'foo->baz' Stage 1: Raw parameter replacements performed
| 12 |    |                           |         using 'y' = "p".
| 13 |    |                           |      'foo->baz' Stage 2: Stringification elided: no # operators present.
| 14 | 05 |    quxp                   |      'foo->baz' Stage 3: Concatenation performed.
| 15 |    |                           |      'foo->baz' Stage 4: Argument scan elided: no parameters present.
| 16 |    |                           |      'foo->baz' Stage 5: Expansive parameter replacements elided: no parameters present.
| 17 |    |                           |      'foo->baz' Stage 6: Rescan begins
| 18 |    |                           |        No definition for 'quxp'
| 19 |    |                           |      'foo->baz' Stage 6: Rescan concludes.
| 20 | 06 |    quxp                   |    'baz' concludes expansion. Final result shown.
| 21 |    |                           |  'foo' Stage 4: Argument scan continues.
| 22 |    |                           |    Currently:
| 23 |    |                           |      'x' = "quxp"
| 24 |    |                           |      'y' = To Be Determined
| 25 |    |                           |    Argument for parameter 'y' is "q"
| 26 | 07 |    q                      |    Scanning "q" for macros to expand.
| 27 |    |                           |    Definition found for 'q': `q` = "r"
| 28 | 08 |    r                      |    'q' begins expansion. Original tokens shown.
| 29 |    |                           |      'foo->q': Stage 1: Concatenation elided: no ## operators present.
| 30 |    |                           |      'foo->q': Stage 2: Scan begins.
| 31 |    |                           |        No definition for 'r'
| 32 |    |                           |      'foo->q': Stage 2: Scan concludes.
| 33 | 09 |    r                      |    'q' concludes expansion. Final result shown.
| 34 |    |                           |  'foo' Stage 4: Argument scan concludes.
| 35 | 10 |  bar(x, baz(y))           |  'foo': Reminder of current token sequence.
| 36 | 11 |  bar(quxp, baz(r))        |  'foo' Stage 5: Expansive parameter replacements performed
| 37 |    |                           |     using 'x' = "quxp",
| 38 |    |                           |       and 'y' = "r".
| 39 |    |                           |  'foo' Stage 6: Rescan begins
| 40 |    |                           |    Definition found for 'bar': `bar(x,z)` = "x ## z"
| 41 | 12 |    x ## z                 |    'bar' begins expansion. Original tokens shown.
| 42 | 13 |    quxp ## baz(r)         |      'foo->bar' Stage 1: Raw parameter replacements performed
| 43 |    |                           |         using 'x' = "quxp",
| 44 |    |                           |           and 'z' = "baz(r)".
| 45 |    |                           |      'foo->bar' Stage 2: Stringification elided: no # operators present.
| 46 | 14 |    quxpbaz(r)             |      'foo->bar' Stage 3: Concatenation performed.
| 47 |    |                           |      'foo->bar' Stage 4: Argument scan elided: no parameters present.
| 48 |    |                           |      'foo->bar' Stage 5: Expansive parameter replacements elided: no parameters present.
| 49 |    |                           |      'foo->bar' Stage 6: Rescan begins
| 50 |    |                           |        No definition for 'quxpbaz'
| 51 |    |                           |        No definition for '('
| 52 |    |                           |        No definition for 'r'
| 53 |    |                           |        No definition for ')'
| 54 |    |                           |      'foo->bar' Stage 6: Rescan concludes.
| 55 | 15 |    quxpbaz(r)             |    'bar' concludes expansion. Final result shown.
| 56 |    |                           |  'foo' Stage 6: Rescan concludes
| 57 | 16 |  quxpbaz(r)               |  'foo' concludes expansion. Final result shown.
'----'----'---------------------------'-----------------------------------------

(Side note and caveat for future readers: I wrote the above trace by hand and it might not be 100% correct, at least in terms of representing how the preprocessor works.)

Note that I tried to illustrate not only the preprocessor's positive decisions about what to do (ex: when it has found a definition and starts expanding), but also its negative decisions about what not to do (ex: when a token has no definition or when the # and ## operators are not present). That might sound kinda specific, but it's important for understanding why the preprocessor didn't do something that I expected it to do, often with a mundane conclusion along the lines of "I misspelled the definition or the token" or "I forgot to #include that one file".

I'll be even more relieved if there's a way to reveal what MSVC's CL.EXE is thinking when it uses "traditional preprocessor" logic to expand my macros.

Here's an example of what does not answer the question:

$ gcc -E somefile.h
...
quxpbaz(r)

Such is what I find in the answers to questions like "Any utility to test expand C/C++ #define macros?".

When someone asks to see the "expansion" of a macro, gcc -E seems like a valid answer. I'm looking for something with higher fidelity, and I already know about gcc -E.

I'm writing ISO C11 code, but am including the C++ tag in case there is a tool or technique in that ecosystem with relevance to this.

I'm hoping someone out there reading this is maybe a compiler writer that has done or seen similar work (compiler tracing options?), or has authored a tool like this, or is just far luckier with their search results than I have been. Or if you keep tabs on all of the C-language offerings out there and are relatively certain this doesn't exist, then I'd find a negative answer to be helpful too, though I'd be curious as to why the C preprocessor would have been around for decades, obtained infamy for its "pitfalls", and yet still never seen a tool (or process) for pulling back the curtain on the preprocessor. (I hope this actually exists. fingers crossed)

cigien
chadjoan
  • Since questions asking about recommending a tool are off-topic I changed the first paragraph a bit :D – Antti Haapala -- Слава Україні Oct 28 '20 at 09:24
  • I suppose you could download the source code of your favorite compiler and add print statements at various points in the preprocessor logic. – Daniel McLaury Oct 28 '20 at 09:57
  • Thanks Antti! That was a considerate way to do it. I didn't realize that rule, so I also edited it a bit to relax that constraint a bit. Obscure techniques, compiler features, and other things would work just fine as well, so long as it provides a trace and isn't more tedious than just working these things out by hand. – chadjoan Oct 28 '20 at 10:03
  • I really get a "Dear Santa ..." impression here. But it is clear, researched and I so much want it myself. So have an upvote instead of a close-vote. ;-) – Yunnosch Oct 28 '20 at 10:07
  • Hi Daniel; While you're right, I'm envisioning an arduous process. For example: trying to grok GCC's codebase, find the right spots, wait 30+ minutes for it to compile, learn that I Did It Wrong, more compiling (maybe not 30+ minutes again, but I'm not exactly trusting of build systems), then whoops that struct doesn't have the such-and-such, more work, more compiling, and so on. I feel like I'd end up implementing a new compiler feature that might as well be released for all, but then upstream might reject it. Maybe that's not the end of the world, but I hope there's a better way. – chadjoan Oct 28 '20 at 10:11
  • Thanks Yunnosch. I hope I was able to edit it well enough to keep ordinary "just-do-it-this-way" possibilities open. I feel like this one's going to be really difficult without introspective help from something that has, or is, a C preprocessor. And there's an unfortunate possibility that the existing ones might be incapable of this task. But I totally welcome any clever or knowledgeable individual to defy those expectations. – chadjoan Oct 28 '20 at 10:29
  • While such a feature would certainly be useful for the authors of a preprocessor, I don't feel it is a good thing for nearly all users. If your preprocessor magic is so complicated that you need to see this analysis, you should consider rethinking your design. – the busybee Oct 28 '20 at 11:18
  • Short answer: Doesn't exist. Long answer: Why not build one? – tadman Oct 28 '20 at 20:30
  • You don't need to disassemble gcc to find a preprocessor. These ones might be easier to extend to do what you want: https://github.com/boostorg/wave https://github.com/lpsantil/ucpp https://github.com/facebookresearch/CParser – Jerry Jeremiah Dec 07 '20 at 21:00
  • Not sure if this is what you are looking for, but IDEs such as Eclipse can do a step-by-step expansion. Surely not as verbose as you describe, but yet handy enough. https://stackoverflow.com/questions/35472290/how-to-see-macro-expansions-step-by-step – Eugene Sh. Feb 19 '21 at 20:53

1 Answer

I would suggest finding a good-quality compiler/preprocessor and editing the preprocessor.

I would avoid GCC and clang, as they are too heavyweight IMO. I would have a look at cparser from libfirm and this file in particular: https://github.com/libfirm/cparser/blob/master/src/parser/preprocessor.c

Code from libfirm is super easy to read and edit, and it takes almost no time to build the project, in stark contrast to LLVM/clang or GCC.

It has eaten all C99 code I've thrown at it so far.

By the way I am not affiliated, I just think it rocks! I have just used the code with great results and received fantastic support, help and guidance on the IRC channel #firm @ freenode.

EDIT:

Sparse, as used by the kernel janitors team in Linux, is also easily hackable for such purposes. It includes a C preprocessor as well: https://github.com/chrisforbes/sparse

https://www.kernel.org/doc/html/v4.12/dev-tools/sparse.html

Morten Jensen