GCC preprocessor PLUS static analysis?

Question

I am aware of how to run the gcc preprocessor. Gcc evidently performs a static analysis/optimization of the code, because if you e.g. add two constants, or shift a constant, the resulting assembly code is the same whether you keep "a = constant << 3" or if you perform that operation manually and have "a = shifted_constant" in the code.

Given this slice from a SHA256 code,

  A=0x6a09e667;
  B=0xbb67ae85;
  C=0x3c6ef372;
  D=0xa54ff53a;
  E=0x510e527f;
  F=0x9b05688c;
  G=0x1f83d9ab;
  H=0x5be0cd19;

  P(A, B, C, D, E, F, G, H, W[ 0], 0x428a2f98);

after running gcc -E I get

 { tmp = ((((E) >> (6)) | ((E) << (32-(6)))) ^ (((E) >> (11))
       | ((E) << (32-(11)))) ^ (((E) >> (25)) | ((E) << (32-(25)))))
       + (G ^ (E & (F ^ G))) + 0x428a2f98 + W[ 0]; D += (tmp + H);
   H  += tmp + ((((A) >> (2)) | ((A) << (32-(2)))) ^ (((A) >> (13))
       | ((A) << (32-(13)))) ^ (((A) >> (22)) | ((A) << (32-(22)))))
       + ((A & B) | (C & (A | B))); };

For the P-line, which is a nested macro of macros.

That's nice, but I would like to have a more digested result, where constants have been replaced, constant arithmetics have been performed etc. So if I got instead of the above something like this:

{ tmp = hex_constant1 + W[0]; D += (tmp + 0x5be0cd19); H += tmp + hex_constant2; };

It would be much more to my liking. I hadn't the patience to manually compute the expressions, but basically they should fold to two hex constants.

Is there any tool / command line option to perform this?

Are you sure you aren't mixing the preprocessor and compiler? The preprocessor only replaces your `defines` (and `includes` and other macros) it does not perform any kind of optimizations. That is the compiler's job. — UnholySheep, Nov 16 '16 at 14:18

score 4 · Answer 1 · answered Nov 16 '16 at 14:22

GCC do not optimize preprocessed C code. After preprocessing, this code translates to high-level IR (GIMPLE), where most optimizations are applied, before lowering to low-level IR like RTL.

What do you want is optimized GIMPLE dump to look up these to constants. You shall use optimized tree dump like this:

gcc -O2 -fdump-tree-optimized test.c -S

Then look for file like test.c.211t.optimized (number 211 may differ and depends on gcc version). Gimple code is very similar to primitive C, so you will have no problems to read your constants from it.

Ira Baxter · Accepted Answer · 2016-11-16T16:38:48.260

Most compilers run the preprocessor to get a text string for the program source, and then parse that to compiler internal data structures. Once the program is parsed, you cannot see what the compiler does to the code (well, maybe the compiler has some kind of debugging dump, but you cannot count on that).

To do what you want, you need to do partial evaluation of the code at the location of the expanded macro. No standard compiler will help you do this.

You need a program transformation system (PTS), which is tool that parses source code, applies code transformations, and then will regenerate the modified source text. A good PTS will let you write sets of source-to-source transformation rules to be applied, each rule in the form of:

 when you see *this*, then replace it by *that* if *condition* is true

To do simple constant folding, you need rules like:

  rule  simplify_times_zero(e: power): product -> product
       " \e * 0 " ->  " \e ";

which handles the special (algebraic) case of multiplying something times zero. You obviously need a bunch of these rules, more below.

Your PTS must also be able to read the specific dialect of C or C++ you are using, and be able to resolve any identifier to its definition (otherwise the cannot know that E is constant where it used).

The only PTS I know of that can parse many C or C++ dialects, and carry out macro expansions, is our DMS Software Reengineering Toolkit. It accepts rules using the syntax of the above example.

The additional rules you need are rules that actually do arithmetic:

rule fold_addition(c1: INT32CONSTANT, c2: INT32CONSTANT): product -> product =
     " \c1 + \c2 " ->  int32_multiply(c1,c2);

where int32_multiply is a function that does the math. You need one of these for each operator and operand type, that honor the rules of the C and C++ language.

You also need rules that do substitution of known values:

rule substitute_defined_constant(i: IDENTIFIER): primitive -> primitive
  " \i " ->  macro_definition_value(i) if  is_defined_constant(i);

where is_defined_constant looks up and identifier in the symbol table built by DMS's front end for macro definitions, and checks that i is macro of the form of "define i " at the point where i is referenced. By writing this as a conditional, the macro_definition_value function isn't called unless the macro definition value actually exists. The raw symbol table support is provided by DMS's C and C++ front ends.

With this set of rules, and a rule strategy that applies these rules to the macro expansion point of interest, DMS should be able to "fold" the equation into the form you expressed. Now, all the transforms are done on DMS's internal representations of the program, but DMS can prettyprint the result so you could actually see this.

As a general capability in a development UI, this might be pretty useful. To actually set all this up and make it work in practice is probably a few days; DMS is a complex system, and C and C++ don't help.

If you are only going to do this once or twice, you are likely better off just biting the bullet and doing it by hand (or as another answer suggests, use a debugging to inspect the output of the compiler if that works out). If it is a daily task, the PTS solution is likely a real time (and accuracy) saver.

GCC preprocessor PLUS static analysis?

2 Answers2