
I have a C header file containing various declarations of functions, enums, structs, etc., and I want to extract all declared function names into a Boost.Preprocessor data structure for iteration, using only the C preprocessor.

Every function declaration has two fixed, distinct macros around the return type, something like:

```c
// my_header.h
FOO int * BAR f(long, double);
FOO void BAR g();
```

My goal is to somehow transform it into one of the Boost.Preprocessor data types, such as a tuple `(f, g)` or a sequence `(f)(g)`. I believe it is possible by cleverly defining `FOO` and `BAR`, but I have not succeeded after experimenting with Boost.Preprocessor and P99.

I believe this task can only be done with the preprocessor because:

  1. As a hard requirement, I need to stringify the function names as string literals later when iterating over the list, so runtime string manipulation and existing C++ static reflection frameworks relying on template magic are out, AFAIK.
  2. While it can be done with the help of other tools, they are either fragile (awk or grep as ad-hoc parsers) or overly complex for the task (an LLVM/GCC plugin for something robust). I also want to avoid external dependencies beyond those strictly necessary, i.e., a conforming C compiler.
Asked by SuibianP (edited by sehe)
  • That's not really something the standard C or C++ preprocessor can do. The symbols `f` and `g` are not preprocessor macros or directives. It seems that you want to get a list of function names, but why? What is the underlying problem you need to solve? Why do you think the preprocessor is the correct tool to work with non-preprocessor symbols? – Some programmer dude Jul 18 '22 at 10:13
  • @Someprogrammerdude I am under the illusion that this can be done with the preprocessor because the declaration syntax somewhat resembles boost preprocessor data syntax, and because various magics such as deferred expansion evaluation and recursive inclusion are proven viable by boost.preprocessor. I am writing a layer of wrappers above a third-party library, and I think this is a good way to prevent the layer from going out of sync with its corresponding library. Of course, it is possible with other tools, but IMO not as convenient as CPP for the reasons detailed in point #2 above. – SuibianP Jul 18 '22 at 10:30
  • I admire your confidence that it can be done using only the C preprocessor. I look forward to seeing your solution. In the mean time, I agree with @Someprogrammerdude and others — I don't think it can be done with just the C preprocessor, even with the two markers. – Jonathan Leffler Jul 18 '22 at 14:46
  • I'm also curious —  how would your header mark up a function such as: `void (*signal(int sig, void (*func)(int)))(int);` (which is the declaration for the Standard C [`signal()`](http://port70.net/~nsz/c/c11/n1570.html#7.14.1) function). Granted, given `typedef void (*SignalHandler)(int);`, that can be written `SignalHandler signal(int sig, SignalHandler func);` which could be marked up as `FOO SignalHandler BAR signal(int sig, SignalHandler func);`, but there are lots of complications in the C function declaration syntax. – Jonathan Leffler Jul 18 '22 at 14:48
  • With a function declaration like `FOO void BAR g();`, the preprocessor can expand `FOO` and `BAR` (if they are preprocessor macros), but what occurs on the rest of the line isn't really relevant to the preprocessor. Perhaps use a macro to control function declarations, like `#define FUNCTION(r, f, a) r f a`, used like `FUNCTION(void, g, ())` or similar (the extra parentheses around the parameter list are needed to handle comma-separated lists, like arguments, properly). – Some programmer dude Jul 18 '22 at 16:37
  • @JonathanLeffler That's an interesting question. For now, the header does not contain any complex type declaration like function pointers, but I guess the vendor will most probably take the `typedef` way as you mentioned if the need occurs. – SuibianP Jul 19 '22 at 09:13
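
The macro suggested in the comment above can be sketched as follows, assuming the declarations could be written through such a macro in the first place. `FUNCTION` is taken from the comment; `FUNCTION_NAME` and `f_name` are hypothetical helpers added only to show that, once the name is its own macro argument, it can be stringified.

```c
#include <assert.h>
#include <string.h>

/* FUNCTION as suggested in the comment: return type, name and parameter
   list are separate macro arguments. The parameter list is passed with
   its own parentheses so that its commas do not split the arguments. */
#define FUNCTION(r, f, a) r f a

/* Hypothetical companion macro taking the same arguments; it keeps only
   the name and turns it into a string literal. */
#define FUNCTION_NAME(r, f, a) #f

FUNCTION(int *, f, (long, double)); /* expands to: int * f (long, double); */
FUNCTION(void, g, (void));          /* expands to: void g (void); */

/* Definitions matching the generated declarations. */
int *f(long x, double y) { (void)x; (void)y; return NULL; }
void g(void) {}

/* Hypothetical helper: the name of f, recovered from the macro arguments. */
const char *f_name(void) { return FUNCTION_NAME(int *, f, (long, double)); }
```

The catch is that every call site repeats the arguments, and it only helps if you control how the declarations are written.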

2 Answers


I don't think this is going to work, due to limitations on where parentheses and commas need to occur.

What you can do, though, is the opposite. You could make a Boost.PP sequence that contains the signatures in some structured form and use it to generate the declarations as you showed them. In the end, you have the representation you want as well as the compiler's view of the declarations.
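
A minimal sketch of that inverted approach, with one deliberate substitution: it uses a plain C X-macro list instead of a Boost.PP sequence, so it needs no dependencies. `MY_FUNCTIONS`, `DECLARE_FUNCTION`, `NAME_STRING`, `FUNCTION_COUNT` and `is_declared_function` are all hypothetical names. The single list is expanded once to produce the declarations the compiler sees and once to produce the stringified names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single source of truth: one X(return_type, name, params)
   entry per wrapped function. The parameter list stays parenthesized so
   its commas do not split the macro arguments. */
#define MY_FUNCTIONS(X)             \
    X(int *, f, (long a, double b)) \
    X(void,  g, (void))

/* Expansion 1: the declarations, e.g. "int * f (long a, double b);". */
#define DECLARE_FUNCTION(ret, name, params) ret name params;
MY_FUNCTIONS(DECLARE_FUNCTION)
#undef DECLARE_FUNCTION

/* Expansion 2: the names as string literals, e.g. { "f", "g", }. */
#define NAME_STRING(ret, name, params) #name,
static const char *const function_names[] = { MY_FUNCTIONS(NAME_STRING) };
#undef NAME_STRING

#define FUNCTION_COUNT (sizeof function_names / sizeof function_names[0])

/* Returns nonzero if `name` is one of the listed functions. */
int is_declared_function(const char *name) {
    for (size_t i = 0; i < FUNCTION_COUNT; ++i)
        if (strcmp(function_names[i], name) == 0)
            return 1;
    return 0;
}
```

With Boost.PP, the same list could be a sequence of tuples iterated with `BOOST_PP_SEQ_FOR_EACH`; the X-macro form is used here only to keep the sketch self-contained.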

Answered by sehe

After a closer look at the internals of preprocessor tricks, I believe this is theoretically impossible. This answer is essentially a more detailed expansion of @sehe's nice answer.

The fundamental working principle of arbitrary preprocessor lists like those in boost.preprocessor is indirect recursion. As such, it requires a way to consume one argument and pass the remaining on. The only two ways for CPP are commas (which can separate arguments) and enclosing parentheses (which can invoke macros).
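
To illustrate that mechanism, here is a minimal sketch in plain C (no Boost) of walking a sequence such as `(f)(g)`: each pair of parentheses triggers one macro invocation that consumes exactly one element. This is the same general idea Boost.PP sequences rely on, not Boost's actual implementation, and all macro names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Each step consumes one parenthesized element "(x)", emits it as a string
   literal followed by a comma, and leaves the name of the other step macro
   behind so that the next "(...)" invokes it. */
#define NAMES_A(x) #x, NAMES_B
#define NAMES_B(x) #x, NAMES_A

/* Whichever step macro is left over at the end gets "_END" pasted onto it
   and then expands to nothing. */
#define NAMES_A_END
#define NAMES_B_END

/* __VA_ARGS__ is fully expanded here (it is not next to # or ##), then
   passed on to SEQ_FINISH_, which pastes _END onto its last token. */
#define SEQ_FINISH(...) SEQ_FINISH_(__VA_ARGS__)
#define SEQ_FINISH_(...) __VA_ARGS__ ## _END

/* SEQ_TO_STRINGS((f)(g)) becomes: "f", "g", */
#define SEQ_TO_STRINGS(seq) SEQ_FINISH(NAMES_A seq)

static const char *const names[] = { SEQ_TO_STRINGS((f)(g)) };
```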

In the case of `f(int, long)`, `f` is neither followed by a comma nor enclosed in its own pair of parentheses, so the preprocessor has no way to separate it from the parameter list that follows without knowing the name in advance.

It could have changed the game if there were a BAZ after f, but sadly there is not and I have no control over the said library header :(

There are other issues, albeit less fatal ones, such as the undefined behavior of having preprocessor directives within macro definitions or arguments.

Perhaps someday it will become possible to leverage the Reflection TS to get all declared function names within a namespace as a consteval compile-time list and then iterate over it with something along the lines of `constexpr for`, all in a semantic and type-safe manner... who knows.

Answered by SuibianP