1

I have a C file (for simplicity, assume it includes nothing). This C file requires several definitions of literal numbers to compile properly - and I want to figure out which definitions these are.

Naturally, one can try to compile the file, and at some point we would start to get failures; with some failure recovery, we might get failure notifications about additional defines. But - that's not what I want:

  • I'm not interested in completing the compilation of the program. Building a syntax tree (or even a simplified syntax tree of some kind) should be enough.
  • I can assume that, other than missing macros, the program is syntactically correct. Which, for C, should mean it's syntactically correct, period.
  • I can assume that the relevant macros are all in uppercase, i.e. they have the form [A-Z][A-Z_0-9]*.

What are my alternatives for getting the list of undefined macros?

Motivation: In reality, I'm feeding something into a dynamic compilation library, and I want to check beforehand if all necessary macros have been defined, without knowing a priori which macros the file needs (i.e. it could be different ones for different input files).

einpoklum
  • Only half joking: An IDE may give you that information, probably based on a background syntax check similar to what your gcc call does. – Peter - Reinstate Monica Jun 01 '21 at 16:32
  • @Peter-ReinstateMonica: I was hoping that one of the solutions might be replicating what's done in some IDE. – einpoklum Jun 01 '21 at 16:33
  • So perhaps we should know about your workflow in order to come up with an emacs or vim script! In a (c)make project you may be able to define a suitable, additional build target with little extra effort that is being built periodically in the background, emulating what an IDE does. – Peter - Reinstate Monica Jun 01 '21 at 16:35
  • I think that your best solution may be a static analysis tool, but my information on these is rather out of date. Is `lint` still a thing? – Tim Randall Jun 01 '21 at 16:47
  • @Peter-ReinstateMonica: I did add a comment about my motivation. But this is not something I want to do within vim or emacs or another IDE - it has to happen after my main program is built, and it will repeatedly get small C (well, sort-of-C) programs to process. – einpoklum Jun 01 '21 at 16:48
  • Ah, I see: What you are compiling with the library is not hand-crafted. – Peter - Reinstate Monica Jun 01 '21 at 16:54
  • @Peter-ReinstateMonica: It might and it might not be. The point is, it will be something I don't have right now. It's my program which will see that source file for the first time. – einpoklum Jun 01 '21 at 19:03

2 Answers

1

The ugly fallback solution:

Obviously, your fallback is to just compile the program. But - do so while minimizing irrelevant messages. This will be compiler-dependent, but with GCC, for example, you can:

  • Avoid any output generation
  • Suppress warnings
  • Suppress notes
  • Be strictly standard-compliant, no GNU extensions
  • Disable the use of those dumb fancy quotation marks GCC insists on using

... using various command-line switches, and by making it take input from the standard input stream rather than a file (the only way I've found so far to suppress some of the notes). That looks like:

  cat your_program.c \
|  LC_CTYPE=C gcc -std=c99 -fsyntax-only -x c -fcompare-debug-second -

and the output could look like:

<stdin>: In function 'mult':
<stdin>:3:18: error: 'MY_CONSTANT' undeclared (first use in this function)

Now, if your program is correct other than the undefined macros (= undeclared identifiers), then you can easily parse the above with a bit of shell scripting:

  cat your_program.c \
| LC_CTYPE=C gcc -std=c99 -fsyntax-only -x c -fcompare-debug-second - 2>&1 \
| sed -r '/error: /!d; s/^.*error: '"'//; s/'.*//;" \
| sort -u

This has the further disadvantage of not being fully embeddable into your program, i.e. you can't invoke the partial compilation using some library in some program of yours, then programmatically parse the output. You would need a system()-type call.

Note: If your program can have other errors, the pattern for dropping the line in the sed command will need to be a little more specific.
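A more specific pattern might look like the sketch below. It assumes the diagnostic wording shown in the sample output above ("error: 'NAME' undeclared"), which may differ between GCC versions and locales, and it keeps only names matching the uppercase form from the question. The function name `undeclared_macros` is hypothetical, chosen for illustration.

```shell
# undeclared_macros: hypothetical helper, not part of GCC.
# Reads compiler diagnostics on stdin and prints, once each, the uppercase
# identifiers that appear in "error: 'NAME' undeclared" lines.
# Assumes the message wording shown in the sample output above.
undeclared_macros() {
    # -n with the p flag prints only lines where the substitution matched,
    # so unrelated errors and lowercase identifiers are dropped.
    LC_ALL=C sed -n -E "s/^.*error: '([A-Z][A-Z_0-9]*)' undeclared.*/\1/p" |
    LC_ALL=C sort -u
}
```

It would be used in place of the sed and sort stages, for example `gcc -std=c99 -fsyntax-only -x c -fcompare-debug-second - 2>&1 < your_program.c | undeclared_macros`.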

einpoklum
0

You could use something around the idea that every identifier-like non-keyword outside a comment in a C file must be declared somewhere. (I think! Is that correct?)

The basic idea is to generate a list of such identifiers and search the program and then the included headers for a declaration of each. While this can be done by hand and ad-hoc it probably makes sense to index all potential header files and to use something like ctags for indexing as well as finding (there is a libctags, as I just learned).

I assume that the solution doesn't have to be perfect — missed cases will simply fail compilation — but that you want to reduce such cases. In that case the parsing of the source code for identifiers does not have to be perfect (it can ignore nested comments etc.) and can probably be done "manually" with acceptable effort.
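A rough sketch of such a "manual" scan, restricted to the uppercase form given in the question, could look like this. It is deliberately imperfect: it assumes no `#include` or `#if` handling is needed, and quote characters inside comments, or `//` inside a block comment, can confuse it. The function name `candidate_macros` is hypothetical.

```shell
# candidate_macros FILE: hypothetical helper, a heuristic and not a C parser.
# Prints uppercase identifiers that FILE uses but does not #define itself.
candidate_macros() {
    src=$1

    # Names the file defines on its own; these need no outside definition.
    defined=$(LC_ALL=C sed -n -E \
        's/^[[:space:]]*#[[:space:]]*define[[:space:]]+([A-Za-z_][A-Za-z_0-9]*).*/\1/p' \
        "$src")

    # Per line: blank out string literals, character literals and // comments.
    LC_ALL=C sed -E \
        -e 's:"([^"\\]|\\.)*": :g' \
        -e "s:'([^'\\\\]|\\\\.)*': :g" \
        -e 's://.*$::' "$src" |
    # Join lines so that /* ... */ comments spanning lines can be removed.
    tr '\n' ' ' |
    LC_ALL=C sed -E 's:/\*([^*]|\*+[^*/])*\*+/: :g' |
    # One identifier-like token per line, then keep only the uppercase form.
    LC_ALL=C tr -cs 'A-Za-z0-9_' '\n' |
    LC_ALL=C grep -xE '[A-Z][A-Z_0-9]*' |
    LC_ALL=C sort -u |
    # Drop names that the file defines itself.
    while IFS= read -r name; do
        printf '%s\n' "$defined" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}
```

Calling `candidate_macros your_program.c` prints one candidate name per line. Since this works on text only, it can report names that are not really macros the file needs (an uppercase variable or enum constant, say), so it trades soundness for simplicity.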

Peter - Reinstate Monica
  • The solution does have to be perfect, at least in terms of soundness, in the sense that I cannot accept identifiers which aren't really macros I need to define. I would also like perfect completeness, but maybe it's not 100% necessary. However - given a solution to this problem, I would proceed to more complicated cases: include files; `#if` blocks... – einpoklum Jun 01 '21 at 18:47