The C preprocessor (cpp) seems like it should handle this code correctly:

#define A 1 // hello there

int foo[A];

I would expect A to be replaced with 1.

What happens is that A is replaced with 1 // hello there, which results in the following output from cpp -std=c99 test.c:

# 1 "test.c"

int foo[1 // hello there];

This is not valid C and fails to compile.

How can I get cpp to perform the proper replacement?

Note on compiler: I am using the cpp from the latest Xcode (8.2.1, Dec 2016) on macOS, so I doubt it's due to an outdated compiler.

machinaut
  • I don't think the preprocessor knows anything about comments. Why not just use a `/* */` block comment? – user2357112 Jan 12 '17 at 01:45
  • I can't reproduce this (http://ideone.com/hfQunc). What compiler are you using? – templatetypedef Jan 12 '17 at 01:49
  • Note that `//` is not a valid ISO C comment, it was introduced with C99. Make sure you're compiling (and preprocessing) with the C99 standard. – Schwern Jan 12 '17 at 01:51
  • How are you invoking your preprocessor? – melpomene Jan 12 '17 at 01:52
  • @Schwern Your use of "ISO C" puzzles me. Isn't C99 "ISO C" (specifically ISO 9899:1999)? – melpomene Jan 12 '17 at 01:53
  • Why not put the comment before the line? Embedding it in the macro will scatter it all over the place after preprocessing. – phuclv Jan 12 '17 at 01:53
  • This reproduces even with `cpp -std=c99 test.c`, though I left out the flag for brevity (it doesn't affect the results at all). – machinaut Jan 12 '17 at 01:54
  • The version I'm using is `cpp` that comes with latest Xcode (version 8.2.1, Dec 2016), so I don't think the issue is an outdated compiler. – machinaut Jan 12 '17 at 01:55
  • @melpomene ***Technically*** you're correct. But I've only ever seen ISO C refer to C90 while C99 is always referred to as C99. I'll start using C90 to avoid ambiguity. – Schwern Jan 12 '17 at 01:59
  • @LưuVĩnhPhúc Comments don't participate in preprocessing. What we see here is an artifact in getting text output from the preprocessor, which is something outside the standard. – Potatoswatter Jan 12 '17 at 02:35
  • Would `#define PERIOD (15 /* minutes */)` work for you? – chux - Reinstate Monica Jan 12 '17 at 02:37
  • Comments are removed from the code at one of the very first stages of translation, well before preprocessor macro substitution occurs. Your results indicate a broken compiler. – AnT stands with Russia Jan 13 '17 at 08:32

3 Answers

Somewhat to my surprise, I can reproduce the problem on my Mac (macOS Sierra 10.12.2; Apple LLVM version 8.0.0 (clang-800.0.42.1)) using /usr/bin/cpp, which is the Xcode cpp, but not using GNU cpp (which I invoke using just cpp).

Workarounds include:

/usr/bin/gcc -E -std=c99 test.c

This uses the clang wrapper gcc to run the C preprocessor and correctly handles the version. You could add a -v option and see what it runs; I didn't see it running cpp per se (it runs clang -cc1 -E with lots of other information).

You can also use:

clang -E -std=c99 test.c

It's effectively the same thing.

You could also install GCC and use that instead of Xcode. There are questions with answers about how to get that done (but it isn't for the faint of heart).
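If changing the source is an option, the suggestions from the comments under the question (a `/* */` block comment, or a comment on the line above the directive) can be sketched as below. This is only an illustration: the macro B, the array bar, and the helper check_sizes are hypothetical names that do not appear in the question, and I have not verified the output of Xcode's /usr/bin/cpp on this file.

```c
#include <assert.h>
#include <stddef.h>

/* hello there: the comment sits on the line above the directive,
   so it is never part of the macro's replacement list. */
#define A 1

#define B 2 /* block comment after the replacement list */

int foo[A];
int bar[B];

/* Hypothetical helper: confirms that the array sizes match the macros. */
void check_sizes(void)
{
    assert(sizeof foo / sizeof foo[0] == 1);
    assert(sizeof bar / sizeof bar[0] == 2);
}
```

With a conforming compiler both assertions should hold when check_sizes() is called, since neither comment ends up in the expansion.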

Jonathan Leffler
  • I can reproduce with `/usr/bin/cpp -std=c99` but I cannot using `clang -E -std=c99`. I suspect `/usr/bin/cpp` is doing some odd backwards compat thing. – Schwern Jan 12 '17 at 02:11
  • Interesting that even `clang --driver-mode=cpp test.c` does the right thing. Something up with Apple's packaging of `cpp` maybe? – machinaut Jan 12 '17 at 02:12
  • @Schwern: yup, I can't work out what is going on exactly, but `cpp` seems to have a blind spot. Curious; tantamount to a bug. If it isn't going to honour `-std=c99`, it should complain about it, not silently accept but ignore it. – Jonathan Leffler Jan 12 '17 at 02:16
  • Okay this seems to have narrowed down to an Xcode bug (reported it to Apple) -- should the question/title be updated accordingly? – machinaut Jan 12 '17 at 02:22
  • @machinaut: It might be worth updating the title to something like 'How to get the Xcode 8 standalone C preprocessor to recognize // comments?' (since `/* … */` is also a single-line comment, or at least is a comment on a single line). It should be a problem for a limited time, until Apple fixes the problem, so the version in the title is reasonable (you could use 8.2 or 8.2.1 if you preferred). I don't have older versions of Xcode installed so I can't test backwards; it is likely to be a problem in older versions too. It is probably a good idea to add the [tag:osx] and/or [tag:xcode] tags. – Jonathan Leffler Jan 12 '17 at 18:00

Note that // is not a valid C90 comment. It was introduced in C99, so make sure your compiler and preprocessor know they're to use the C99 standard. In many compilers that's -std=c99. (The question has since been edited to make that clear.)


Next, I don't believe the preprocessor cares about comments. Section 6.10 of the C99 spec shows the grammar of preprocessor directives, and nowhere does it mention comments...

The ANSI C standard makes it clear comments are supposed to be replaced in 2.1.1.2 "Translation Phases" phase 3 (5.1.1.2 in C99). (Drawing from this other answer).

  3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
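A small way to see the "each comment is replaced by one space character" rule in action, as a sketch only (the function name check_comment_is_space is made up for this illustration):

```c
#include <assert.h>

/* Hypothetical helper: shows that a comment separates tokens like a space. */
void check_comment_is_space(void)
{
    int x = 1;

    /* The comment between the two '+' characters becomes one space,
       so the expression is "+ +x" (two unary pluses), not "++x". */
    int y = +/* comment */+x;

    assert(y == 1);
    assert(x == 1); /* x was not incremented */
}
```

If the comment were simply deleted instead of being replaced by a space, the two `+` characters would form a `++` token and x would become 2.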

Older tools might not have followed that, either because they predate any C standard, because they had bugs, or because they interpreted the standard differently. They've likely retained those bugs/quirks for backwards compatibility. Testing with clang -E -std=c99 vs /usr/bin/cpp -std=c99 confirms this. They behave differently despite being the same compiler under the hood.

$ /usr/bin/cpp --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

$ ls -l /usr/bin/cpp
-rwxr-xr-x 1 root wheel 18240 Dec 10 01:04 /usr/bin/cpp
$ ls -l /usr/bin/clang
-rwxr-xr-x 1 root wheel 18240 Dec 10 01:04 /usr/bin/clang


$ /usr/bin/cpp -std=c99 test.c
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 330 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2


int foo[1 // hello there];

$ /usr/bin/clang -E -std=c99 test.c
# 1 "test.c"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 331 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.c" 2


int foo[1];

I suspect invoking clang as /usr/bin/cpp switches on bug/quirk compatibility with the original behavior of cpp, established back when the behavior was unclear.

I guess the lesson here is to use cc -E rather than cpp to ensure consistent behavior.

Schwern
  • 1. OP said "*This reproduces even with `cpp -std=c99 test.c`*". 2. The preprocessor is C's tokenizer. It has to care about comments, otherwise it couldn't do its job. – melpomene Jan 12 '17 at 02:01
  • @melpomene 1. Question was edited while I was writing the answer. As for 2... while the preprocessor has to care about comments in normal C code, it doesn't say anything about comments within macro definitions. I don't see anything in the spec that says it has to. 6.4.9.2 could be interpreted to mean it applies everywhere, even macros, but I'm not sure because 6.4.9.3 has an example of `#include "//e" // undefined behavior`. – Schwern Jan 12 '17 at 02:08
  • Yeah, confirming I edited to add that `-std=c99` doesn't fix it. Sorry about the mid-flight collision. Anything special I should do about that? – machinaut Jan 12 '17 at 02:09
  • http://stackoverflow.com/questions/1476892/poster-with-the-8-phases-of-translation-in-the-c-language/1479972#1479972, http://stackoverflow.com/questions/1510869/does-the-c-preprocessor-strip-comments-or-expand-macros-first – melpomene Jan 12 '17 at 02:11
  • @machinaut No, it's just a thing that happens. – Schwern Jan 12 '17 at 02:11
  • https://gcc.gnu.org/onlinedocs/cpp/Initial-processing.html (There's a `/* */ # /* */ define ...` example at the bottom) – melpomene Jan 12 '17 at 02:16
  • The behavior was not unclear in C90; see the "*sorry, but what you link to is not the ANSI C spec; the actual spec describes the translation phases in section 2.1.1.2*" comment. – melpomene Jan 12 '17 at 02:17
  • @melpomene Well, it was unclear to someone. :) I think I've synthesized the full answer. – Schwern Jan 12 '17 at 02:23
  • C90 had (almost*) exactly the same language in 5.1.1.2/3 (there is no 2.1.1.2 in ISO 9899:1990). \* C99 added the *partial* in "*or in a partial comment*". – melpomene Jan 12 '17 at 02:30
  • @melpomene I only have the ANSI C and ISO C99 specs available, I guess the numbering changed. – Schwern Jan 12 '17 at 02:37
  • @Olaf Thanks, I'll add it to my collection of standards. – Schwern Jan 12 '17 at 03:38

From the C11 specification (emphasis added):

5.1.1.2 Translation phases

The precedence among the syntax rules of translation is specified by the following phases. 6)

  1. [...] multibyte characters are mapped [...] to the source character set [...] Trigraph sequences are replaced [...]

  2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines [...]

  3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). [...] Each comment is replaced by one space character. [...]

  4. Preprocessing directives are executed, macro invocations are expanded, and _Pragma unary operator expressions are executed. [...]

where note 6) states:

Implementations shall behave as if these separate phases occur, even though many are typically folded together in practice. Source files, translation units, and translated translation units need not necessarily be stored as files, nor need there be any one-to-one correspondence between these entities and any external representation. The description is conceptual only, and does not specify any particular implementation.

Hence, an implementation conforming to the C11 specification is not required to have a separate preprocessor, which means that the cpp command can do whatever it wants, and the compiler driver is allowed to perform phases 1 through 3 by any means it wants. So the correct way to get the output after preprocessing is to invoke the compiler driver with cc -E.
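To check from inside a program that the comment is gone before macro expansion happens (phase 3 before phase 4), a rough sketch along these lines may help. The macros STR and XSTR and the helper check_expansion are hypothetical names introduced for this illustration, and the file has to be compiled as C99 or later for the `//` comment to be accepted.

```c
#include <assert.h>
#include <string.h>

#define A 1 // hello there

#define STR(x)  #x      /* stringify the argument exactly as written */
#define XSTR(x) STR(x)  /* expand the argument first, then stringify */

int foo[A];

/* Hypothetical helper: the expansion of A should be just "1". */
void check_expansion(void)
{
    assert(strcmp(XSTR(A), "1") == 0);
    assert(sizeof foo / sizeof foo[0] == 1);
}
```

With a conforming compiler the comment has already been replaced by a space when A is defined, so XSTR(A) should produce the string "1" and nothing more.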

user3386109
  • The preprocessor does all of the above: https://gcc.gnu.org/onlinedocs/cpp/Initial-processing.html – melpomene Jan 12 '17 at 02:21
  • @melpomene That may be true for gcc, but it's not a requirement, and given that the OP is *"using cpp that comes with latest Xcode"*, he's almost certainly *not* using gcc. – user3386109 Jan 12 '17 at 02:23
  • I think this provides the final piece of the puzzle why `cpp` and `cc -E` behave differently. – Schwern Jan 12 '17 at 02:39
  • Still, there seems to be a well-established order. Comments are blank tokens for the language as a whole. Regardless of how many tools or steps are involved, a C99 compiler should not expand the macro with the comment. The only possible explanations are: 1) a bug in the compiler; and 2) that previous C standards handle this differently. – giusti Jan 12 '17 at 03:24
  • @giusti You missed the point that the OP did not run the compiler. He ran an undocumented command called `cpp`. What he *should* do is run the compiler with `cc -E`. – user3386109 Jan 12 '17 at 03:28