63

Consider this (horrible, terrible, no good, very bad) code structure:

#define foo(x) // commented out debugging code

// Misformatted to not obscure the point
if (a)
foo(a);
bar(a);

I've seen two compilers' preprocessors generate different results on this code:

if (a)
bar(a);

and

if (a)
;
bar(a);

Obviously, this is a bad thing for a portable code base.

My question: What is the preprocessor supposed to do with this? Elide comments first, or expand macros first?

Deduplicator
Phil Miller
  • 5
    Good question - made me work for it tracking down the actual real info :) – Reed Copsey Oct 02 '09 at 17:39
  • FYI, use "#define foo(x) ##" to do a safer blank macro.... (or is it ###? :/) – Pod Oct 02 '09 at 17:48
  • By the way - what compiler are you using that behaves in your first example? I'm pretty sure it would break a lot of code - even if it might be smart to only use /* */ comments in #define's, my impression is that I've seen an awful lot of '//' comments used. – Michael Burr Oct 02 '09 at 20:14
  • 1
    Could it be that the preprocessor doesn't understand `//` comments, but the compiler does? Remember that, originally, C was supposed to understand only `/* */` comments, and `//` was a C++ extension. I think C only picked up `//` with C99. (Have I got my history correct here?). In fact, whatever compiler you're using, I'm curious to see how it handles `/*` – Aaron McDaid Feb 19 '14 at 18:48

6 Answers

39

Unfortunately, the original ANSI C Specification specifically excludes any Preprocessor features in section 4 ("This specification describes only the C language. It makes no provision for either the library or the preprocessor.").

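The C99 specification handles this explicitly, though. Comments are replaced with a single space in translation phase 3, which happens before preprocessing directives are executed and macros are expanded in phase 4. (See section 5.1.1.2 for the translation phases; section 6.10 covers the preprocessing directives themselves.)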

VC++ and the GNU C Compiler both follow this paradigm - other compilers may not be compliant if they're older, but if it's C99 compliant, you should be safe.

Reed Copsey
  • 10
    sorry, but what you link to is __not__ the ANSI C spec; the actual spec describes the translation phases in section 2.1.1.2; I posted an overview over these phases not long ago: http://stackoverflow.com/questions/1476892/poster-with-the-8-phases-of-translation-in-the-c-language/1479972#1479972 – Christoph Oct 02 '09 at 17:58
  • Yeah - not sure. I always use (mostly) C99 compliant compilers. It looks like the OP is using a C99 compiler, though, since // as comments didn't exist in C89. – Reed Copsey Oct 02 '09 at 18:07
  • Many C compilers I've come across support the C++/C99 '//' comments even if they don't support anything else from C99. – Michael Burr Oct 02 '09 at 19:36
  • @Novelocrat: As I said, if you're using C99, you're safe. Technically, though, there are few fully compliant C99 compilers (MS and GNU both are not 100% compliant, for example). – Reed Copsey Oct 02 '09 at 20:25
11

As described in this copy-n-pasted description of the translation phases in the C99 standard, removing comments (they are replaced by a single whitespace) occurs in translation phase 3, while preprocessing directives are handled and macros are expanded in phase 4.

In the C90 standard (which I only have in hard copy, so no copy-n-paste) these two phases occur in the same order. The description of the translation phases differs from the C99 standard in some details, but comments are still replaced by a single whitespace character before preprocessing directives are handled and macros are expanded.

Again, the C++ standard has these 2 phases occur in the same order.

As far as how the '//' comments should be handled, the C99 standard says this (6.4.9/2):

Except within a character constant, a string literal, or a comment, the characters // introduce a comment that includes all multibyte characters up to, but not including, the next new-line character.

And the C++ standard says (2.7):

The characters // start a comment, which terminates with the next newline character.

So your first example is clearly an error on the part of that translator - the ';' character after the foo(a) should be retained when the foo() macro is expanded - the comment characters should not be part of the 'contents' of the foo() macro.
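To make the expected behavior concrete, here is a minimal sketch of the question's snippet, assuming a C99-or-later compiler (for `//` comments). The counter `bar_calls` and the wrapper `run_question_snippet` are made-up names for illustration; only `foo` and `bar` come from the question.

```c
/* Hypothetical counter so the effect of the snippet can be observed. */
static int bar_calls = 0;

static void bar(int a)
{
    (void)a;
    bar_calls++;
}

#define foo(x) // commented out debugging code

/* Runs the snippet from the question and reports how often bar() ran. */
static int run_question_snippet(int a)
{
    bar_calls = 0;
    if (a)
        foo(a);   /* expands to nothing, leaving the lone ';' */
    bar(a);       /* not part of the if: runs for every value of a */
    return bar_calls;
}
```

On a translator that removes comments before expanding macros, `run_question_snippet(0)` should return 1, because the `;` survives as the body of the `if`. A translator that swallowed the `;` along with the comment would return 0 for that call.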

But since you're faced with a buggy translator, you might want to change the macro definition to:

#define foo(x) /* junk */

to work around the bug.

However (and I'm drifting off topic here...), since line splicing (backslashes just before a new-line) occurs before comments are processed, you can run into something like this bit of nasty code:

#include <stdio.h>

#define evil( x) printf( "hello "); // hi there, \
                 printf( "%s\n", x); // you!

int main( int argc, char** argv)
{
    evil( "bastard");

    return 0;
}

Which might surprise whoever wrote it.
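The same effect can be checked without looking at program output. This is only a sketch with made-up names (`steps`, `evil_count`, `run_evil_count`), assuming a C99-or-later compiler; some compilers warn about the spliced comment (GCC's -Wcomment, for example).

```c
/* Hypothetical counter used to observe how much of the macro survives. */
static int steps = 0;

/* The backslash splices the next line into the // comment, because line
   splicing (phase 2) runs before comments are removed (phase 3). The
   second increment therefore becomes part of the comment. */
#define evil_count() steps++; // first step, \
                     steps++; // second step (swallowed by the comment)

static int run_evil_count(void)
{
    steps = 0;
    evil_count();
    return steps;
}
```

With the phases applied in the standard order, `run_evil_count()` should return 1 rather than 2, since only the first `steps++;` ends up in the macro body.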

Or even better, try the following, written by someone (certainly not me!) who likes box-style comments:

int main( int argc, char** argv)
{
                            //----------------/
    printf( "hello ");      // Hey, what the??/
    printf( "%s\n", "you"); // heck??         /
                            //----------------/
    return 0;
}

Depending on whether your compiler defaults to processing trigraphs or not (compilers are supposed to, but since trigraphs surprise nearly everyone who runs across them, some compilers decide to turn them off by default), you may or may not get the behavior you want - whatever behavior that is, of course.

Community
Michael Burr
5

According to MSDN, comments are replaced with a single space in the tokenization phase, which happens before the preprocessing phase where macros are expanded.

Jim Lewis
4

Never put // comments in your macros. If you must put comments, use /* */. In addition, your macro has a mistake; it should be:

#define foo(x) do { } while(0) /* junk */

This way, foo is always safe to use. For example:

if (some condition)
    foo(x);

will never cause a compile error, regardless of whether or not foo is defined to expand to some expression.
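As a rough illustration of why the do/while form helps, here is a sketch using an unbraced if/else. The names `trace`, `TRACE_ENABLED`, and `clamp_to_zero` are hypothetical and not from the answer.

```c
/* Hypothetical switch: define TRACE_ENABLED to turn the macro into real code. */
#ifdef TRACE_ENABLED
#include <stdio.h>
#define trace(x) do { printf("trace: %d\n", (x)); } while (0)
#else
#define trace(x) do { } while (0) /* debugging disabled */
#endif

/* Returns value unchanged if it is non-negative, otherwise 0. */
static int clamp_to_zero(int value)
{
    if (value < 0)
        trace(value);   /* exactly one statement in either configuration */
    else
        return value;   /* the else still pairs with the if above */
    return 0;
}
```

Because the macro always expands to a single statement that needs the trailing `;`, the `else` keeps binding to the intended `if` whether or not `TRACE_ENABLED` is defined. A macro body made of several bare statements would break this if/else.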

Vitali
2
#ifdef _TEST_
#define _cerr cerr
#else
#define _cerr / ## / cerr
#endif
This will work on some compilers (VC++). When _TEST_ is not defined,

_cerr ...

will be replaced by the comment line

// cerr ...

1

I seem to recall that compliance requires three steps:

  1. strip comments
  2. expand macros
  3. strip comments again

The reason for this has to do with the compiler being able to accept .i files directly.

Joshua
  • Excellent point - this can get complicated for multi-stage preprocessing (which most avoid entirely with 'include guards', but has some interesting benefits, especially in dependency resolution.) – John P Jul 05 '17 at 18:49