Difference between gcc and Microsoft preprocessor

Question

I discovered that the Microsoft Visual Studio compiler and gcc preprocess the following small snippet differently:

# define M3(x, y, z) x + y + z
# define M2(x, y) M3(x, y)
# define P(x, y) {x, y}
# define M(x, y) M2(x, P(x, y))
M(a, b)

'gcc -E' gives the following:

a + {a + b}

while 'cl /E' issues a warning about a missing macro argument and produces the following output:

a + {a, b} +

It seems that cl does not treat commas that come from nested macro expansions as argument separators. Unfortunately, I found no description of the algorithm implemented in the cl preprocessor, so I am not sure that my guess is correct. Does anyone know how the cl preprocessor works and how its algorithm differs from gcc's? And how can the observed behaviour be explained?

What version of gcc and CL? Besides that, I would say it's a bug in the gcc preprocessor as `M3` should have three arguments and only gets two. — Some programmer dude, Jul 13 '12 at 11:30
Just forget about trying to get MSVC and any C99-compliant compiler to produce the same results. MSVC isn't C99 (nor C11); it is two versions of the standard behind. — Jens Gustedt, Jul 13 '12 at 13:15

Sebastian Mach · Answer 1 · 2012-07-13T12:01:14.767

# define M3(x, y, z) x + y + z
# define M2(x, y) M3(x, y)
# define P(x, y) {x, y}
# define M(x, y) M2(x, P(x, y))
M(a, b)

Let us roll this out manually, step by step:

M(a, b)
--> M2(a, P(a, b))
--> M2(a, {a, b})

The standard says:

The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.

only parentheses are mentioned, so ...

--> M3(a, {a, b})
--> a + {a + b}

Important:

M3(a, {a, b})

Here, according to the previous quote from the standard, three "arguments" are passed to M3 (using single-quotes to describe tokens/arguments):

M3('a', '{a', 'b}')

which are expanded to

'a' + '{a' + 'b}'
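The splitting rule can be observed directly with an argument-counting macro. This is only a sketch: `PICK_4TH`, `COUNT_ARGS` and `check_argument_counts` are hypothetical names introduced here, and the code assumes a preprocessor with C99 variadic macros that follows the quoted rule (as gcc does). A preprocessor that produces the cl output from the question may well report different counts; that has not been verified here.

```c
#include <assert.h>

/* Hypothetical helper: expands to its 4th argument, so COUNT_ARGS
   reports how many arguments (1 to 3) the preprocessor saw. */
#define PICK_4TH(a, b, c, n, ...) n
#define COUNT_ARGS(...) PICK_4TH(__VA_ARGS__, 3, 2, 1, 0)

static void check_argument_counts(void)
{
    /* Braces do not protect the comma: two arguments, '{a' and 'b}'. */
    assert(COUNT_ARGS({a, b}) == 2);

    /* Inner parentheses do protect it: one argument, '(a, b)'. */
    assert(COUNT_ARGS((a, b)) == 1);
}
```

Calling `check_argument_counts()` from `main` passes silently when the preprocessor splits arguments as the standard describes.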

And this is what cpp (4.6.1) gives verbatim:

# 1 "cpp.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cpp.cpp"




a + {a + b}

cpp (and therefore gcc and g++) is correct; MSVC isn't.

Be a good citizen and make sure a bug report exists.

AFAIR these precise rules for the preprocessor came with C99. So for MSVC it isn't a bug; I don't think they claim to conform to C99. Code written for MSVC simply isn't portable nowadays. — Jens Gustedt, Jul 13 '12 at 13:14
@JensGustedt: My take is: a C compiler should implement the current standard. If MSVC conforms to neither C99 nor C11, then it is not a C compiler, but at most specifically a C89 compiler. The same argument applies to C++ :) — Sebastian Mach, Jul 13 '12 at 13:29

SingerOfTheFall · Answer 2 · 2012-07-13T12:02:54.797

The only logic I can see that explains this behavior looks like this.

CL way:

 M(a,b) 
 M2(a,P(a,b)) 
 M3(a,P(a,b))
 M3(a,{a,b}) -> M3 gets 2 arguments ( 'a' and '{a,b}') instead of 3.
    |  \ /
  arg1  |
      arg2

Gcc way:

M(a,b) 
M2(a,P(a,b)) 
M3(a,P(a,b))
M3(a,{a,b}) -> Gcc probably thinks there are 3 arguments here ('a', '{a', 'b}').
   |  | |
 arg1 | |
   arg2 |
     arg3
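The two argument counts in the diagrams above can be reproduced with a toy splitter. This is a minimal sketch, not the code of either preprocessor: `count_macro_args` is a hypothetical helper that works on plain characters rather than preprocessing tokens and ignores string and character literals. It splits on commas that are not inside inner parentheses, which is the rule the standard states.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the arguments in `list`, the text between the outer parentheses
   of a macro invocation. Only commas outside inner parentheses separate
   arguments; braces get no special treatment. */
static size_t count_macro_args(const char *list)
{
    size_t count = 1;   /* text with no separating comma is one argument */
    int depth = 0;      /* current nesting level of inner parentheses */

    for (const char *p = list; *p != '\0'; ++p) {
        if (*p == '(') {
            ++depth;
        } else if (*p == ')' && depth > 0) {
            --depth;
        } else if (*p == ',' && depth == 0) {
            ++count;
        }
    }
    return count;
}

static void check_count_examples(void)
{
    assert(count_macro_args("a, {a, b}") == 3);   /* after P is expanded */
    assert(count_macro_args("a, P(a, b)") == 2);  /* before P is expanded */
}
```

The count of three matches the "Gcc way" diagram. The count of two matches the "CL way" diagram only if the arguments are counted before `P` is expanded; whether cl really works that way is not established here.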

edited Jul 13 '12 at 12:02

answered Jul 13 '12 at 11:41


`cpp` doesn't think there are 3 arguments, it _knows_ there are, as per the standard there _are_ three arguments. Only commas within parentheses are not "preprocessor commas" (sidenote: I won't summon any downvote for this, just in case somebody will) :) – Sebastian Mach Jul 13 '12 at 12:00
@phresnel, Yeah, probably. Unfortunately I can't check the standard right now, so this was just my assumption based on the OP's data ;) – SingerOfTheFall Jul 13 '12 at 12:03

score 1 · Answer 3 · answered Jul 13 '12 at 11:52

I think gcc gets it right; what Microsoft does is incorrect.

When macro substitution is done for the line

M2(a, P(a, b))

the standard (section 6.10.3.1) requires that before replacing the second parameter ("y") in the macro's replacement list ("M3(x, y)") with its argument ("P(a, b)"), macro replacement is to be performed for that argument. This means "P(a, b)" is processed to "{a, b}" before it is inserted, resulting in

M3(a, {a, b})

which is then further replaced to

a + {a + b}
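The result of this sequence can be checked from inside a program by turning the expansion into a string. The sketch below assumes a preprocessor that follows the steps just described; `QUOTE`, `EXPAND_AND_QUOTE`, `strip_spaces` and `check_expansion` are hypothetical helpers added for illustration. Spaces are removed before comparing because the exact spacing of a stringified expansion is not something this example wants to depend on.

```c
#include <assert.h>
#include <string.h>

#define M3(x, y, z) x + y + z
#define M2(x, y) M3(x, y)
#define P(x, y) {x, y}
#define M(x, y) M2(x, P(x, y))

/* Two levels: EXPAND_AND_QUOTE fully expands its argument before
   QUOTE turns the resulting tokens into a string literal. */
#define QUOTE(...) #__VA_ARGS__
#define EXPAND_AND_QUOTE(...) QUOTE(__VA_ARGS__)

/* Copies src to dst without spaces; dst needs strlen(src) + 1 bytes. */
static void strip_spaces(char *dst, const char *src)
{
    for (; *src != '\0'; ++src) {
        if (*src != ' ') {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}

static void check_expansion(void)
{
    const char expanded[] = EXPAND_AND_QUOTE(M(a, b));
    char compact[sizeof expanded];

    strip_spaces(compact, expanded);
    /* P(a, b) was expanded before substitution, so M3 saw three arguments. */
    assert(strcmp(compact, "a+{a+b}") == 0);
}
```

Calling `check_expansion()` from `main` passes silently on a preprocessor that behaves as described in this answer.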

dpi


Thanks. I agree that MS is incorrect, but my question is slightly different. The preprocessing algorithm is described in the standard, and the MS preprocessor obviously doesn't follow it. Do you know (or do you have a reasonable guess) how the MS preprocessor works? – Sergey Syromyatnikov Jul 13 '12 at 13:47
@SergeySyromyatnikov: At least one flaw is that it doesn't recognize commas within `{}`, but the standard says that only commas within `()` should be ignored. – Sebastian Mach Jul 13 '12 at 14:48
@SergeySyromyatnikov I have no idea. MSVC [claims](http://msdn.microsoft.com/en-us/library/02y9a5ye) to be C90-compliant, with some extensions. As far as I can tell, C90 doesn't differ from the current C standard in this regard; [here is a draft](http://flash-gordon.me.uk/ansi.c.txt) of the old standard, and the relevant section is 3.8.3. – dpi Jul 13 '12 at 15:11
BTW, "{}" are not important and can be omitted or replaced with other symbols. – Sergey Syromyatnikov Jul 13 '12 at 15:29
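
The remark that the braces are incidental can be tried with a variant of the snippet in which `P` expands to a bare `x, y`. This is a sketch under the assumption of a preprocessor that follows the standard's rules (as gcc does); `check_no_braces` is a hypothetical name, and numeric arguments are used only so that the result can be evaluated.

```c
#include <assert.h>

#define M3(x, y, z) x + y + z
#define M2(x, y) M3(x, y)
#define P(x, y) x, y            /* the question's P without the braces */
#define M(x, y) M2(x, P(x, y))

static void check_no_braces(void)
{
    /* M(1, 2) -> M2(1, P(1, 2)) -> M3(1, 1, 2) -> 1 + 1 + 2 */
    assert(M(1, 2) == 4);
}
```

A preprocessor that does not treat the comma produced by `P` as a separator would hand `M3` only two arguments, as in the cl output shown in the question.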
