C preprocessor, recursive macros

Question

Why do M(0) and N(0) have different results?

#define CAT_I(a, b) a ## b
#define CAT(a, b) CAT_I(a, b)

#define M_0 CAT(x, y)
#define M_1 whatever_else
#define M(a) CAT(M_, a)
M(0);       //  expands to CAT(x, y)

#define N_0() CAT(x, y)
#define N_1() whatever_else
#define N(a) CAT(N_, a)()
N(0);       //  expands to xy

Uhhh.... what is it you're exactly trying to achieve here... and for what purpose? — t0mm13b, Apr 12 '11 at 21:30
I don't really want to achieve anything, just noticed this while working on something, and I'm curious about the reasons. It annoys me when I don't understand something :) . — imre, Apr 12 '11 at 21:41

score 20 · Accepted Answer · edited Jun 20 '20 at 09:12

In fact, it depends on your interpretation of the language standard. For example, under mcpp, a preprocessor implementation that strictly conforms to the text of the language standard, the second yields CAT(x, y); as well [extra newlines have been removed from the result]:

C:\dev>mcpp -W0 stubby.cpp
#line 1 "C:/dev/stubby.cpp"
        CAT(x, y) ;
        CAT(x, y) ;
C:\dev>

There is a known inconsistency in the C++ language specification (the same inconsistency is present in the C specification, though I don't know where the defect list is for C). The specification states that the final CAT(x, y) should not be macro-replaced. The intent may have been that it should be macro-replaced.

To quote the linked defect report:

Back in the 1980's it was understood by several WG14 people that there were tiny differences between the "non-replacement" verbiage and the attempts to produce pseudo-code.

The committee's decision was that no realistic programs "in the wild" would venture into this area, and trying to reduce the uncertainties is not worth the risk of changing conformance status of implementations or programs.

So, why do we get different behavior for M(0) than for N(0) with most common preprocessor implementations? In the replacement of M, the second invocation of CAT consists entirely of tokens resulting from the first invocation of CAT:

M(0) 
CAT(M_, 0)
CAT_I(M_, 0)
M_0
CAT(x, y)

If M_0 were instead defined to be replaced by CAT(M_, 0), replacement would recurse infinitely. The preprocessor specification explicitly prohibits this "strictly recursive" replacement by stopping macro replacement at that point, so CAT(x, y) is not macro-replaced.
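
One way to watch this rule at work is to stringify the finished expansion. This is only a sketch: R, R_0, STR, STR_I and recursive_expansion are names made up for the illustration and are not part of the question. R_0 asks for CAT again from inside CAT's own replacement, which is the self-referential case described above.

```c
#define CAT_I(a, b) a ## b
#define CAT(a, b) CAT_I(a, b)

/* Hypothetical helpers: STR macro-expands its argument, then STR_I stringifies it. */
#define STR_I(...) #__VA_ARGS__
#define STR(...) STR_I(__VA_ARGS__)

/* Hypothetical self-referential variant of M: R_0 asks for CAT(R_, 0) again. */
#define R_0 CAT(R_, 0)
#define R(a) CAT(R_, a)

/* Returns the text that is left once the preprocessor has finished with R(0).
   The inner CAT comes entirely from the outer CAT's replacement, so it is
   left alone instead of looping forever. */
const char *recursive_expansion(void)
{
    return STR(R(0));
}
```

Calling recursive_expansion() from main and printing the result shows the leftover, unexpanded CAT(R_, 0) call.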

However, in the replacement of N, the second invocation of CAT consists only partially of tokens resulting from the first invocation of CAT:

N(0)
CAT(N_, 0)       ()
CAT_I(N_, 0)     ()
N_0              ()
CAT(x, y)
CAT_I(x, y)
xy

Here the second invocation of CAT is formed partially from tokens resulting from the first invocation of CAT and partially from other tokens, namely the () from the replacement list of N. The replacement is not strictly recursive and thus when the second invocation of CAT is replaced, it cannot yield infinite recursion.
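
To check what a particular preprocessor does with the two cases, the expansions can be turned into strings and printed. This is a rough sketch, not a statement of what the standard requires: STR, STR_I and the three functions are hypothetical helpers added for the test, and the result for N(0) depends on the implementation, as discussed above.

```c
#include <stdio.h>

#define CAT_I(a, b) a ## b
#define CAT(a, b) CAT_I(a, b)

#define M_0 CAT(x, y)
#define M(a) CAT(M_, a)

#define N_0() CAT(x, y)
#define N(a) CAT(N_, a)()

/* Hypothetical helpers: STR macro-expands its argument, then STR_I stringifies it. */
#define STR_I(...) #__VA_ARGS__
#define STR(...) STR_I(__VA_ARGS__)

/* Text of the fully preprocessed M(0). */
const char *expansion_of_m(void) { return STR(M(0)); }

/* Text of the fully preprocessed N(0); implementations may differ here. */
const char *expansion_of_n(void) { return STR(N(0)); }

/* Prints both expansions so they can be compared side by side. */
void print_expansions(void)
{
    printf("M(0) -> %s\n", expansion_of_m());
    printf("N(0) -> %s\n", expansion_of_n());
}
```

With the preprocessors reported in this thread (VC++, Comeau, gcc), the second line should read xy, while mcpp leaves CAT(x, y) in both.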

edited Jun 20 '20 at 09:12

Community

answered Apr 12 '11 at 21:40

James McNellis

Interesting... The preprocessors in VC++ and the online Comeau compiler both expand N(0) to "xy". – imre Apr 12 '11 at 21:44
Also, is it somehow possible to work around this recursion limitation and make the last CAT evaluate? (Besides defining another alternative CAT?) – imre Apr 12 '11 at 21:52
I have a dim memory to the effect that because the `()` supplied to `N_0` came from outside any macro expansion, that counts as a *new* macro expansion, so the "blue paint" comes off `CAT()` and it can be expanded once more. So this might be a bug in mcpp. FWIW gcc agrees with Comeau and VC++. – zwol Apr 12 '11 at 21:54
The specification of the recursive replacement rules is absurdly convoluted; that's what happens when you try to write a complete, English-language specification for the behavior of a program _after_ the program has been written and modified over a period of two decades :-). There is a long, detailed discussion of the issue in the documentation for the MCPP conformance suite, which is included in the source distributions of mcpp. – James McNellis Apr 12 '11 at 22:01
Good to know about gcc -- right now I'm more interested in actually working code than standard compliance (although the only place where my code actually depends on this is some horrible dllexport/import stuff). So thanks for all the info, James and Zack. – imre Apr 12 '11 at 22:03
@imre: I'm interested to know why something so complex is needed for a dllimport/dllexport declspec. The idiom is to use a single macro (e.g. `MYPROJECT_EXPORT`) that is conditionally set to one of the two depending on whether "My Project" is being built. – James McNellis Apr 12 '11 at 22:06
@James McNellis: The explanation would be too long for a comment here (involves custom RTTI macros, nested classes, and class templates, all in the context of dll-exporting), are you interested enough to receive an email? :) – imre Apr 12 '11 at 22:27
@imre: Nope. If you know what you're doing, that's good enough for me :-) Best of luck, though. – James McNellis Apr 12 '11 at 22:29
You have quoted the wrong part of the defect report. The relevant quote is "The original intent of the J11 committee in this text was that the result should be 42, as demonstrated by the original pseudo-code description of the replacement algorithm provided by Dave Prosser, its author. The English description, however, omits some of the subtleties of the pseudo-code and thus arguably gives an incorrect answer for this case." and the operative word there is "arguably". Since that is only "arguable" but the argument goes against the intent, the argument is wrong, and so is mcpp. – Jim Balter Apr 12 '11 at 22:32
BTW, it's important to note that mcpp scores perfectly on the CPP validation suite ... written by the author of mcpp. So all that score shows is that mcpp does what its author thinks it should; it does not show that it is actually faithful to the C standard. – Jim Balter Apr 12 '11 at 22:35
@Jim: I would recommend reading the mcpp test suite documentation, which contains an eight page discussion on the subject and explains the contradictions in the specifications and the manner in which the specifications have changed. In C99 the behavior is explicitly unspecified. A conforming implementation may replace the second invocation of `CAT` or it may not. – James McNellis Apr 13 '11 at 02:29

score 3 · Answer 2 · edited Jul 02 '13 at 08:58

Just follow the sequence:

1.)

M(0); //  expands to CAT(x, y) TRUE 
CAT(M_, 0)
CAT_I(M_, 0)
M_0
CAT(x, y)

2.)

N(0); //  expands to xy TRUE
CAT(N_, 0)()
CAT_I(N_, 0)()
N_0()
CAT(x, y)
CAT_I(x, y)
xy

You only need to recursively replace the macros.

Notes on ## preprocessor operator: Two arguments can be 'glued' together using ## preprocessor operator; this allows two tokens to be concatenated in the preprocessed code.
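
A small sketch of why the question wraps ## in two macros (CAT calling CAT_I). ONE, STR, STR_I and the two functions are hypothetical names used only for this illustration. An argument that is a direct operand of ## is pasted as written, without being macro-expanded first; going through one extra macro lets it expand before the paste.

```c
#define CAT_I(a, b) a ## b
#define CAT(a, b) CAT_I(a, b)

/* Hypothetical helpers: STR macro-expands its argument, then STR_I stringifies it. */
#define STR_I(...) #__VA_ARGS__
#define STR(...) STR_I(__VA_ARGS__)

/* Hypothetical macro used as the right-hand operand. */
#define ONE 1

/* ONE is an operand of ## here, so it is pasted unexpanded. */
const char *paste_direct(void) { return STR(CAT_I(x, ONE)); }

/* CAT expands its arguments first, so CAT_I sees x and 1. */
const char *paste_indirect(void) { return STR(CAT(x, ONE)); }
```

paste_direct() gives xONE and paste_indirect() gives x1.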

Unlike standard macro expansion, traditional macro expansion has no provision to prevent recursion. If an object-like macro appears unquoted in its replacement text, it will be replaced again during the rescan pass, and so on ad infinitum. GCC detects when it is expanding recursive macros, emits an error message, and continues after the offending macro invocation. (gcc online doc)
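
For contrast, this is roughly what the standard (non-traditional) rule looks like for the simplest self-referential case. The names foo, STR, STR_I and self_reference are only illustrative. In standard mode the inner foo is not replaced again, so the expansion stops after one step.

```c
/* Hypothetical helpers: STR macro-expands its argument, then STR_I stringifies it. */
#define STR_I(...) #__VA_ARGS__
#define STR(...) STR_I(__VA_ARGS__)

/* An object-like macro that mentions itself in its replacement list. */
#define foo (4 + foo)

/* Text of the fully preprocessed foo in standard mode. */
const char *self_reference(void)
{
    return STR(foo);
}
```

self_reference() returns (4 + foo): one replacement, then the nested foo is left as an ordinary identifier.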

edited Jul 02 '13 at 08:58

Morwenn

answered Apr 12 '11 at 21:42

Cacho Santa

Umm... I still don't get it. The two sequences both reach the same CAT(x, y) -- so why stop there in one case but not the other? – imre Apr 12 '11 at 21:46
I think the recursion here depends on the interpretation of the standard, as James McNellis said. Nice question, imre. – Cacho Santa Apr 12 '11 at 22:08
@imre: In the case of `M(0)`, the second `CAT(...)` invocation results entirely from the first `CAT(...)` invocation, thus it is a strictly recursive call. In the case of `N(0)`, the second `CAT(...)` invocation results only partially from the first `CAT(...)` invocation and partially from other tokens that appear after that (the `()` in the replacement list of `N`). Thus, it is not entirely recursive. – James McNellis Apr 12 '11 at 22:09

score -1 · Answer 3 · answered Apr 12 '11 at 21:33

There is something you might have failed to spot: N(a) is defined as CAT(N_, a)(), whereas M(a) is defined as CAT(M_, a). Notice the extra pair of parentheses.

answered Apr 12 '11 at 21:33

t0mm13b

I know. And correspondingly, N_0 is defined as a function-style (0-argument) macro. And for some reason, that seems to make a difference in recursive evaluations, but I don't know exactly why; that's my question. – imre Apr 12 '11 at 21:39
