65
1. #define NUM 10
2. #define FOO NUM
3. #undef NUM
4. #define NUM 20
5. 
6. FOO

When I only run the preprocessor, the output file contains 20.

However, from what I understand, the preprocessor simply does text replacement. So this is what I think is happening (which is obviously wrong, but I don't know why):

  1. NUM is defined as 10.
  2. Therefore, in line 2, NUM is replaced with 10. So now we have "#define FOO 10".
  3. NUM is undefined.
  4. NUM is redefined and now is 20.
  5. FOO is replaced according to line 2, which was before line 4's redefinition, and is 10.

So I think the output should be 10 instead of 20. Can anyone explain where I went wrong?

Brian Tompsett - 汤莱恩
OneZero
  • 1
    Try looking at the preprocessor output rather than guessing the preprocessor output –  Aug 21 '15 at 18:53
  • 5
    Is this a C or a C++ question? These two language are different, please pick one. – fuz Aug 21 '15 at 18:56
  • 8
    The precise behaviour of the preprocessor is in the standard, there's no need to guess. – Alan Stokes Aug 21 '15 at 18:56
  • 10
    @AlanStokes: Have _you_ tried figuring this out from the standard? I'm a language lawyer, yet 10 minutes in I'm no closer to being able to prove the behaviour. – Lightness Races in Orbit Aug 21 '15 at 19:00
  • 1
    check the value of FOO before you redefine NUM. Is it 10 or undefined? – RisingSun Aug 21 '15 at 19:00
  • 2
    @LightnessRacesinOrbit That's a fair point. I did understand it once. – Alan Stokes Aug 21 '15 at 19:01
  • Huh .. failing the word "recursively" being added to `[C99: 6.10.3/9]`, _is_ this even well-defined?? – Lightness Races in Orbit Aug 21 '15 at 19:04
  • 3
    `FOO` is never defined to be anything other than `NUM`. The laws of text substitution say that `NUM` will be whatever `NUM` is defined to be when it is encountered. At `line 6` `NUM` is defined to be `20`. – Galik Aug 21 '15 at 20:10
  • 1
    @FUZxxl "_These two language are different_" how? – curiousguy Aug 22 '15 at 03:06
  • 2
    "_the preprocessor simply does text replacement_" no, it does tokens replacement – curiousguy Aug 22 '15 at 03:07
  • @curiousguy http://stackoverflow.com/q/640657/995714 http://stackoverflow.com/q/12887700/995714 http://stackoverflow.com/q/24397967/995714 http://meta.stackoverflow.com/q/252430/995714 – phuclv Aug 22 '15 at 03:46
  • @OneZero - good question. I trust it's answered fully. Counter question: Q: which do you think has done more damage in the history of software development - increasingly abstruse versions of C++, or "language lawyers" who instinctively answer questions like this with "What does the standard say" vs. "How does it actually work"? – paulsm4 Aug 22 '15 at 04:12
  • @LưuVĩnhPhúc These answers are incorrect. C/C++ is a thing – curiousguy Aug 22 '15 at 04:28
  • @curiousguy In [many different ways](http://stackoverflow.com/q/31505402/417501). – fuz Aug 22 '15 at 06:31
  • @curiousguy Please give more explanation before dismissing the provided answers as invalid. – fuz Aug 22 '15 at 06:34
  • @fuzxxl: this question is restricted to the preprocessing phases, which are the same in the two languages. – rici Aug 22 '15 at 09:27
  • @rici Are you sure they are equal? Last time I checked the rules how to form preprocessing tokens for C and C++ were different. – fuz Aug 22 '15 at 11:00
  • @fuzxxl, the differences are not relevant to this question, as you know. Tokenization is similarly different between C++ versions, but I have never seen you or anyone else demand a precise version tag because of those differences, unless directly relevant. – rici Aug 22 '15 at 13:11
  • 1
    @rici This question pertains to a corner case of the preprocessor that could very well be different in the two languages as many corner cases are (I would have to carefully go through the standards to be sure) and you claim the language doesn't matter? Why do you ask for the “why” anyway if you don't care about standards? – fuz Aug 22 '15 at 13:26
  • 1
    @fuzxxl, i'm not sure who you think i am. It is true that i have been known to ask why on occasion, but here i am not asking anything. You can save yourself the trouble of searching through the standards by reading my answer below, which might or might not convince you of my concern for the standards. Finally, this is not a "corner case". It is the algorithm for macro replacement from the beginning, even before there were standards. – rici Aug 22 '15 at 16:35
  • @rici I don't care who you are, but it seems that you have not understood why asking questions as “C/C++” is problematic. But oh well, I can only do so much against tag spammers... – fuz Aug 22 '15 at 16:51
  • 1
    @FUZxxl: Quite right. I don't understand why asking questions as "C/C++" is problematic, when the wording of both the question and the answer would be identical for both languages. To me, that just seems efficient; why duplicate an entire question and answer, or alternatively leave out an entire potential audience? But I guess that's just me. Anyway, I didn't ask this question, although it appears that you (at one point) thought I did, and in fact I have never asked a question tagged C. – rici Aug 23 '15 at 00:04
  • @rici Well, the answer is *not* equal for both languages. In this case, you are lucky that it is, yet you have to cite both standards to give a correct answer. The point of forbidding C/C++ questions is that there are many questions that just slap on both tags to get the larger audience but if you give an answer in the wrong language they dismiss it and tell you that they are actually programming in C++, not C. And that really sucks a lot. There are also many cases where the question concerns some corner cases that are different between these two or when I would give a different approach for C – fuz Aug 23 '15 at 09:10

4 Answers

65

The text replacement is done where the macro is used, not where you wrote the #define. At the point you use FOO, it replaces FOO with NUM and NUM is currently defined to be 20.
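
A minimal sketch of this, assuming a C11 compiler (for static_assert from <assert.h>). The function names foo_before_redefinition and foo_after_redefinition are made up for illustration. The same FOO yields a different value depending on what NUM means at the point of use:

```c
#include <assert.h>

#define NUM 10
#define FOO NUM   /* records the token NUM, not the value 10 */

/* NUM is 10 at this point, so FOO expands to NUM and then to 10. */
static_assert(FOO == 10, "FOO expands to 10 before NUM is redefined");
int foo_before_redefinition(void) { return FOO; }

#undef NUM
#define NUM 20

/* FOO itself was never redefined, but NUM now means 20. */
static_assert(FOO == 20, "FOO expands to 20 after NUM is redefined");
int foo_after_redefinition(void) { return FOO; }
```

Both static assertions hold at compile time, so the first function returns 10 and the second returns 20 even though #define FOO NUM appears only once.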

David
  • 2
    I'm trying to find good wording for this in C99, but failing. Is this actually well-defined? – Lightness Races in Orbit Aug 21 '15 at 19:06
  • 7
    @LightnessRacesinOrbit, sure this is well defined. What difficulties do you have with the wording in the standard? AFAIR it clearly states that at the point of definition only the token sequence is read, and then it describes very precisely how the replacement takes place at point of invocation. – Jens Gustedt Aug 21 '15 at 19:15
  • 3
    @JensGustedt: Prove it with a standard quote answering this question, then we'll talk :) – Lightness Races in Orbit Aug 21 '15 at 19:21
  • 5
    Preprocessing happens in a single pass. When line 6 is hit, FOO is replaced with NUM. Preprocessing then starts again at the beginning of NUM, precisely to handle macros that contain other macros. If this didn't subsequently replace the NUM with 20, no macros that call other macros would ever compile – cppguy Aug 21 '15 at 19:56
  • 6
    @LightnessRacesinOrbit: C11:6.10.3/9 "A preprocessing directive of the form `# define identifier replacement-list new-line` defines an object-like macro that causes each *subsequent instance* of the macro name to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is *then* rescanned for more macro names..." The word "then" (my emphasis) clearly states that the rescan is *after* the replacement of a subsequent instance of the name. Also, 6.10.3.5/1 "A macro definition lasts... until a corresponding #undef directive" – rici Aug 21 '15 at 20:06
  • @rici: I see nothing in that paragraph that disambiguates this case. I think the key is that no replacement occurs _during_ definition, but I can't see that in the text here. – Lightness Races in Orbit Aug 21 '15 at 21:33
  • 8
    @lightness: that would be 6.10 para 7: "preprocessing tokens within a preprocessing directive are not subject to macro expansion unless otherwise stated." – rici Aug 21 '15 at 22:03
  • 2
    @rici: Okay yep I think between 6.10.3/9 and 6.10/7 that covers it then. :) It would be great to get that in this answer so that it's not just an assertion. – Lightness Races in Orbit Aug 21 '15 at 22:56
  • @LightnessRacesinOrbit: given the velocity of upvotes of this answer, it seems that the vox populi doesn't care that it is just an assertion. :) Nonetheless, I added a language-lawyer-style answer, complete with standards quotes. – rici Aug 22 '15 at 02:33
  • @rici The vox populi were convinced the earth was flat and didn't bother asking for evidence of it. – Lightness Races in Orbit Aug 22 '15 at 11:29
  • @LightnessRacesinOrbit: and when presented with evidence to the contrary, they rose up in opposition. I fear that despite the Enlightenment and the Scientific Revolution, we continue to live in an uncomfortable stew of irrational prejudices, petty obsessions, and wilful blindness to objective reality. (And that's with respect to engineering. When it comes to economics or politics, it goes off the scale.) – rici Aug 22 '15 at 18:00
  • @rici: Right, so let's post some evidence. – Lightness Races in Orbit Aug 23 '15 at 10:23
59

In the interests of collecting all the relevant specifications from the standards, I extracted this information from a comment thread, and added C++ section numbers, based on draft N4527 (the normative text is identical in the two standards). The standard(s) are absolutely clear on the subject.

  1. #define preprocessor directives do not undergo macro replacement.

    (C11 §6.10¶7; C++ §16[cpp] ¶6): The preprocessing tokens within a preprocessing directive are not subject to macro expansion unless otherwise stated.

  2. After a macro is replaced with its replacement text, the new text is rescanned. Preprocessor tokens in the replacement are expanded as macros if there is an active macro definition for the token at that point in the program.

    (C11 §6.10.3¶9; C++ §16.3[cpp.replace] ¶9) A preprocessing directive of the form

    # define identifier replacement-list new-line

    defines an object-like macro that causes each subsequent instance of the macro name to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is then rescanned for more macro names as specified below.

  3. A macro definition is active from the line following the #define until an #undef for the macro name, or the end of the file.

    (C11 §6.10.3.5¶1; C++ §16.3.5[cpp.scope] ¶1) A macro definition lasts (independent of block structure) until a corresponding #undef directive is encountered or (if none is encountered) until the end of the preprocessing translation unit. Macro definitions have no significance after translation phase 4.

If we look at the program:

#define NUM 10
#define FOO NUM
#undef NUM
#define NUM 20
FOO 

we see that the macro definition of NUM in line 1 lasts exactly to line 3. There is no replaceable text in those lines, so the definition is never used; consequently, the program is effectively the same as:

#define FOO NUM
#define NUM 20
FOO 

In this program, at the third line, there is an active definition for FOO, with replacement list NUM, and for NUM, with replacement list 20. The FOO is replaced with its replacement list, making it NUM, and then that is once again scanned for macros, resulting in NUM being replaced with its replacement list 20. That replacement is again rescanned, but there are no defined macros, so the end result is that the token 20 is left for processing in translation phase 5.
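
As a hedged illustration of the reduced program (assuming a C11 compiler; foo_value is a hypothetical name), note that FOO can be defined before NUM exists at all, because the #define directive only records the token NUM without expanding it:

```c
#include <assert.h>

#define FOO NUM   /* NUM is not defined yet; the directive just stores the token */
#define NUM 20    /* active from here until an #undef or the end of the file */

/* At the point of use both definitions are active: FOO -> NUM -> 20. */
static_assert(FOO == 20, "FOO is rescanned and NUM is replaced with 20");
int foo_value(void) { return FOO; }
```

If the replacement list were expanded at definition time, the first line could not refer to a macro that is defined only on the following line.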

rici
  • Great answer with rigorous explanations. But I still don't understand the need for the "unless otherwise stated" in [cpp]/6. Could you elaborate on this? – Belloc Mar 19 '20 at 21:24
  • @belloc: the standard specifies macro expansions in `#if` and `#elif` directives and certain `#include` directives. So it can't say that expansion is never performed in pp directives. What that clause says is that expansion is not performed in any directive except where explicitly indicated. – rici Mar 19 '20 at 21:51
20

In:

FOO

the preprocessor will replace it with NUM, then it will replace NUM with what it is currently defined as, which is 20.

Those initial four lines are equivalent to:

#define FOO NUM 
#define NUM 20
Shoe
14

The C11 standard says (and other versions of C, and C++, say similarly):

A preprocessing directive of the form # define identifier replacement-list new-line defines an object-like macro that causes each subsequent instance of the macro name to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive. The replacement list is then rescanned for more macro names as specified below.

However, it also says elsewhere (thanks to rici for pointing this out):

The preprocessing tokens within a preprocessing directive are not subject to macro expansion unless otherwise stated.

So a subsequent instance of the macro name which is found inside another #define directive is actually not replaced.

Your line #define FOO NUM defines that when the token FOO is later found (outside of another #define directive!), it will be replaced by the token NUM.

After a token is replaced, rescanning occurs, and if NUM is itself a macro, then NUM is replaced at that point. (And if whatever NUM expands to contains macros, then that gets expanded, and so on.)

So your sequence of steps is actually:

  1. NUM defined as 10
  2. FOO defined as NUM
  3. NUM undefined and re-defined as 20
  4. FOO expands to NUM
  5. (rescan) NUM expands to 20
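
The "and so on" part can be sketched with a longer chain. This is only an illustration under assumptions: BASE and expand_chain are invented names, and a C11 compiler is assumed for static_assert:

```c
#include <assert.h>

#define FOO NUM          /* FOO  -> NUM        */
#define NUM (BASE + 1)   /* NUM  -> (BASE + 1) */
#define BASE 19          /* BASE -> 19         */

/* Each replacement is rescanned: FOO -> NUM -> (BASE + 1) -> (19 + 1). */
static_assert(FOO == 20, "the chain is expanded one rescan at a time");
int expand_chain(void) { return FOO; }
```

None of the three directives expands anything by itself; all of the expansion happens when FOO is used inside the static assertion and the function body.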

This behaviour can be seen in another common preprocessor trick, to turn the defined value of a macro into a string:

#define STR(X) #X
#define STR_MACRO(X) STR(X)
#define NUM 10

puts( STR_MACRO(NUM) );     // output: 10

If we had written puts( STR(NUM) ) then the output would be NUM.

The output of 10 is possible because, as before, the second #define here does not actually expand out STR. So the sequence of steps in this code is:

  1. STR(X) defined as #X
  2. STR_MACRO(X) defined as STR(X)
  3. NUM defined as 10
  4. STR_MACRO and NUM are both expanded; the result is puts( STR(10) );
  5. (Rescan result of last expansion) STR(10) is expanded to "10"
  6. (Rescan result of last expansion) No further expansion possible.
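
A self-contained version of the comparison above, as a sketch: the helper names stringify_direct, stringify_indirect and check_stringification are hypothetical and not part of the original snippet.

```c
#include <assert.h>
#include <string.h>

#define STR(X) #X
#define STR_MACRO(X) STR(X)
#define NUM 10

/* The # operator uses the argument exactly as written, so NUM is not expanded. */
const char *stringify_direct(void) { return STR(NUM); }

/* Here NUM is expanded to 10 while it is substituted into STR_MACRO's
   replacement list; the rescan then expands STR(10) to "10". */
const char *stringify_indirect(void) { return STR_MACRO(NUM); }

/* Runtime check of both results. */
void check_stringification(void) {
    assert(strcmp(stringify_direct(), "NUM") == 0);
    assert(strcmp(stringify_indirect(), "10") == 0);
}
```

Calling check_stringification() from main passes both assertions, which matches the two outputs described above.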
M.M