Understanding macros in C

Question

Why is the output from the following code the value 5?

#include<stdio.h>

#define A -B
#define B -C
#define C 5

int main()
{
  printf("The value of A is %d\n", A);
  return 0;
}

What did you get when you tried it? Why didn't you try it? What is puzzling you about the output? — Jonathan Leffler, Jul 05 '17 at 19:36
Hmm... it compiles with clang on macOS and the output says `5`. — axiac, Jul 05 '17 at 19:37
Remember, the input is already tokenized before the preprocessor processes macro expansion. Therefore, the code is as if you'd written `- - 5` rather than `--5` which would be an error (you can't decrement a constant). — Jonathan Leffler, Jul 05 '17 at 19:37
@EOF it's perfectly legal. The illegal thing is to try to force the preprocessor to perform 2 passes by trying to `#define` a macro which performs a `#define`. Not the case here — Jean-François Fabre, Jul 05 '17 at 19:39
@Marged: I don't agree: it has nothing to do with laziness; compiling source code does not prove much, and explaining where the extra space comes from is the core of the trick question. Telling the interviewer about token pasting and compiler bugs would be a fantastic answer from an applicant. — chqrlie, Jul 05 '17 at 19:47
Methinks this is a great question :) As @chqrlie noted, merely compiling doesn't help much to understand why this works. — Jean-François Fabre, Jul 05 '17 at 19:49
@JonathanLeffler your 27 comments on that Q&A - gathered - could be great material for an answer, since no answer explained the tokenizing stuff (chqrlie's answer does, yes). thanks for the edit BTW. — Jean-François Fabre, Jul 05 '17 at 19:51
The question [How to concatenate twice with the C preprocessor and expand a macro as in `arg ## _ ## MACRO`](https://stackoverflow.com/questions/1489932/) has relevant information, but isn't by any stretch of the imagination a duplicate. The question [Can we write a macro over many lines without using a backslash at the end?](https://stackoverflow.com/questions/41710558) also has relevant information about stages 1-4 of the processing (stage 3 tokenizes; stage 4 preprocesses). — Jonathan Leffler, Jul 05 '17 at 20:52
MSVC compiler dev here. Our current preprocessor is (mostly) string based, and as a side effect it will concatenate in places where it should not. I am in the process of rewriting the preprocessor to correctly handle pp-tokens. It is a tricky problem to solve because there is a significant amount of legacy code that relies on this behavior, so I am not sure when the conformant preprocessor will be complete. — Rastaban, Jul 11 '17 at 18:28

score 8 · Answer 1 · edited Jul 06 '17 at 02:50

This is a tricky question because it is a stress test for the compiler preprocessor.

Depending on whether the preprocessor is an integrated phase of the compiler or a separate program passing its output to the compiler via a file or a pipe, and in the latter case whether it is careful enough not to perform erroneous token pasting, you may get the expected output, 5, or a compilation error.

After the preprocessed contents of stdio.h, the source code expands to:

int main()
{
  printf("The value of A is %d\n", --5);
  return 0;
}

But the two - signs are separate tokens, so depending on whether the preprocessor keeps them separate in its output, you may get a program that outputs 5 or one that does not compile because -- cannot be applied to the literal 5.

Both the gcc and the clang preprocessors behave correctly and separate the two - signs with an extra space to prevent token pasting when they produce the preprocessor output with the -E command line option. They output this as preprocessed source code after the expansion of <stdio.h>:

int main()
{
  printf("The value of A is %d\n", - -5);
  return 0;
}

Try your own compiler to check how it expands the source code. It seems Visual Studio 2013 and 2015 fail the test and reject the program with an error.

To make things clear, I am not saying the behavior of the program should depend on the compiler architecture. I was hoping at least one common C compiler would mishandle this conformance test. I am not surprised MS Visual Studio 2013 and 2015 fail this test.

The extra space is only needed in the textual output of the preprocessor. It does not matter whether Visual Studio uses multiple separate phases or not: the source program is perfectly valid and their failure to compile it is a BUG.

Where does the white space come from? If clang and gcc are right, why don't they expand `B` to `- 5`? — axiac, Jul 05 '17 at 19:45
@axiac: they don't need to expand to `- 5` because `-5` is correct. — Jean-François Fabre, Jul 05 '17 at 19:47
The standard says what must happen (and GCC and Clang implement it correctly); some compilers may have bugs, perhaps because they use a separate preprocessor program. The standard doesn't say "you can get different behaviours depending on whether the preprocessor is a separate program or not". — Jonathan Leffler, Jul 05 '17 at 19:51
@axiac, the space is synthesized when the preprocessor emits its results as text, instead of as a sequence of tokens passed directly to the compiler. This allows it to ensure that if the output is re-read as C source, it will represent the same token sequence that the original source did. This is a quality of implementation consideration, because the standard says nothing about such transformations back to text. — John Bollinger, Jul 05 '17 at 19:52
@JonathanLeffler: I completely agree. This code is a conformance test for the preprocessor and VS fails it. I clarified the answer in this respect. — chqrlie, Jul 05 '17 at 19:58
Really interesting. I'd never considered this distinction before! — Brett Hale, Jul 16 '17 at 09:58

score 5 · Answer 2 · edited Jul 05 '17 at 19:50

No need to compile this code; just run the preprocessor on it with gcc -E and see what happens:

<lots of output expanding stdio.h> ...

int main()
{
  printf("The value of A is %d\n", - -5);
  return 0;
}

Obviously the result is 5 (which could have been guessed by looking at the nested macros, but a small preprocessor test doesn't hurt).

(Other answers noted that some compilers may mishandle the preprocessing of the minus signs, which would result in a compiler error. gcc handles that nicely.)

score 3 · Answer 3 · answered Jul 05 '17 at 19:40

Question doesn't really make sense, but I still decided to give it a go.

Visual Studio 2013 and 2015: error C2105: '--' needs l-value

Reason is that the following line:

printf("The value of A is %d\n", A);

is first translated into (A becomes -B):

printf("The value of A is %d\n", -B);

then into (B becomes -C):

printf("The value of A is %d\n", --C);

and then into (C becomes 5):

printf("The value of A is %d\n", --5);

And since 5 is not an l-value, you cannot decrement it, hence the error. It seems quite logical, knowing that the preprocessor just does a simple string replacement.

answered Jul 05 '17 at 19:40

Patrick

Congratulations. You can file a bug report with Microsoft. That is incorrect behaviour by their compiler. – Jonathan Leffler Jul 05 '17 at 19:42
@JonathanLeffler why is this a bug? clang expands `B` to `-5` and `A` to `- -5`. Where does the space come from? And, if clang and gcc are right, why don't they expand `B` to `- 5`? – axiac Jul 05 '17 at 19:44
@axiac: the space is the char that saves the compilation, by creating a double minus instead of a pre-decrement. – Jean-François Fabre Jul 05 '17 at 19:46
The input to the preprocessor is tokenized before macros are expanded. Tokens don't get untokenized. The `-` from `-B` is one token; it is not subject to further macro expansion. The next token is `B`; it is subject to macro expansion as `-C` (that's two tokens). The `-` is again not subject to macro expansion; it proceeds down the line. The `C` is then expanded to the single token `5`. The output of the preprocessor is 3 tokens: `-`, `-` and `5`. Conflating them as `--5` and then interpreting the two dashes as a decrement is incorrect. The standard is remarkably clear. – Jonathan Leffler Jul 05 '17 at 19:47
@Jean-FrançoisFabre I see that. My question is why the preprocessor adds a white space when it expands `A` but doesn't add it when it expands `B`. It doesn't seem consistent to me, as the space is not there in the source code. – axiac Jul 05 '17 at 19:48
@axiac see Jonathan Leffler's excellent comments. The compilers are just complying with the standard (well, some ... :)) – Jean-François Fabre Jul 05 '17 at 19:49
The extra space is only needed in the textual output of the preprocessor. It does not matter whether VS uses multiple separate phases or not: the source program is perfectly valid and their failure to compile it is a BUG. – chqrlie Jul 05 '17 at 19:53
@JonathanLeffler If I get you right, it is not possible to `#define A str`, `#define B uct` and then use `A` and `B` to generate the keyword `struct`, is it? No matter what, it will be parsed as two tokens, right? (`str` and `uct`) – axiac Jul 05 '17 at 19:56
As shown, that's correct. There is the `##` token pasting operator, of course: `#define C(x, y) C1(x, y)` plus `#define C1(x, y) x ## y` which can be used in `C(A, B) apoplexy { … };` etc to cause apoplexy in the unfortunate readers of the code (not to mention the compiler writer). The two levels of macro are needed to get the arguments expanded to `str` and `uct`. Token pasting only works with 'wordy' tokens, not punctuation. – Jonathan Leffler Jul 05 '17 at 19:58
@JonathanLeffler Thank you. I learned something today. – axiac Jul 05 '17 at 20:08

score 0 · Answer 4 · answered Jul 05 '17 at 20:45

This is an excellent example of how not to use the preprocessor. To avoid confusion, parentheses should be used (not only in this case):

#define A (-B)
#define B (-C)
#define C (5)

answered Jul 05 '17 at 20:45

0___________

This is a valid observation about how to use the preprocessor; it isn't directly an answer to the 'why' question. So, +1 for validity and -1 for relevance. – Jonathan Leffler Jul 05 '17 at 21:01
Yes, but the initial question is unanswerable. It is an example of something worse than Undefined Behaviour: UB, in the C standard's sense, is when the program actually compiles but the result is unpredictable, whereas in this case the compilation process itself is undefined. So my other answer is: a compiler error if the preprocessor does not add any whitespace, or 5 if it does. – 0___________ Jul 05 '17 at 21:13
No; it is perfectly defined code with a valid answer — though there is at least one major compiler that mishandles the valid C code. There is no UB in the code. (I'm not claiming it is good code; it isn't. And using your macros instead of what's in the question would prevent broken compilers from breaking on the valid code. Those are tangential issues, though.) – Jonathan Leffler Jul 05 '17 at 21:14
The parentheses in `#define C (5)` seem unnecessary. In what context do you think they are required? – chqrlie Jul 05 '17 at 21:40
To avoid discussions with C-standard maniacs (I am not one of them), my general rule (IMHO) is: _*It is better to write too many brackets than to have one pair missing*_ :) – 0___________ Jul 05 '17 at 21:40
Did you respond to my question in less than 4 seconds? Extra parentheses do not hurt; I am just asking if you thought of something I did not. – chqrlie Jul 05 '17 at 21:42
@chqrlie No I did not. I just do not understand this minimalistic approach. – 0___________ Jul 05 '17 at 21:49
OK, so there is no misunderstanding. – chqrlie Jul 05 '17 at 22:02

score 0 · Answer 5 · answered Jul 05 '17 at 22:47

Each #define preprocessing directive inserts into the environment of the preprocessor a name bound to a value made of a list of preprocessing tokens.

{A -> -B; B->-C; C->5}

is the environment at the moment when A is evaluated. Now, carrying out the evaluation of A, we have

    A   ->  - B     (the identifier `A` is replaced by the stream of preprocessing tokens `-` `B`)
  - B   ->  - - C
- - C   ->  - - 5

and this stream will not be rescanned any further by Prosser's algorithm, as it contains no more identifiers.

So, reducing,

 A -> - - 5

the stream A is converted into the stream of three preprocessing tokens `-` `-` `5`; these are converted from preprocessing tokens into C tokens and sent on to the C compiler, which evaluates -(-5) to 5.
