Why does the compiler treat i+++++i and i+++i differently?

Question

int i=5;
printf("%d",i+++++i);

This gives an error, but:

printf("%d",i+++i);

gives the output 11. In this case, the compiler reads it as:

printf("%d",i+ ++i);

Why is this not done in the first expression? i.e.:

printf("%d",i+++++i);

This tokenization behavior is called the maximal munch rule. See http://stackoverflow.com/questions/5341202/why-doesnt-ab-work-in-c to learn more. — Deepthought, Nov 12 '13 at 07:34
The compiler is not allowed to read `i+++i` as `i+ ++i`, it must read it as `i++ + i` which causes undefined behaviour. — CB Bailey, Nov 12 '13 at 07:46
So in other words, the `i+++i` bug is even worse than the `i+++++i` bug, because the former bug will compile but might crash and burn, or give random results, while the latter bug will not pass compilation. Lesson learnt: do not write bugs. — Lundin, Nov 12 '13 at 07:54
@EdHeal: From the level of his question, he is trying to grasp the basic concept. Why would anyone need this kind of thing in real scenarios? — Anirudha Agashe, Nov 12 '13 at 08:47
@ishantsharma - I just worry that some people get into the mindset that this type of code is acceptable. Perhaps it might work as intended, but is it readable? Me thinks not, and therefore it is hard (impossible?) to maintain. — Ed Heal, Nov 12 '13 at 09:05

score 2 · Answer 1 · answered Nov 12 '13 at 07:27

Because of operator precedence, i+++++i is treated as ((i++)++ + i). This gives a compiler error because (i++) is not an lvalue.

Klas Lindbäck

AnT stands with Russia · Answer 2 · 2013-11-12T08:10:45.213

i+++++i is parsed as i ++ ++ + i. It contains an invalid subexpression i ++ ++. Speaking formally, this expression contains a constraint violation, which is why it does not compile.

Meanwhile i+++i is parsed as i ++ + i (not as i + ++ i as you incorrectly believe). It does not contain any constraint violations. It produces undefined behavior, but is otherwise well-formed.

Also, it is rather naive to believe that printf("%d",i+++i) will print 11. The behavior of i+++i is undefined, meaning that there's no point in trying to predict the output.

score 1 · Answer 3 · answered Nov 12 '13 at 08:08

Modifying the same variable multiple times between two sequence points is undefined behavior according to §6.5 of the language specification:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.(71)

71) This paragraph renders undefined statement expressions such as

i = ++i + 1;
a[i++] = i;

while allowing

i = i + 1;
a[i] = i;

HAL

This answer does not explain how the source texts in the question are parsed. – Eric Postpischil Nov 12 '13 at 12:19
@EricPostpischil If the question is about `C` and `expression`, I feel that the answer is valid, but if you look at it from the `compiler` and `parsing` side, then you're right. – HAL Nov 12 '13 at 12:28

score 0 · Answer 4 · answered Nov 12 '13 at 15:06

In printf("%d",i+++++i);, the source text i+++++i is first processed according to this rule from C 2011 (N1570) 6.4 4:

If the input stream has been parsed into preprocessing tokens up to a given character, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token…

This causes the lexical analysis to proceed in this way:

i can be a token, but i+ cannot, so i is the next token. This leaves +++++i.
+ and ++ can each be a token, but +++ cannot. Since ++ is the longest sequence that could be a token, it is the next token. This leaves +++i.
For the same reason, ++ is the next token. This leaves +i.
+ can be a token, but +i cannot, so + is the next token. This leaves i.
i can be a token, but i) cannot, so i is the next token.

Thus, the expression is i ++ ++ + i.
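The steps above can be sketched as a tiny longest-match tokenizer. This is only an illustration, not how any real compiler is written: the function name munch and its output format are made up for this example, and it recognizes only identifiers, + and ++ rather than the full set of C preprocessing tokens.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any real compiler): splits src into
   tokens with the longest-match rule and writes them into out, separated
   by single spaces. Only identifiers, "+" and "++" are recognized.
   Returns the number of tokens, or -1 on an unknown character or if
   out is too small. */
int munch(const char *src, char *out, size_t outsize)
{
    size_t used = 0;
    int count = 0;

    assert(src != NULL && out != NULL);
    if (outsize == 0)
        return -1;
    out[0] = '\0';

    while (*src != '\0') {
        size_t len;

        if (isspace((unsigned char)*src)) {
            src++;              /* white space only separates tokens */
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            /* identifier: take as many identifier characters as possible */
            len = 1;
            while (isalnum((unsigned char)src[len]) || src[len] == '_')
                len++;
        } else if (*src == '+') {
            /* "++" is longer than "+", so prefer it whenever it fits */
            len = (src[1] == '+') ? 2 : 1;
        } else {
            return -1;
        }

        /* room for an optional separator, the token, and the terminator */
        if (used + (count > 0 ? 1 : 0) + len + 1 > outsize)
            return -1;
        if (count > 0)
            out[used++] = ' ';
        memcpy(out + used, src, len);
        used += len;
        out[used] = '\0';

        src += len;
        count++;
    }
    return count;
}
```

With this sketch, munch("i+++++i", buf, sizeof buf) fills buf with "i ++ ++ + i", and munch("i+++i", buf, sizeof buf) gives "i ++ + i". The tokenizer never looks at whether the result makes sense as an expression, which is the point: the split is decided before any grammar or type checking happens.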

Then the grammar rules structure this expression as ((i ++) ++) + i.

When i++ is evaluated, the result is just a value, not an lvalue. Since ++ cannot be applied to a value that is not an lvalue, (i ++) ++ is not allowed.

After the compiler recognizes that the expression is semantically incorrect, it cannot go back and change the lexical analysis. The C standard specifies that the rules must be followed as described above.

In i+++i, the code violates a separate rule. This is parsed as (i ++) + i. This expression both modifies i (in i ++) and separately accesses it (in the i of + i). This violates C 2011 (N1570) 6.5 2:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

This rule uses some technical terms: In i ++, the effect of changing i is a side effect of ++. (The main effect is to produce the value of i.) The use of i in + i is a value computation of the scalar object i. And these two things are unsequenced, because the C standard does not specify whether producing the value of i for + i comes before or after changing i in i ++.
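If a defined result is wanted, one option is to put the side effect and the later read in separate statements, so that each is sequenced. The sketch below assumes the intended meaning was "take the old value of i, increment i, then add the new value"; that is only one possible reading, since the original expression has no defined meaning at all. The function name post_increment_then_add is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: a well-defined version of one possible reading of
   i+++i. Every read and write of *i sits in its own statement, so the
   order is fixed by the sequence points between statements. */
int post_increment_then_add(int *i)
{
    int old;

    assert(i != NULL);
    old = *i;          /* the value that i++ would yield */
    *i = *i + 1;       /* the side effect of i++, completed here */
    return old + *i;   /* *i is read only after the increment is done */
}
```

Starting from i equal to 5, this returns 11 and leaves i equal to 6. That happens to match the output seen in the question, but the original expression is not required to produce it.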
