28

Consider the classical sequence point example:

i = i++;

The C and C++ standards state that the behavior of the above expression is undefined because the = operator is not associated with a sequence point.

What confuses me is that ++ has a higher precedence than = and so, the above expression, based on precedence, must evaluate i++ first and then do the assignment. Thus, if we start with i = 0, we should always end up with i = 0 (or i = 1, if the expression was i = ++i) and not undefined behavior. What am I missing?

Sergey Kalinichenko
Sami Hailu
  • 2
    The question is not a duplicate. It asks about the difference between sequence points and precedence, which has not been answered in the post you mentioned. – Sami Hailu Jun 26 '17 at 23:38
  • 1
    This is not a duplicate: even though the setting for the question is the same, OP wants to know something completely different (namely, why isn't this problem resolved by compiler applying precedence rules). Voting to re-open. – Sergey Kalinichenko Jun 26 '17 at 23:38
  • 3
    The value of `i++` is the old value of `i` before incrementing it. So if we "evaluate `i++` first and then do the assignment", wouldn't you do "get value of i; increment i; set i to its old value" and end up with `i==0`? – aschepler Jun 26 '17 at 23:44
  • "precedence" does NOT mean "order of things happening". It's probably a bad word but we're stuck with it (in normal English usage the word comes from "precede" which implies a before/after relationship, but in C there is no such implication) – M.M Jun 27 '17 at 07:28
  • @coolguy You open-hammered wrongly! OP clearly confuses precedence and sequence points and the dup **did** cover exactly this difference! Don't open-hammer without reading the dups! – too honest for this site Jun 27 '17 at 13:20
  • @dasblinkenlight OP clearly confuses precedence and sequence points and the dup **did** cover exactly this difference! – too honest for this site Jun 27 '17 at 13:21
  • 1
    @Olaf The dupe does not mention "precedence" anywhere in the body of the question or in any of the answers. – Sergey Kalinichenko Jun 27 '17 at 13:36
  • @dasblinkenlight: If still is the same question. Anyway, there are various other dups. A simple google search for `c sequence points preceedence` shows up a lot of other dups, the first was https://stackoverflow.com/questions/5473107/operator-precedence-vs-order-of-evaluation but there are clearly better ones. – too honest for this site Jun 27 '17 at 13:55
  • Possible duplicate of [Operator Precedence vs Order of Evaluation](https://stackoverflow.com/questions/5473107/operator-precedence-vs-order-of-evaluation) – Justin Jun 28 '17 at 18:48

3 Answers

31

All operators produce a result. In addition, some operators, such as the assignment operator =, the compound assignment operators (+=, >>=, etc.), and the increment and decrement operators (++, --), produce side effects. The distinction between results and side effects is at the heart of this question.

Operator precedence governs the order in which operators are applied to produce their results. For instance, precedence rules require that * goes before +, + goes before &, and so on.

However, operator precedence says nothing about applying side effects. This is where sequence points (sequenced before, sequenced after, etc.) come into play. They say that in order for an expression to be well-defined, the application of side effects to the same location in memory must be separated by a sequence point.

This rule is broken by i = i++, because both ++ and = apply their side effects to the same variable i. First, ++ goes, because it has higher precedence. It computes its value by taking i's original value prior to the increment. Then = goes, because it has lower precedence. Its result is also the original value of i.

The crucial thing that is missing here is a sequence point separating the side effects of the two operators. This is what makes the behavior undefined.

Sergey Kalinichenko
  • 3
    Worth pointing out that two side effects are not the only problem. A side effect ("write") and an evaluation ("read") of the same memory location without a sequence point between them is also undefined behavior. – aschepler Jun 27 '17 at 00:02
  • 1
    @aschepler But there is no read of a memory location "after" a write to the same memory location in the expression `i = i++`. The variable `i` is read once when `i++` is evaluated. Then there are the two conflicting writes. There is no other read operation. – j6t Jun 27 '17 at 05:26
  • @j6t Yes, I wasn't talking about the given example, just a point about sequence points and undefined behavior in general. – aschepler Jun 27 '17 at 11:06
  • 2
    @j6t Consider the expression `i + i++`. Again, undefined behavior, but this time with a read and a write racing for execution. Aschepler is totally right, this should be mentioned. – cmaster - reinstate monica Jun 27 '17 at 12:07
  • @aschepler: `i++` (which is `i += 1` which is `i = i + 1`) consists of a read and a write. And there is no sequence point between them. That contradicts your statement. Similar `a = i = j` might consist of a write to `i` plus a read of (`i` and `j`) or only `j`. – too honest for this site Jun 27 '17 at 13:57
  • With C11 and C++11, the formal definitions are more complicated than just sequence points (but with mostly the same results). The actual C11 wording is 6.5/2 "If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined." But the wording on all assignment operators says 6.5.16/3 "The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands." – aschepler Jun 27 '17 at 22:38
15

Operator precedence (and associativity) state the order in which an expression is parsed and executed. However, this says nothing about the order of evaluation of the operands, which is a different term. Example:

a() + b() * c()

Operator precedence dictates that the result of b() and the result of c() must be multiplied before added together with the result of a().

However, it says nothing about the order in which these functions should be executed. The order of evaluation of each operator specifies this. Most often, the order of evaluation is unspecified (unspecified behavior), meaning that the standard lets the compiler do it in any order it likes. The compiler need not document this order nor does it need to behave consistently. The reason for this is to give compilers more freedom in expression parsing, meaning faster compilation and possibly also faster code.

In the above example, I wrote a simple test program and my compiler executed the above functions in the order a(), b(), c(). The fact that the program needs to execute both b() and c() before it can multiply the results, doesn't mean that it must evaluate those operands in any given order.

This is where sequence points come in. A sequence point is a point in the program where all previous evaluations (and their side effects) must be complete. So sequence points are mostly related to order of evaluation and not so much to operator precedence.

In the example above, the three operands are unsequenced in relation to each other, meaning that no sequence point dictates the order of evaluation.

It therefore becomes problematic when side effects are introduced in such unsequenced expressions. If we write i++ + i++ * i++, then we still don't know the order in which these operands are evaluated, so we can't determine what the result will be. This is because both + and * have unspecified/unsequenced order of evaluation.

Had we written i++ || i++ && i++, then the behavior would be well-defined, because && and || specify the order of evaluation to be left-to-right and there is a sequence point between the evaluation of the left and the right operand. Thus if(i++ || i++ && i++) is perfectly portable and safe (although unreadable) code.


As for the expression i = i++;, the problem here is that the = is defined as (6.5.16):

The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

This expression is actually close to being well-defined, because the text says that the left operand should not be updated before the right operand is computed. The problem is the very last sentence: the order of evaluation of the operands is unspecified/unsequenced.

And since the expression contains the side effect of i++, it invokes undefined behavior, since we can't know if the operand i or the operand i++ is evaluated first.

(There's more to it, since the standard also says that an operand should not be used twice in an expression for unrelated purposes, but that's another story.)

Lundin
1

Operator precedence and order of evaluation are two different things. Let's have a look at them one by one:

Operator precedence rule: In an expression, operands bind more tightly to the operators that have higher precedence.

For example

int a = 5;
int b = 10;
int c = 2;
int d;

d = a + b * c;  

In the expression a + b * c, precedence of * is higher than that of + and therefore, b and c will bind to * and expression will be parsed as a + (b * c).

Order of evaluation rule: It describes how operands will be evaluated in an expression. In the statement

 d = a>5 ? a : ++a; 

the condition a>5 is guaranteed to be evaluated before either a or ++a, because the ?: operator has a sequence point after its first operand, and only one of the second and third operands is evaluated.
But for the expression a + (b * c), although * has higher precedence than +, it is not guaranteed that a will be evaluated before or after b or c, nor are b and c ordered relative to each other. The operands a, b and c may be evaluated in any order.

The simple rule is that: operator precedence is independent from order of evaluation and vice versa.

In the expression i = i++, the higher precedence of ++ just tells the compiler to bind i to the ++ operator, and that's it. It says nothing about the order of evaluation of the operands or about which side effect (the one from the = operator or the one from ++) should take place first. The compiler is free to do anything.

Let's call the i on the left of the assignment il and the i on the right of the assignment (in the expression i++) ir. The expression then becomes

il = ir++     // Note that the suffixes l and r are used for the sake of clarity.
              // Both il and ir represent the same object.  

Now the compiler is free to evaluate the expression il = ir++ either as

temp = ir;      // i = 0
ir = ir + 1;    // i = 1   side effect by ++ before assignment
il = temp;      // i = 0   result is 0  

or

temp = ir;      // i = 0
il = temp;      // i = 0   side effect by assignment before ++
ir = ir + 1;    // i = 1   result is 1  

The two orderings give different results, 0 and 1, depending on the order in which the side effects of the assignment and of ++ are applied. Because those side effects are unsequenced, the expression invokes UB.

haccks