1

The comma sequence operator introduces a sequence point in an expression. I am wondering whether this means that the program below avoids undefined behavior.

int x, y;

int main()
{
  return (x++, y) + (y++, x);
}

If it does avoid undefined behavior, it could still be unspecified, that is, return one of several possible values. I would think that in C99, it can only compute 1, but actually, various versions of GCC compile this program into an executable that returns 2. Clang generates an executable that returns 1, apparently agreeing with my intuition.

Lastly, is this something that changed in C11?

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • It's still undefined because the two operands of `+` are evaluated in parallel, so to speak. – melpomene Dec 18 '12 at 15:15
  • @melpomene How does the “the two operand of `+` are evaluated in parallel” explanation not apply to `f() + g()` (which the consensus is that it is not undefined http://stackoverflow.com/questions/3951017/in-c99-is-fg-undefined-or-merely-unspecified )? – Pascal Cuoq Dec 18 '12 at 15:18
  • I'm not sure how to interpret this. "If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined." – jn1kk Dec 18 '12 at 15:22
  • @jsn I have never been quite sure what his sentence was about, but I thought that it was concerned with expressions whose value is an allocated object, e.g. `f()` when `f()` returns a `struct`. For what it's worth, GCC compiles `(x++, y+1) + (y++, x+1)` into `4`, and surely that cannot be excused by this sentence. – Pascal Cuoq Dec 18 '12 at 15:30
  • 3
    @PascalCuoq The consensus there is that the order is unspecified, not undefined because there's a sequence point before each function call and function calls count as side effects, i.e. function executions don't interleave with each other. – melpomene Dec 18 '12 at 15:40
  • 1
    The question is, in `e1 + e2`, are the evaluations of `e1` and `e2` unsequenced or indeterminately sequenced? If unsequenced, it's undefined behaviour, if indeterminately sequenced, merely unspecified. – Daniel Fischer Dec 18 '12 at 15:46
  • @melpomene But how are the sequence points before function calls different from the sequence points in, say, `(0, x++, y+1) + (0, y++, x+1)`? (which GCC still evaluates as 4) – Pascal Cuoq Dec 18 '12 at 15:47

4 Answers4

6

Take the expression:

(x++, y) + (y++, x)

Evaluate left-to-right:

x++  // yield 0, schedule increment of x
,    // sequence point: x definitely incremented now
y    // yield 0
y++  // yield 0, schedule increment of y
// explode because you just read from y and wrote to y
// with no intervening sequence point

There's nothing in the standard that forbids this, so the whole thing has undefined behavior.

Contrast this pseudocode:

f() { return x++, y; }
g() { return y++, x; }
f() + g()

Acoording to C99 (5.1.2.3/2) the calls to f and g themselves count as side effects, and the function call operator contains a sequence point just before it enters a function. This means function executions can't interleave.

Under the "evaluate things in parallel" model:

f()  // arbitrarily start with f: sequence point; enter f
g()  // at the same time, start calling g: sequence point

Since the execution of f counts as a side effect itself, the sequence point in g() suspends execution until f has returned. Thus, no undefined behavior.

melpomene
  • 84,125
  • 8
  • 85
  • 148
5

The whole chapter 6.5 of the standard mentions the order of evaluation, on operator basis. The best summary of the order of evaluation I could found was the (non-normative) Annex J of the standard:

C11 Annex J

J.1 Unspecified behavior

  • The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ?:, and comma operators (6.5)

In your example, you cannot know whether the sub-expression (x++, y) or (y++, x) is evaluated first, since the order of evaluation of the operands of the + operator is unspecified.

As for undefined behavior, the comma operator solved nothing. If (x++, y) is evaluated first, then y may get evaluated immediately before y++ of the other sub-expression. Since y is accessed twice without a sequence point in between, for other purposes than to determine the value to store, the behavior is undefined. More info here.

So your program has undefined and unspecified behavior both.

(In addition, it has implementation-defined behavior since int main(), rather than int main (void) is not one of the well-defined versions of main in a hosted application.)

Community
  • 1
  • 1
Lundin
  • 195,001
  • 40
  • 254
  • 396
  • If your remark is that `y` may get scheduled next to `y++`, then let us consider the expression `(0, x++, y+1) + (0, y++, x+1)`. GCC still evaluates this to `4`. – Pascal Cuoq Dec 18 '12 at 21:15
  • @PascalCuoq I don't think that expression is UB, though you still have an unspecified order of evaluation of the sub-expressions. You have 4 sub-expressions here, nested into each other. The expression is equivalent to `((0, x++), y+1) + ((0, y++), x+1)` where each parenthesis is a sub-expression. I suppose GCC may evaluate it like this: `0, x++` = 1, `0, y++`= 1, `(1, 1+1) + (1, 1+1)` = `2 + 2`= 4. – Lundin Dec 19 '12 at 07:28
1

My understanding is that any of the following evaluation orders are legal:

  • x++, y++, y, x
  • x++, y, y++, x
  • x++, y++, x, y
  • y++, x, x++, y
  • ...

But not:

  • y, x++, x, y++
  • ...

In my understanding, the only ordering that is required is that (x++) must come before (y), and (y++) must come before (x).

Therefore, I believe that this is still undefined behavior.

Bill Lynch
  • 80,138
  • 16
  • 128
  • 173
  • If your explanation is that the compiler can choose one of several orders for computations, then it is an explanation of why my program is **unspecified behavior**. **Undefined behavior** on the other hand allows the compiler to do anything (nasal demons), like when the program contravenes 6.5:2 (“Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression”). Which I am not sure my program does. – Pascal Cuoq Dec 18 '12 at 15:42
  • The listed orders of evaluation are correct, but the 5th one is not allowed because of operator precedence, rather than the order of evaluation of sub-expressions. As part of the precedence rules of the standard, the comma operator in itself must always be evaluated from left to right. And no, neither of this proves that the code in the original post is undefined behavior. – Lundin Dec 19 '12 at 07:36
1

The standard explicitly declares which operators introduce sequence points, + is not among them. So your expression can well be evaluated in a way that the forbidden subexpressions are evaluated "close" to each other, without a sequence point in between them.

On a pragmatical base, there is a reason for this interdiction, namely that the evaluation of the operands of such an operation may be scheduled in parallel, e.g if the processor has several pipelines for the operation of the subexpressions in question. You can easily convince yourself that under such a model of evaluation, anything can happen, the result is not predictable.

In your toy example, the conflict is predictable, but this wouldn't necessarily be the case if you would use some pointer indirection in the subexpressions. Then you wouldn't know if one of the subexpressions would potentially alias the other. So basically not much that C can do if it wants to allow for this type of optimization.

Jens Gustedt
  • 76,821
  • 6
  • 102
  • 177
  • Everyone here, included me, agrees that the intention is to allow efficient compilation. I am only arguing that the C99 standard is not doing a good job of actually stating that an expression such as `(0, x++, y+1) + (0, y++, x+1)` is undefined. Clauses such as 6.5:2 or 6.5:3 do not explicitly say this expression is undefined. They do not use the words “in parallel”, and they place function calls and `,` at the same level. This may have changed in C11, but I am not familiar with it yet. – Pascal Cuoq Dec 18 '12 at 21:11
  • @PascalCuoq, C11 is a bit more precise on the model for sequencing operations. But the concept of *data race* is only introduced for thread parallelism, and it is not completely clear to me if the thread concept is relevant, here. The evaluation of the two subexpressions *could* be seen as two different threads of execution, but I don't know if this is pushing things too far. – Jens Gustedt Dec 18 '12 at 22:57