2

While trying to understand why the expression v[i++]=i; has undefined behavior, I came across this explanation: the expression lies between two sequence points, and the C standard stipulates that the order in which side effects occur between two sequence points is not specified, so when the program runs it is not certain whether the ++ operator or the = operator is applied first. This puzzles me. When the expression is evaluated, shouldn't operator precedence be applied first, and only then the sequence points be used to decide which subexpression is executed first? Am I missing something?

Given the way user AnT stands with Russia explained it, does this mean that for code such as a[i]=y++; or a[i++]=y; we also cannot be sure whether the ++ operator or the = operator runs first?

Eric Postpischil
三六九
  • 1
    This is why you always write your increment and your assignment operations in separate lines of code. That way, you and the programmer coming after you won't have to undertake these mental gymnastics. – Robert Harvey Jan 13 '23 at 13:32
  • 6
    Since both `a[i]=y++` and `a[i++]=y` are using different variables on both sides of the assignment, there's no problems. It's all well-defined. – Some programmer dude Jan 13 '23 at 13:34
  • @Someprogrammerdude: We need to know the value of `a` before asserting those are well defined. `a[i] = y++` does not modify different objects if `a` points to `y` and `i` is zero, and `a[i++] = y` does not modify different objects if `a` points to `i` and `i` is zero. – Eric Postpischil Jan 13 '23 at 14:04

5 Answers

5

The reason v[i++]=i; is undefined behavior is because the variable i is both read and written in the same expression without sequencing.

Expressions such as a[i]=y++ and a[i++]=y do not exhibit undefined behavior because no variable is both read and written in the expression without sequencing.

The = operator does however ensure that both of its operands are fully evaluated before the side effect of assigning to the left side. Specifically, a[i] is evaluated to be an lvalue designating the ith element of the array a, and y++ is evaluated to be the current value of y.
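A minimal sketch of the two well-defined statements, assuming a, i and y are distinct local objects. The function names and the sample values are made up for illustration:

```c
/* Illustrative only: a, i and y are distinct objects in both functions. */

int assign_post_increment_rhs(void)
{
    int a[2] = {0, 0};
    int i = 0;
    int y = 7;

    a[i] = y++;            /* a[0] receives the old value of y (7); y becomes 8 */

    return a[0] * 10 + y;  /* 7 * 10 + 8 */
}

int assign_post_increment_index(void)
{
    int a[2] = {0, 0};
    int i = 0;
    int y = 7;

    a[i++] = y;            /* a[0] receives 7; i becomes 1 */

    return a[0] * 10 + i;  /* 7 * 10 + 1 */
}
```

In both functions the object being incremented is only touched by the ++ itself, so there is nothing for the increment to be unsequenced against.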

dbush
  • "because the variable i is both read and written in the same expression" Err well the same could be said about `i++`. – Lundin Jan 13 '23 at 13:37
  • 1
    The standardese would rather be "because there is a side effect on an object which is unsequenced in relation to a value computation of the same object". Which admittedly is a wording that makes nobody wiser after reading it... – Lundin Jan 13 '23 at 13:39
  • @Lundin Maybe "read, written, **and modified** in the same expression without sequencing"? – Andrew Henle Jan 13 '23 at 13:48
  • @AndrewHenle That's what `i++` does, yeah? :) – Lundin Jan 13 '23 at 14:00
  • @三六九 The only guarantee is that the side effect of the increment happens by the next sequence point. – dbush Jan 14 '23 at 04:11
  • @dbush That means precedence and associativity don't matter, right? – 三六九 Jan 14 '23 at 05:19
  • @dbush So what about an expression like `y = x+j*n;`? When the right operand is evaluated, is there no guarantee whether `+` or `*` is applied first? – 三六九 Jan 14 '23 at 13:41
  • @三六九 The `*` has higher precedence so it is computed before the `+`. In the case of `++` it's the side effect that doesn't have guarantees about sequencing with other operators. – dbush Jan 14 '23 at 15:18
  • @dbush So for an expression without side effects, can precedence be used directly to determine the order in which the operators are applied? – 三六九 Jan 15 '23 at 00:46
3

The specific rule in the C standard is C 2018 6.5 2:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

The first sentence is the critical one here. First, consider v[i] = i++;. Here, the i in v[i] computes the value of i, and the i++ both computes the value of i and increments the stored value of i. Computing the value of i is a value computation of i. Incrementing the stored value of i is a side effect. To determine whether the behavior of v[i] = i++; is undefined, we ask whether the side effect is unsequenced relative to any other side effect on i or to a value computation on i.

There is no other side effect on i, so it is not unsequenced relative to any other side effect.

There is a value computation in i++, but the side effect and this value computation are sequenced by the specification of the postfix ++ operator. C 2018 6.5.2.4 2 says:

… The value computation of the result is sequenced before the side effect of updating the stored value of the operand…

So we know the computation of the value of i in i++ is sequenced before the side effect of incrementing the stored value.

Now we consider the value computation of the i in v[i]. The ++ specification does not tell us about this, so let’s consider the assignment operator, =. The specification of assignment does say something about sequencing, in C 2018 6.5.16 3:

… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

The first sentence tells us the update of v[i] is sequenced after the value computations of the left and right operands. But it does not tell us anything about the side effect in ++ relative to the value computation of i in v[i].

Therefore, the value computation of i in v[i] is unsequenced relative to the side effect on i in i++, so the behavior of the statement is not defined by the C standard.
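One way to avoid the problem is to split the statement so that every access to i is separated from the update by a sequence point. The sketch below shows two possible intents; the function names are hypothetical, and which intent was meant cannot be recovered from v[i] = i++; itself.

```c
/* Intent A: index with the old i, store the old i, then increment. */
int store_then_increment(int *v, int i)
{
    v[i] = i;      /* all reads of i are complete at this ; */
    i++;           /* the only update of i, in its own statement */
    return i;
}

/* Intent B: increment first, then store the old value at the new index. */
int increment_then_store(int *v, int i)
{
    int old = i;   /* value computation of i */
    i++;           /* update of i, sequenced after the read above */
    v[i] = old;    /* reads the new i, sequenced after the update */
    return i;
}
```

Both functions assume that v has enough elements for the index that ends up being used.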

In a[i] = y++; we have:

  • A value computation on i in a[i].
  • A value computation on y in y++.
  • An update of the stored value of y in y++.
  • A value computation on a in a[i].
  • An update of the stored value of a[i] in a[i] = ….

The only object that is updated twice or that is both updated and evaluated is y, and we know from above that the value computation on y in y++ is sequenced before the update of y. So this statement does not contain any side effect that is unsequenced relative to another side effect or value computation on the same object. So its behavior is not undefined by the rule in C 2018 6.5 2.

Similarly, in a[i++] = y;, we have:

  • A value computation on i in a[i++].
  • An update of the stored value of i in i++.
  • A value computation on y.
  • A value computation on a in a[i++].
  • An update of the stored value of a[i] in a[i++] = ….

Again, there is only one object with two operations on it, and those operations are sequenced. The behavior is not undefined by the rule in C 2018 6.5 2.

Note

In the above, we assume neither a nor v is a pointer such that a[i] or v[i] would be i or y. If instead we consider this code:

int y = 3;
int *a = &y;
int i = 0;
a[i] = y++;

Then the behavior is undefined because a[i] is y, so the code updates y twice, once for the assignment a[i] = … and once for y++, and these updates are unsequenced. The specification of assignment says the update to the left operand is sequenced after the value computation of the result (which is the value of the right side of the assignment), but the increment for ++ is a side effect, not part of the value computation. So the two updates are unsequenced, and the behavior is not defined by the C standard.
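If the aliasing is intentional, the two updates of y can be put in separate statements so that they are sequenced. This is only a sketch of one possible ordering (increment first, then store the old value through the alias); the function name and the temporary old are made up for illustration.

```c
int aliased_update_sequenced(void)
{
    int y = 3;
    int *a = &y;   /* a[0] designates y */
    int i = 0;

    int old = y;   /* value computation of y: 3 */
    y++;           /* first update of y: now 4, complete at the ; */
    a[i] = old;    /* second update of y through the alias: back to 3 */

    return y;
}
```

With the updates in separate statements, the result depends only on the order the programmer wrote down.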

Eric Postpischil
3

An attempt to explain the "standardese" terms plainly:

The standard says (C17 6.5) that in an expression, a side effect on an object must not be unsequenced relative to a value computation using that same object.

To make sense of these strange terms:

  • Side effect = writing to a variable, or performing a read or write access to a volatile variable.
  • Value computation = reading the value from memory.
  • Unsequenced = The order between accesses/evaluations is neither specified nor well-defined. C has the concept of sequence points, which are certain points in the program at which all previous side effects must have completed. For example, a ; introduces a sequence point. Two parts of an expression are unsequenced in relation to each other when the order of evaluation of each part is not well-defined before the next sequence point. (A complete list of all sequence points can be found in C17 Annex C.)
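As a small illustration of a sequence point inside a single statement: the comma operator is one of the sequence points listed in Annex C, so the sketch below is well-defined. The function name is hypothetical.

```c
int comma_sequenced(int *v, int i)
{
    /* Parsed as (v[i] = i), (i++): the comma operator sequences the
       whole left operand, including the store, before the increment. */
    v[i] = i, i++;
    return i;
}
```

In everyday code two separate statements are usually easier to read; the comma form is shown only to make the sequence point visible.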

So when translated from standardese to English, v[i++]=i; has undefined behavior since i is written to in an unspecified order relative to the other read of i in the same expression. How do we know that?

  • The assignment operator = says that (6.5.16) "the evaluations of the operands are unsequenced", referring to the left and right operands of =.
  • The postfix ++ operator says that (6.5.2.4) "As a side effect, the value of the operand object is incremented" and "The value computation of the result is sequenced before the side effect of updating the stored value of the operand". In practice this means that i is first read and the ++ is applied later, though before the next sequence point, in this case the ;.

In case of a[i]=y++; or a[i++]=y; everything happens on different variables. There are two side effects, updating i (or y) and updating a[i] but they are done on different objects, so both examples are well-defined.

Lundin
  • "Value computation" means determining the value that a (sub-)expression represents at some point during the execution of a program. This often involves reading from memory (or at least from a register), but it is poorly defined *as* reading a value from memory. In fact, the spec has a specific, completely different term for reading the value of an object from memory: "lvalue conversion". – John Bollinger Jan 13 '23 at 14:30
  • @JohnBollinger There is unfortunately no formal definition of "value computation" even though C11 keeps using the term all over the place. The whole rewrite of this from C99 to C11 was one big fiasco IMO, which didn't change anything concrete outside the context of multi-processing, but made everything a whole lot harder to read. – Lundin Jan 13 '23 at 14:41
  • Indeed the spec provides no formal definition of "value computation". Therefore, we should attempt to understand the meaning of that term according to ordinary English. I submit that the definition I present in my previous comment matches that pretty well and is consistent with all of the specs' usage of the term. The meaning you give does not fit as well, **and** the spec has a different, explicitly defined, term with that meaning. – John Bollinger Jan 13 '23 at 14:48
  • @JohnBollinger Fine, I'm not going to argue. Since the intention of this answer isn't to write an explanation that will sate language-lawyers with in-depth knowledge of C but to write an explanation that the average C programmer might understand. Simplifications may be present. – Lundin Jan 13 '23 at 14:55
  • @JohnBollinger: Many aspects of the Standard could be vastly simplified if it defined the concept of "resolving" an lvalue to yield a "reference". The code `int *p = &structArray[i++].someMember;`, the lvalue `structArray[i++]` isn't evaluated, but code is clearly doing *something* with it. Many difficulties involving type-based aliasing could be eliminated if the sequence of steps "resolve an lvalue to form a pointer; access storage at pointer" were treated as an operation which was, in general, indeterminately sequenced with regard to anything that happened between those two steps. – supercat Jan 14 '23 at 20:06
  • It sort of does define that concept, @supercat. Paragraph 5.1.2.3/2 says, in part, "Value computation for an lvalue expression includes determining the identity of the designated object." And that's the (whole) answer to what a program does with an lvalue expression that is not subject to lvalue conversion. The identity of an object uniquely determines its address, so there's your reference. What the spec does not say is anything about the form taken by the identities so determined. I think the question of type-based aliasing is a somewhat separate one. – John Bollinger Jan 15 '23 at 15:07
  • @JohnBollinger: It uses the phrase "determine the identity" but doesn't define a term for it in a manner that can be incorporated into other aspects, such as saying that when evaluating `arr[i] += 2;`, the evaluation of `i` is sequenced before the resolution of `arr[i]`, which in turn precedes any access to the storage identified thereby. As for aliasing, given `foo(&someUnion.member); bar(&someUnion.otherMember);`, recognizing that the act of resolving `someUnion.member` to a pointer and using that pointer to access the storage identified thereby are generally unsequenced with regard to... – supercat Jan 15 '23 at 17:54
  • ...other actions would eliminate the need for compilers to allow for the possibility that during the execution of `foo`, `someUnion.member` might be accessed in conflicting fashion via some seemingly-unrelated pointer of that member's type, since any such access would be unsequenced relative to accesses made via the passed-in pointer. Many situations where actions are classified as UB to facilitate optimization could be better described in terms of sequencing implications, especially if one recognizes categories of implementations where e.g. performing unsequenced writes to a location may... – supercat Jan 15 '23 at 17:59
  • ...cause reads to behave in non-deterministic fashion, which may cause unbounded UB if the values are used e.g. in array subscripts, but would have no effect if the values are never used in any manner that could affect program execution. – supercat Jan 15 '23 at 18:02
0

The C standard (C11 draft) says the following about the postfix ++ operator:

(6.5.2.4.2) The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). [...]

A sequence point is a point in the code at which it is guaranteed that all side effects before the point have taken effect and no side effects after the point have taken effect.

There are no intermediate sequence points in the expression v[i++] = i;. Thus it is not defined whether the side effect of the expression i++ (incrementing i) takes effect before or after the right-hand side i is evaluated. As a result, not only is the value of the right-hand side i not defined; the standard makes the behavior of the whole statement undefined.

This problem does not exist in the expression a[i++] = y; because the value of the right-hand side y is not affected by the side effect of i++.

nielsen
0

When the expression is evaluated

Which expression?

v[i++]=i;

is a statement. It consists of a toplevel assignment expression a = b, where a and b are both themselves expressions.

The left-hand expression a is itself of the form c[d], where d is another subexpression of the form e++ and e is yet another expression, finally resolved to i.

If it helps we can write the whole thing out in pseudo-function-call style, like

assign(array_index(v, increment_and_return_old_value(i)), i);

Now, the problem is that the standard doesn't tell us whether the final value parameter i is obtained before or after i is mutated by increment_and_return_old_value(i) (or by i++).

... and then the sequence point should be introduced to judge which sub-expression is executed first?

The , in a function call argument list isn't a sequence point. The relative order in which function arguments are evaluated is unspecified (only that they must all have been evaluated before the function body is entered).

The same logic applies to the original code - the standard says there is no sequence point, so there is no sequence point.
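The difference between precedence and evaluation order can be sketched with operands that record when they are evaluated. Everything here (record, call_log, call_count, grouping_is_fixed) is a hypothetical name for illustration. Precedence fixes the grouping as 2 + (3 * 4), but the standard does not say in which order the three calls are made, just as it does not for function arguments.

```c
int call_log[3];   /* ids of the operands, in the order they were evaluated */
int call_count;    /* number of operands evaluated so far */

/* Records that operand `id` was evaluated and yields its value. */
int record(int id, int value)
{
    call_log[call_count++] = id;
    return value;
}

int grouping_is_fixed(void)
{
    call_count = 0;
    /* Precedence: the result is always 2 + (3 * 4).
       Evaluation order: which record() call runs first is unspecified,
       so the contents of call_log may differ between compilers. */
    return record(0, 2) + record(1, 3) * record(2, 4);
}
```

The calls are not unsequenced with respect to each other (function calls do not interleave), so this code has no undefined behavior; only the order stored in call_log is left open.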


does it mean that writing in the code such as a[i]=y++; or a[i++]=y; in the program can not be sure ++ operator and = operator can not determine who runs first.

It's not the assignment that is the problem, it is evaluating the right-hand operand to be assigned.

And, in these cases, there is no relationship between the left-hand side being assigned to and the right-hand side value being assigned. So although we still cannot be sure which is evaluated first, it doesn't matter.

If I wrote out explicitly

int *lhs = &a[i];
int rhs = y++;
*lhs = rhs;

then reversing the first two lines would make no difference. Their relative order doesn't matter, so the lack of a defined relative order doesn't matter.
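A quick sketch of that claim, with both orders written out as separate functions. The function names and sample values are assumptions made for the illustration:

```c
/* Left-hand side resolved first. */
int lhs_then_rhs(void)
{
    int a[2] = {0, 0};
    int i = 1;
    int y = 5;

    int *lhs = &a[i];
    int rhs = y++;         /* rhs is 5; y becomes 6 */
    *lhs = rhs;

    return a[1] * 10 + y;  /* 5 * 10 + 6 */
}

/* Right-hand side evaluated first. */
int rhs_then_lhs(void)
{
    int a[2] = {0, 0};
    int i = 1;
    int y = 5;

    int rhs = y++;         /* rhs is 5; y becomes 6 */
    int *lhs = &a[i];
    *lhs = rhs;

    return a[1] * 10 + y;  /* 5 * 10 + 6 */
}
```

Both functions return the same value because neither line reads an object that the other one modifies.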

Conversely, for completeness,

int *lhs = &v[i++];
int rhs = i;
*lhs = rhs;

is the original case where the order of the first two lines does matter, and the fact that it is unspecified is a problem.
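Writing out both orders as separate, well-defined functions shows how the result changes. This is only a sketch: it takes the address with &v[i++], the function names are hypothetical, and real undefined behavior is not limited to these two outcomes.

```c
/* Index (and increment) first, then read i. */
int index_before_read(void)
{
    int v[2] = {-1, -1};
    int i = 0;

    int *lhs = &v[i++];  /* designates v[0]; i is now 1 */
    int rhs = i;         /* reads the incremented i */
    *lhs = rhs;

    return v[0];
}

/* Read i first, then index (and increment). */
int read_before_index(void)
{
    int v[2] = {-1, -1};
    int i = 0;

    int rhs = i;         /* reads the original i */
    int *lhs = &v[i++];  /* designates v[0]; i is now 1 */
    *lhs = rhs;

    return v[0];
}
```

The two functions store different values in v[0], which is exactly the ambiguity that v[i++]=i; leaves open.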

Useless