
I've encountered a case where cross-platform code behaved differently on a basic assignment statement.

One compiler evaluated the Lvalue first, then the Rvalue, and then performed the assignment.

Another compiler evaluated the Rvalue first, then the Lvalue, and then performed the assignment.

This matters when evaluating the Rvalue changes what the Lvalue refers to, as in the following case:

#include <stdio.h>
#include <stdlib.h>

struct MM {
    int m;
};

int helper(struct MM **ppmm) {
    *ppmm = malloc(sizeof(struct MM));  /* repoint the caller's pointer at a heap struct */
    (*ppmm)->m = 1000;
    return 100;
}

int main(void) {
    struct MM mm = {500};
    struct MM *pmm = &mm;
    pmm->m = helper(&pmm);  /* the assignment in question */
    printf(" %d %d ", mm.m, pmm->m);
    return 0;
}

In the example above, the line `pmm->m = helper(&pmm);` depends on the order of evaluation. If the Lvalue is evaluated first, then `pmm->m` designates `mm.m`; if the Rvalue is evaluated first, then `pmm->m` designates the member of the `MM` instance allocated on the heap.

My question is whether the C standard determines the order of evaluation (I didn't find anything), or whether each compiler can choose what to do. Are there any other similar pitfalls I should be aware of?

Zohar81
  • It's undefined behavior since you modify the same variable twice in a single expression. Don't do that. See [this](https://stackoverflow.com/questions/949433/why-are-these-constructs-using-undefined-behavior) and [this](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior) – user3386109 Jun 02 '16 at 17:02
  • @s7amuser, thanks for the correction. Fixed now. – Zohar81 Jun 02 '16 at 17:03
  • @user3386109 It seems that it is not.. See my answer. – Eugene Sh. Jun 02 '16 at 17:04
  • @EugeneSh. That quote is about functions that take multiple parameters. All of the parameters are evaluated (and side effects completed) before the function is called. But the order of evaluation of the parameters is not specified. And whether the left side or right side of the assignment is evaluated first is unrelated to that quote. – user3386109 Jun 02 '16 at 17:09
  • `printf(" %d %d " , mm->m , pmm->m);` -->> `printf(" %d %d " , mm.m , pmm->m);` – joop Jun 02 '16 at 17:11
  • @user3386109 Deleted the answer for now, as it seems that you are right.. but there must be something. *Update:* Well, no. It's pretty much the same as `i=i++;`. – Eugene Sh. Jun 02 '16 at 17:12
  • @EugeneSh. I was tempted to hammer this question as a duplicate of one of the questions in my first comment. But decided to leave it open since this is a rather confusing topic, and worth revisiting on occasion. I'll be interested to hear your final conclusions. – user3386109 Jun 02 '16 at 17:17
  • @user3386109 It's up there in "update" :) – Eugene Sh. Jun 02 '16 at 17:18
  • @EugeneSh. Ahh, ok I missed that. Glad we agree :) – user3386109 Jun 02 '16 at 17:19
  • @Zohar81 I suppose I should close this as a duplicate of [why-are-these-constructs-using-undefined-behavior](https://stackoverflow.com/questions/949433/why-are-these-constructs-using-undefined-behavior). How do you feel about that? – user3386109 Jun 02 '16 at 17:21
  • I don't think it's a duplicate.. as you said, it's confusing. – Eugene Sh. Jun 02 '16 at 17:22
  • I think the best answer to the other question is [the third answer](https://stackoverflow.com/a/949508/3386109) which explains this situation pretty well. – user3386109 Jun 02 '16 at 17:26
  • @2501 `*pmm`, the structure that `pmm` points to. The assignment modifies the `m` member of the structure. The function call also modifies the `m` member, after replacing the entire structure with newly allocated memory. So upon returning from the function, the assignment may actually attempt to modify the old structure, rather than the new one. – user3386109 Jun 02 '16 at 17:45
  • @user3386109 There is nothing wrong with that, the old struct is modified. – 2501 Jun 02 '16 at 17:50
  • @2501 But what if the function were just modifying the `m` field of the same pointer? Then you can't tell whether the behavior is undefined or unspecified without looking inside the function? In this case generally it is undefined. – Eugene Sh. Jun 02 '16 at 17:51
  • @EugeneSh, the problem here is not the modifications of `pmm->m`. The problem is the combination of modification of `pmm` by the function and evaluation of `pmm` in `pmm->m` in the left-hand operand. The standard definitely says that the combination has undefined behavior. – John Bollinger Jun 02 '16 at 18:28
  • @JohnBollinger Yes, I see where the problem is in this specific example. But I am talking about the general case where the function is a black box, which can potentially modify the lvalue. – Eugene Sh. Jun 02 '16 at 18:30
  • @EugeneSh., I agree that you may not be able to tell from examination of docs and partial source whether there is any undefined behavior. That's not the same thing as there actually *being* undefined behavior, but since you can't tell, I agree that it's wise to assume UB and therefore to rewrite in a way that you can be confident has defined behavior. – John Bollinger Jun 02 '16 at 18:39

2 Answers


The semantics for evaluation of an `=` expression include that

> The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. *The evaluations of the operands are unsequenced.*

(C2011, 6.5.16/3; emphasis added)

The emphasized provision explicitly permits your observed difference in the behavior of the program when compiled by different compilers. Moreover, unsequenced means, among other things, that it is permissible for the evaluations to occur in different order even in different runs of the very same build of the program. If the function in which the unsequenced evaluations appear were called more than once, then it would be permissible for the evaluations to occur in different order during different calls within the same execution of the program.

That already answers the question, but it's important to see the bigger picture. Modifying an object or calling a function that does so is a side effect (C2011, 5.1.2.3/2). This key provision therefore comes into play:

> If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

(C2011, 6.5/2)

The called function has the side effect of modifying the value stored in `main()`'s variable `pmm`; evaluation of the left-hand operand of the assignment involves a value computation using the value of `pmm`; and these two are unsequenced. Therefore the behavior is undefined.

Undefined behavior is to be avoided at all costs. Because your program's behavior is undefined, it is not limited to the two alternatives you observed (in case that wasn't bad enough). The C standard places no limitations whatever on what it may do. It might instead crash, zero out your hard drive's partition table, or, if you have suitable hardware, summon nasal demons. Or anything else. Most of these are unlikely, but the best viewpoint is that if your program has undefined behavior then your program is wrong.

John Bollinger
  • What about the sequence points `;` in the function, after each change to the pointer and the member m? Please see my answer: http://stackoverflow.com/a/37599439/4082723 – 2501 Jun 02 '16 at 18:22
  • @2501, There are sequence points between evaluations within the function, but they do not sequence evaluation of the function call relative to anything outside the function, so they aren't relevant to the question. – John Bollinger Jun 02 '16 at 18:42
  • The function does introduce new sequence points in between the simple assignment. Please see 5.1.2.3 p3 . *The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.* – 2501 Jun 02 '16 at 18:48
  • @2501, the function evaluation contains sequence points, but they are not between evaluation of `pmm` on the LHS of the assignment and modification of `pmm` by the function. In any case, one need not consider the details of the function implementation at all. 5.1.2.3/2 holds that evaluating the function call itself has a side effect on `pmm`, and 6.5.16/3 holds that the evaluations of the two operands of the assignment are unsequenced. – John Bollinger Jun 02 '16 at 18:52
  • There is an additional sequence point right before the actual function call (*Between the evaluations of the function designator and actual arguments in a function call and the actual call.*), separating the assignment and the modification of `pmm`. – 2501 Jun 02 '16 at 18:55
  • @2501, the sequence point after evaluating the function designator and argument expressions and before evaluating the function body also is not between evaluation of `pmm` on the LHS and modification of `pmm` on the RHS. Again, the evaluations of the two operands of the assignment are unsequenced, per 6.5.16/3. – John Bollinger Jun 02 '16 at 18:59
  • `pmm` is literally only modified inside the function, after the sequence point of the call. `(*ppmm) = (struct MM *) malloc` happens inside the function, after the left side has or hasn't been evaluated and after the sequence point of the actual call. – 2501 Jun 02 '16 at 19:01
  • @2501, no sequence point between operations involved in evaluating the RHS of the assignment is between evaluation of the RHS and evaluation of the LHS, nor between evaluation of any pair of subexpressions that are drawn one from each side. Sequence points do not necessarily partition all evaluations performed by a program / thread into "before" and "after". They sequence only those that they occur *between*. – John Bollinger Jun 02 '16 at 19:15
  • By your logic `Memset(&o,1,sizeof(o)) + Memset(&o,0,sizeof(o))` isn't defined because there isn't any sequence point in-between, but this is clearly not the case. (`Memset` is exactly the same as `memset` except it returns an integer.) – 2501 Jun 02 '16 at 19:24
  • @2501, according to 6.5/3, "Except as specified later, side effects and value computations of subexpressions are unsequenced," and modifying the value of `o` in your example expression is a side effect, therefore, yes, the behavior of evaluating the overall expression is undefined. On what basis do you claim otherwise? – John Bollinger Jun 02 '16 at 20:27
  • @2501 I can't tell if your intention is to debate between *undefined behavior*, *unspecified behavior*, and *implementation defined behavior*. I generally lump all three of those into *bad behavior*. But the one thing that's indisputable is that the final contents of `o` cannot be determined by looking at the expression you gave. If your point is that it's not "undefined behavior*, but some other form of "bad behavior*, then you need to be very clear about that. – user3386109 Jun 02 '16 at 20:46
  • @JohnBollinger *Except as specified later...* Subexpressions are unsequenced by default, but doesn't mean that they can't be, if a sequence point is made. The note 86 even specifically mentions unsequenced subexpressions: : *unsequenced and indeterminately sequenced evaluations of its subexpressions.* which indicates that some subexpressions may be sequenced, otherwise there wouldn't be any point in making the distinction. – 2501 Jun 02 '16 at 20:55
  • @user3386109 The above example I have shown has unspecified behavior only. I believe you must agree on that. – 2501 Jun 02 '16 at 21:00
  • @2501 I do agree that your example has *"unspecified behavior"* and not *"undefined behavior"*. But in my opinion, that's a distinction without a difference. Either way, it's *"bad behavior"*. – user3386109 Jun 02 '16 at 21:08
  • @user3386109 Op is asking about the behavior and it is important to make a distinction between defined, undefined, unspecified and the output if an answer is to be given. A lot of things are the subjective "bad behavior", but we debate them nevertheless. – 2501 Jun 02 '16 at 21:11

When using the simple assignment operator `=`, the order of evaluation of the operands is unspecified. There is also no sequence point between the evaluations.

For example, if you have two functions:

`*Get() = logf(2.0f);`

It is unspecified in which order they are called, and yet this behavior is completely defined.

A function call introduces a sequence point. It happens after the evaluation of the function designator and the arguments, and before the actual call. The end of a full expression (for example, the `;` that ends an expression statement) also introduces a sequence point. This is important because an object must not be modified twice without an intervening sequence point; otherwise the behavior is undefined.

Your example is particularly complicated due to unspecified behavior, and may have different results depending on whether the left or the right operand is evaluated first.

  1. The left operand is evaluated first.

The left operand is evaluated while the pointer `pmm` still points to the struct `mm`. Then the function is called, and a sequence point occurs. The function modifies the pointer `pmm` by pointing it to allocated memory, followed by a sequence point at the end of that full expression (the `;`). Then it stores the value 1000 to the member `m`, followed by another sequence point at the `;`. The function returns 100, which is assigned to the left operand; but since the left operand was evaluated first, the value 100 is assigned to the object `mm`, more specifically its member `m`.

`mm.m` has the value 100 and `pmm->m` has the value 1000. This is defined behavior: no object is modified twice in between sequence points.

  2. The right operand is evaluated first.

The function is called first and the sequence point occurs. The function modifies the pointer `pmm` by pointing it to a newly allocated struct, followed by a sequence point. Then it stores the value 1000 to the member `m`, followed by a sequence point. Then the function returns. Then the left operand is evaluated: `pmm->m` now designates the member `m` of the newly allocated struct, and it is modified by assigning it the value 100.

`mm.m` will have the value 500 since it was never modified, and `pmm->m` will have the value 100. No object was modified twice in between sequence points. The behavior is defined.

2501