
Problem

  • This example is a minimal reproduction of an issue I encountered in a code golf attempt. I am not advocating that anyone use this style.

  • I am aware I may have hit undefined behaviour. My question is: why, specifically, and does this apply to C as well as C++?

I have this file:

#include <stdio.h>
int main() {
    int a[]={1, 0}, i=0;
    i += a[i]++;
    printf("i=%d a=[%d, %d]\n", i, a[0], a[1]);
    return 0;
}

With GCC on Ubuntu on WSL I get the expected i=1 a=[2, 0], but with Visual C++ I get i=1 a=[1, 1].

The problem is fixed if I do this:

int delta = a[i]++;
i += delta;

Whilst the order of evaluation of an assignment is unspecified, I am surprised that a[i] appears to be evaluated again for the increment, after i has been assigned. The disassembly below seems to confirm this.

Do either the C or C++ standards allow this?


Background

Build command line and output on Windows:

>cl test.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25830.2 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c
Microsoft (R) Incremental Linker Version 14.12.25830.2
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
test.obj

Disassembly of the statement (built with /Zi):

     4:     i += a[i]++;
00007FF6CBBA626C  movsxd      rax,dword ptr [i]  
00007FF6CBBA6271  mov         eax,dword ptr a[rax*4]  
00007FF6CBBA6275  mov         ecx,dword ptr [i]  
00007FF6CBBA6279  add         ecx,eax  
00007FF6CBBA627B  mov         eax,ecx  
00007FF6CBBA627D  mov         dword ptr [i],eax  
00007FF6CBBA6281  movsxd      rax,dword ptr [i]  
00007FF6CBBA6286  mov         eax,dword ptr a[rax*4]  
00007FF6CBBA628A  inc         eax  
00007FF6CBBA628C  movsxd      rcx,dword ptr [i]  
00007FF6CBBA6291  mov         dword ptr a[rcx*4],eax  
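Read back into C, the instruction sequence above appears to do the following. This is a hand transcription for illustration only (each comment is an assumption about what the corresponding instructions do), and `msvc_transcription` and its length parameter `n` are hypothetical names. It reproduces the i=1 a=[1, 1] result: `i` is reloaded after the store, so the increment lands on `a[1]` instead of `a[0]`.

```c
#include <assert.h>

/* Hand transcription (illustrative only) of the MSVC instruction sequence
   shown above for `i += a[i]++;`. `n` is the array length, used only for
   bounds checks. */
void msvc_transcription(int a[], int n, int *i)
{
    assert(*i >= 0 && *i < n);
    int value = a[*i];   /* movsxd rax,[i]; mov eax,a[rax*4]: read a[i] with the original i */
    *i = *i + value;     /* mov ecx,[i]; add ecx,eax; mov [i],eax: store the new i */
    assert(*i >= 0 && *i < n);
    int elem = a[*i];    /* movsxd rax,[i]: i is reloaded, so this reads a[NEW i] */
    a[*i] = elem + 1;    /* inc eax; movsxd rcx,[i]; mov a[rcx*4],eax */
}
```

Starting from `a = {1, 0}` and `i = 0`, this leaves `i == 1` and `a == {1, 1}`, matching the Visual C++ output rather than GCC's.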
Sijmen Mulder
  • It's not that the order of evaluation is unspecified, it's that the behavior is completely undefined. The compiler can emit any code it wants when it encounters this statement. So yes, the standards permit this. – cdhowie Dec 07 '17 at 09:08
  • @SijmenMulder: Undefined behaviour. Rule of thumb: don't read from and write to a variable without an intervening sequence point. MSVC is famous for not doing the increment until the end of the statement. gcc does it differently; from my understanding, exactly how Java does it. – Bathsheba Dec 07 '17 at 09:09
  • The linked "exact duplicate" is specifically about C++. I'm also asking in the context of C. – Sijmen Mulder Dec 07 '17 at 09:12
  • @SijmenMulder You can't reason about undefined behavior, that's the whole point. We can't tell you what's happening because the standard does not define how this code behaves. – cdhowie Dec 07 '17 at 09:12
  • I've made a simple edit to keep the `void` police away. Also, this particular question is, to me, well-written, and I have upvoted it. – Bathsheba Dec 07 '17 at 09:13
  • @cdhowie Thanks. The answer, then, is that in this statement the evaluation order of not only the assignment but also the LHS of the increment is undefined, in both C and C++? That would answer my question completely. – Sijmen Mulder Dec 07 '17 at 09:16
  • @SijmenMulder: The answer is in the linked duplicate. I hope my "rule of thumb" comment addresses your last comment. – Bathsheba Dec 07 '17 at 09:17
  • @SijmenMulder No. The answer is that the assignment and/or increment may not even happen. The increment may happen twice. Some other random things might happen instead. Undefined behavior does *not* mean "things may happen in any order." It means *"anything at all might happen."* – cdhowie Dec 07 '17 at 09:17
  • I'm aware of this aspect of UB; should've phrased the question better. What I wonder is *why* this is UB in C. The linked dupe is very informative but exclusively about various versions of C++. – Sijmen Mulder Dec 07 '17 at 09:22
  • @SijmenMulder: I see your point - but if I reopen it in its current form then I imagine it will get closed again pretty quickly. Could you make some pre-emptive edits then I'll reopen it once I'm back from the coffee shop? – Bathsheba Dec 07 '17 at 09:24
  • @SijmenMulder [This question](https://stackoverflow.com/q/949433/501250) covers C. My previous comment specifically addresses your comment saying *"the evaluation order of not only assignment but also the LHS of the increment is undefined."* Evaluation order is a red herring. – cdhowie Dec 07 '17 at 09:26
  • Ok, I've reopened, controversially. Do expect it to be closed again though; I'm not convinced your edit cuts it. – Bathsheba Dec 07 '17 at 09:28
  • The answer to the question posed in the title is still "because this statement does not have defined behavior." I *really* don't understand what you're trying to get out of this question. This statement does not have defined behavior in C or C++. Full stop. There is no use trying to reason about what it does, just like we don't try to make sense out of the words "this statement is false." – cdhowie Dec 07 '17 at 09:30
  • I wanted to know *what* rule I'm breaking. `a = b` is fine, and so are `a = b++` and `a = a + b`, but mine isn't. I'll dive into the standard and the Wikipedia article to find out. – Sijmen Mulder Dec 07 '17 at 09:32
  • @SijmenMulder: The rule you're breaking is "reading and writing to and from the same variable in an unsequenced step". What more can we say?! – Bathsheba Dec 07 '17 at 09:33
  • Exactly that. If that's a rule in C too, that answers the question. Does it mean `a = a + 2` would be UB, too? – Sijmen Mulder Dec 07 '17 at 09:39
  • C and C++ are completely different languages with different rules. Please ask about one language in one question unless the question is specifically about interaction of the two languages. Otherwise it can and should be closed based on a dupe about either language. – n. m. could be an AI Dec 07 '17 at 09:46
  • @SijmenMulder: Both C and C++ have this down as UB, although the standards cite *slightly* different reasons, mainly centred around the redefinitions of sequencing points in C++ that haven't (yet) made it into the C standard. – Bathsheba Dec 07 '17 at 09:49
  • `a = a + 2` is well-defined since the right-hand side is fully evaluable without a write to `a`. I concede that my brief pseudo-answers in comments are not watertight enough to rule that out as a possibility. Shall we call it 15-15? – Bathsheba Dec 07 '17 at 09:50
  • @M.M. I specifically asked why I was hitting UB. The answer has been provided now. – Sijmen Mulder Dec 07 '17 at 09:50
  • Ha. Well, I'll see if I can write up a well-cited answer with the C standard over lunch, now that I know about sequence points and such. – Sijmen Mulder Dec 07 '17 at 09:54
  • @Bathsheba C also had sequencing redefined for C11, it's similar to C++ but not identical – M.M Dec 07 '17 at 09:55
  • @M.M Technically, C11 is the same as before. 6.5.16/3 (assignment operator) contains the text "The evaluations of the operands are unsequenced.", which makes C11 different from C++11 here. In C++11, I believe the order is no longer unsequenced. – Lundin Dec 07 '17 at 10:01
  • Actually I don't see any basis for saying the code is undefined. It's the same as `i = i + a[i]` which is just fine (with the unsequenced side-effect of incrementing `a[0]` being immaterial because that does not affect `i`). The evaluation (but not side-effect) of all the operands on the right hand side are sequenced before the write to the left-hand side, in all versions of C and C++ – M.M Dec 07 '17 at 10:28
  • @M.M - But isn't the issue that there are *two* writes in the OP's expression, with no intermediate sequence point? I don't see anything in the standard that implies `i += a[i]++;` *must* be evaluated as `int *tmp = &a[i]; i += *tmp; (*tmp)++;` (as opposed to `i += a[i]; a[i]++;`). – Oliver Charlesworth Dec 07 '17 at 11:15
  • @OliverCharlesworth There is one write to `i`, and one write to `a[i]`, nothing wrong with that. – n. m. could be an AI Dec 07 '17 at 11:23
  • @n.m. But the write to `a[i]` depends on the value of `i` (so IMO, falls foul of the "*the prior value shall be accessed only to determine the value to be stored*" restriction). But maybe to frame this a different way, if this is indeed defined behaviour, which of the two behaviours observed by the OP is defined by the standard? – Oliver Charlesworth Dec 07 '17 at 11:30
  • @OliverCharlesworth The assignment happens after its subexpressions are evaluated. Or so I think. – Passer By Dec 07 '17 at 11:31
  • @PasserBy - That is undeniably true (how could it be any other way? ;) But the `+=` isn't a sequence point. – Oliver Charlesworth Dec 07 '17 at 11:32
  • I don't know if it makes much of a difference, but writing `i = i + …` instead of `+=` didn't make a difference (not that I expected it would), nor did wrapping `a[i]++` in parens. – Sijmen Mulder Dec 07 '17 at 12:25
  • For what it's worth, I don't think this is UB in either C or C++, from C11 and C++11 onwards. – Passer By Dec 07 '17 at 18:03
  • The proof being: `i` and `a[i]++` are both sequenced before `+=`, hence the effect is equivalent to `int* p = a + i; int j = *p; (*p)++; i += j;` – Passer By Dec 07 '17 at 18:08
  • I'm voting to reopen because in all my efforts, I can only deduce it is a bug in MSVC (which supposedly supports C++11) and thus not a dupe. @OliverCharlesworth – Passer By Dec 07 '17 at 18:12
  • @PasserBy - "i and a[i]++ are both sequenced before +=" - how did you conclude that? There are no sequence points at all here. – Oliver Charlesworth Dec 07 '17 at 18:17
  • @OliverCharlesworth I just looked up the C11 standard, and apparently it also adopted the notion of "sequenced before" in place of sequence points. From 6.5/1: _"The value computations of the operands of an operator are sequenced before the value computation of the result of the operator."_ – Passer By Dec 07 '17 at 18:20
  • Almost exactly the same passage exists in the C++ standard, followed by an example stating that `i = i++ + 1;` is well-defined and increments `i` by 1. – Passer By Dec 07 '17 at 18:28
  • @PasserBy - That's the **value** - what about the **side effects**? – Oliver Charlesworth Dec 07 '17 at 18:29
  • @OliverCharlesworth I am confused about that too. Both standards use the same wording, and it _seems_ like it should mean both the value computation and the initiation of side effects. – Passer By Dec 07 '17 at 18:30

1 Answer


The code is well-defined in all versions of C and C++. You are seeing a bug in MSVC.

The definition of the += operator is that x += y means x = x + y, except that x is only evaluated once. (Note that "evaluate" is different from "lvalue conversion".)
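To make "x is only evaluated once" concrete, here is a small sketch that counts how often the left operand's index expression runs. `target_index`, `index_evaluations`, `compound_form` and `expanded_form` are hypothetical names introduced for illustration. The index function always returns 0, so the spelled-out form stays well-defined even though its two calls are indeterminately sequenced relative to each other.

```c
#include <assert.h>

/* Hypothetical counter: how many times the index expression was evaluated. */
static int index_evaluations = 0;

/* Hypothetical helper: always designates element 0 of a non-empty array,
   but records every call. */
static int target_index(int n)
{
    assert(n > 0);
    ++index_evaluations;
    return 0;
}

/* x += y: the left operand a[target_index(n)] is evaluated once. */
int compound_form(int a[], int n)
{
    index_evaluations = 0;
    a[target_index(n)] += 10;
    return index_evaluations;
}

/* x = x + y spelled out: the operand expression appears twice, so it is
   evaluated twice (both calls return 0, so the result is still defined). */
int expanded_form(int a[], int n)
{
    index_evaluations = 0;
    a[target_index(n)] = a[target_index(n)] + 10;
    return index_evaluations;
}
```

With `int a[1] = {5};`, `compound_form(a, 1)` returns 1 and leaves `a[0] == 15`; a following `expanded_form(a, 1)` returns 2 and leaves `a[0] == 25`. In both forms the stored value of the element is read (lvalue conversion) exactly once on the right-hand side.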

So we are looking at i = i + a[i]++.

Informally: the new value to store in i cannot be computed until both operands of + have been evaluated and had lvalue conversion performed, because we need the result of that lvalue conversion in order to know the value to store.

The term "evaluate" for an lvalue means to determine the memory location which the lvalue refers to. Evaluating a[i]++ when i is 0 beforehand means to determine that we are referring to a[0]. Then lvalue conversion retrieves the stored value of a[0], and the unsequenced side-effect is that the stored value of a[0] will be updated sooner or later.

Writing to a[0] has nothing to do with i, so there is no reason to believe there might be undefined behaviour.
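The steps described above can be spelled out with an explicit pointer, essentially the equivalence Passer By proposes in the comments. This is a minimal sketch, not what any particular compiler emits: `plus_assign_postinc` and its `increment_first` flag are hypothetical, and the flag simply picks one of two orderings the rules leave open for the unsequenced increment of the array element. Either way the result is i=1 a=[2, 0], because the element was chosen before i changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of `i += a[i]++;`, i.e. `i = i + a[i]++`:
   1. evaluate the lvalue a[i]: decide which element is meant,
   2. lvalue conversion: read that element's stored value,
   3. store i + that value into i,
   plus the side effect of incrementing the chosen element, which may
   happen before or after step 3. `n` is the array length. */
void plus_assign_postinc(int a[], int n, int *i, bool increment_first)
{
    assert(*i >= 0 && *i < n);
    int *elem = &a[*i];    /* step 1: location fixed using the current i */
    int old = *elem;       /* step 2: the value of a[i]++ is the old value */
    if (increment_first) {
        *elem = old + 1;   /* side effect before the store to i */
        *i = *i + old;     /* step 3 */
    } else {
        *i = *i + old;     /* step 3 */
        *elem = old + 1;   /* side effect after; elem still designates the same element */
    }
}
```

Calling it on `a = {1, 0}`, `i = 0` with either `true` or `false` leaves `i == 1` and `a == {2, 0}`, the GCC result.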


More formally: prior to 2011, the C and C++ standards used a somewhat confusing sentence that expressed the logic in my "Informally:" paragraph. If the expression reads and writes i then it is well-defined only if the reads of i are all necessary to determine the new value to store in i, because that will guarantee that the reads have all happened before it is possible to store the new value.

In C11 and C++11 the language changed but not the implication. C11 6.5.16/3 (Assignment operators) says:

The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands.

which leaves no doubt that the write to i is sequenced after all reads of i on the right-hand side.

C++11 has similar language:

In all cases, the assignment is sequenced after the value computation of the right and left operands,

M.M