162

Consider this code:

int i = 1;
int x = ++i + ++i;

We have some guesses for what a compiler might do for this code, assuming it compiles.

  1. both ++i return 2, resulting in x=4.
  2. one ++i returns 2 and the other returns 3, resulting in x=5.
  3. both ++i return 3, resulting in x=6.

To me, the second seems most likely. One of the two ++ operators is executed with i = 1, the i is incremented, and the result 2 is returned. Then the second ++ operator is executed with i = 2, the i is incremented, and the result 3 is returned. Then 2 and 3 are added together to give 5.

However, I ran this code in Visual Studio, and the result was 6. I'm trying to understand compilers better, and I'm wondering what could possibly lead to a result of 6. My only guess is that the code could be executed with some "built-in" concurrency. The two ++ operators were called, each incremented i before the other returned, and then they both returned 3. This would contradict my understanding of the call stack, and would need to be explained away.

What (reasonable) things could a C++ compiler do that would lead to a result of 4 or a result or 6?

Note

This example appeared as an example of undefined behavior in Bjarne Stroustrup's Programming: Principles and Practice using C++ (C++ 14).

See cinnamon's comment.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
cinnamon
  • 1,750
  • 2
  • 6
  • 11
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/215309/discussion-on-question-by-cinnamon-in-practice-why-would-different-compilers-co). – Samuel Liew Jun 04 '20 at 12:33
  • I dont' think the problem is actually the compilers, the problem is that for the same code, different CPUs might result in different values, and even on one CPU, different executions might result in different values. – Mooing Duck Jun 05 '20 at 17:11
  • 5
    The C spec doesn't actually cover the ordering of operations or evaluations on the right side of the = compared to the pre/postincrement operations, only on the left. – Cristobol Polychronopolis Jun 05 '20 at 17:51
  • Though this [answer](https://stackoverflow.com/a/31083924/2455888) is for C, it addresses the same question. – haccks Jun 05 '20 at 22:38
  • 2
    Recommend that you give citation in the question if you got this example from a book by Stroustrup (as mentioned in comment to one of the answers). – Daniel R. Collins Jun 06 '20 at 15:31
  • What version of C++? PS You have no reason to think something "seems most likely" & that expression does not reflect how the language is defined, and you seem to have fundamental misconceptions about that. "understand compilers better" OK but fundamental to that is what a language is defined to be expressing. Also this shows no research about that. "what (reasonable) things could a C++ compiler do" that isn't specified by the language requires you to define what "do" means outside the language's "abstract machine". – philipxy Jun 07 '20 at 08:37
  • The standard has nothing to say about object code. Unfortunately, so far the answers here are fragmented & unclear about how example implementations are justified by or related to the language definition. PS You don't define "reasonable". But--When a program contains text with undefined behaviour, that program can do anything as far as the standard is concerned. – philipxy Jun 07 '20 at 08:57
  • Does this answer your question? [i = ++i + ++i; in C++](https://stackoverflow.com/questions/340282/i-i-i-in-c) – philipxy Jun 07 '20 at 09:03
  • This is a faq. Before considering posting please read your textbook and/or manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. Reflect your research. See [ask] & the voting arrow mouseover texts. – philipxy Jun 07 '20 at 09:03
  • 5
    @philipxy Your suggested duplicate is not a duplicate of this question. The questions are different. The answers in your suggested duplicate do not answer this question. The answers in your suggested duplicate are not duplicates of the accepted (or high vote) answer(s) to this question. I believe you have misread my question. I suggest you reread it and reconsider the vote to close. – cinnamon Jun 07 '20 at 18:33
  • The answers say that a compiler can do anything that returns the specifed value, so they answer the question, and they show that even if you think your question is different it is just a variation on that one with the same answer, and they say that the statement behaviour happens to be undiefined (although you don't give your version of C++), and that therefore the entire program the statement is in can do anything at all. They even address determing what the meaning of the statement is. Your comment doesn't reflect the content of the answers there. – philipxy Jun 07 '20 at 20:00
  • 3
    @philipxy "The answers say that a compiler can do anything ..." That doesn't answer my question. "they show that even if you think your question is different it is just a variation on that one" What? "although you don't give your version of C++" My version of C++ is irrelevant to my question. "therefore the entire program the statement is in can do anything at all" I know, but my question was about specific behavior. "Your comment doesn't reflect the content of the answers there." My comment reflects the content of my question, which you should reread. – cinnamon Jun 07 '20 at 20:06
  • 1
    @philipxy: In your linked question, the highest-voted and selected answer says: "Obviously, the code compiles to: `i = i + 1; i = i + 1; i = i + i;`" Do you agree with that answer? – Daniel R. Collins Jun 08 '20 at 02:29
  • 2
    To answer the title; because UB means the bahavioir is undefined. Multiple compilers made at different times throughout history, by different people for different architectures, when asked to colour outside of the lines and make a real world implementation they had to put _something_ in this part outside of the specification, so people did exactly that and each of them used different crayons. Hence the hoary maxim, do not rely on UB – Toby Jun 10 '20 at 20:06
  • As it is UB, a good compiler should flag it at least as a warning. In any case, static analysis tools should flag it as an error. This applies to any UB. – Jonathan Rosenne Jun 11 '20 at 07:29
  • 1
    Why in the name of god (the old one and the new one) would C/++ choose to be so complicated that you have to read the manual in order to figure out a simple math operation... Wait... actually, not even after reading the manual you can't be sure! I miss the old Delphi times with its "the compiler won't let you shoot yourself in the foot". – Gabriel Jun 12 '20 at 22:18
  • http://www.netzmafia.de/service/sprachen.html – Gabriel Jun 12 '20 at 22:23
  • 1
    Does this answer your question? [Why are these constructs using pre and post-increment undefined behavior?](https://stackoverflow.com/questions/949433/why-are-these-constructs-using-pre-and-post-increment-undefined-behavior) – Martin James Jun 25 '20 at 11:57
  • 1
    This entire question is a waste of effort and all the answers speculation/guesswork. The code itself is just rubbish, and everything after is fruit of a poisoned tree:( The only time I would interact with such code is to fire the author, (if tbey worked for me:) – Martin James Jun 25 '20 at 12:03
  • @philipxy "The asker thinks that by using the undefined term 'reasonable' they are not asking the duplicate question;" I never said anything like that. Without 'reasonable', this question is still not a duplicate of the question you linked. Please don't put words in my mouth. – cinnamon Jun 25 '20 at 12:43
  • @Toby: Note that when the Standard classifies an action as UB, that doesn't mean that the authors of the Standard were intending to *invite* compilers to behave in completely arbitrary fashion, but merely that they *waived jurisdiction*. Code which would rely upon an expression like `(*p)++ + (*q)++` yielding a particular result when `p` and `q` are equal would be extremely non-portable, but that doesn't mean that code which relies upon such computation to be a no-op in a cases where the result of the expression and value held in `*p` and `*q` end up being ignored shouldn't be expected... – supercat Sep 08 '22 at 20:29
  • ...to work correctly on most implementations. – supercat Sep 08 '22 at 20:30

16 Answers16

200

The compiler takes your code, splits it into very simple instructions, and then recombines and arranges them in a way that it thinks optimal.

The code

int i = 1;
int x = ++i + ++i;

consists of the following instructions:

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
5. read i as tmp2
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x

But despite this being a numbered list the way I wrote it, there are only a few ordering dependencies here: 1->2->3->4->5->10->11 and 1->6->7->8->9->10->11 must stay in their relative order. Other than that the compiler can freely reorder, and perhaps eliminate redundancy.

For example, you could order the list like this:

1. store 1 in i
2. read i as tmp1
6. read i as tmp3
3. add 1 to tmp1
7. add 1 to tmp3
4. store tmp1 in i
8. store tmp3 in i
5. read i as tmp2
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x

Why can the compiler do this? Because there's no sequencing to the side effects of the increment. But now the compiler can simplify: for example, there's a dead store in 4: the value is immediately overwritten. Also, tmp2 and tmp4 are really the same thing.

1. store 1 in i
2. read i as tmp1
6. read i as tmp3
3. add 1 to tmp1
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x

And now everything to do with tmp1 is dead code: it's never used. And the re-read of i can be eliminated too:

1. store 1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
10. add tmp3 and tmp3, as tmp5
11. store tmp5 in x

Look, this code is much shorter. The optimizer is happy. The programmer is not, because i was only incremented once. Oops.

Let's look at something else the compiler can do instead: let's go back to the original version.

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
5. read i as tmp2
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x

The compiler could reorder it like this:

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
9. read i as tmp4
10. add tmp2 and tmp4, as tmp5
11. store tmp5 in x

and then notice again that i is read twice, so eliminate one of them:

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp3
7. add 1 to tmp3
8. store tmp3 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x

That's nice, but it can go further: it can reuse tmp1:

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
6. read i as tmp1
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x

Then it can eliminate the re-read of i in 6:

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
4. store tmp1 in i
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x

Now 4 is a dead store:

1. store 1 in i
2. read i as tmp1
3. add 1 to tmp1
7. add 1 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x

and now 3 and 7 can be merged into one instruction:

1. store 1 in i
2. read i as tmp1
3+7. add 2 to tmp1
8. store tmp1 in i
5. read i as tmp2
10. add tmp2 and tmp2, as tmp5
11. store tmp5 in x

Eliminate the last temporary:

1. store 1 in i
2. read i as tmp1
3+7. add 2 to tmp1
8. store tmp1 in i
10. add tmp1 and tmp1, as tmp5
11. store tmp5 in x

And now you get the result that Visual C++ is giving you.

Note that in both optimization paths, the important order dependencies were preserved, insofar as the instructions weren't removed for doing nothing.

Sebastian Redl
  • 69,373
  • 8
  • 123
  • 157
  • 36
    Currently, this is the only answer which mentions [sequencing](https://en.wikipedia.org/wiki/Sequence_point#Sequence_points_in_C_and_C++). – PM 2Ring Jun 04 '20 at 11:32
  • 1
    Good visualization yet 2,3,4 can happen at the same time as 6,7,8 and conflict of 4,8 can lead to non-deterministic behavior. – chux - Reinstate Monica Jun 04 '20 at 14:16
  • 3
    -1 I don't think that this answer is clarifying. The observed results aren't dependent on any compiler optimizations at all (see my answer). – Daniel R. Collins Jun 04 '20 at 15:42
  • 1
    Couldn't a compiler theoretically even split the work into threads in a way that may result in different behaviour for different runs? – Hagen von Eitzen Jun 04 '20 at 16:12
  • 3
    This assumes read-modify-write operations. Some CPUs, such as the ubiquitous x86, have an atomic increment operation, which adds even more complexity to the situation. – Mark Jun 04 '20 at 20:02
  • 1
    @Mark True, but modern compilers typically go through a CPU-agnostic intermediate representation that has simpler instructions, and only combine them together to use special instructions after that. – Sebastian Redl Jun 04 '20 at 21:06
  • 2
    @HagenvonEitzen In theory, yes. In practice, I know of no compiler that does automatic multithreading, because the complexity is great, and the overhead of distributing work is such that the small blocks of code the optimizer usually deals with are not worth it. – Sebastian Redl Jun 04 '20 at 21:08
  • The standard has nothing to say about object code. This is fragmented & unclear about how your suggested implementations are justified by or related to the language definition. – philipxy Jun 07 '20 at 08:51
  • 6
    @philipxy "The standard has nothing to say about object code." The standard has nothing to say about the behavior of this snippet either - it's UB. That's a premise of the question. The OP wanted to know why, in practice, compilers could arrive at different and weird results. Also, my answer doesn't even say anything about object code. – Sebastian Redl Jun 08 '20 at 15:43
  • @philipxy No, it's hypothetical compiler-internal simple instructions. But maybe you have a different definition of "object code" than I. – Sebastian Redl Jun 09 '20 at 11:39
  • 1
    Just because the object machine is hypothetical doesn't mean its object code isn't object code. It means its object code is hypothetical object code. I'm done. – philipxy Jun 09 '20 at 17:33
  • 5
    @philipxy I don’t understand your objection. As noted, the question is about what a compiler might do in the presence of UB, not about the C++ standard. Why would using object code be inappropriate when exploring how a hypothetical compiler is transforming the code? In fact, how would *anything but* object code be relevant? – Konrad Rudolph Jun 10 '20 at 09:29
  • @SebastianRedl shouldn't all compilers supposed to make sure to give same result. otherwise there doesn't exist a standard cpp – Sudip Ghimire Jun 22 '20 at 21:39
  • 1
    @SudipGhimire The code in question is invalid according to standard C++. It has undefined behavior. The result can be *anything*. All I did was describe possible ways compilers handle it in practice. – Sebastian Redl Jun 24 '20 at 08:10
  • 1
    @SudipGhimire And just to be clear: this handling is nothing intentional. No compiler writer sits down and says, "This is how I'll deal with this UB." (Well, there have been a few cases where specific UB was given documented, explicit results by compiler writers because their customers really wanted it.) It's just a side effect of how compilers treat *conforming* code. – Sebastian Redl Jun 24 '20 at 08:21
  • 1
    @SebastianRedl: There are a lot more than "a few" cases where situations where the Standard imposes no requirements, but implementations usefully behave "in a documented fashion characteristic of the environment". Indeed, one of the reasons so many things were left undefined was to identify areas of "conforming language extension", and behaving in a documented fashion characteristic of the environment makes it possible for non-portable programs to do things that weren't contemplated by the Standard, and thus would not be possible in portable programs. – supercat Dec 31 '20 at 21:36
57

While this is UB (as the OP implied), following are hypothetical ways a compiler could get the 3 results. All three would give the same correct x result if used with different int i = 1, j = 1; variables instead of one and the same i.

  1. both ++i return 2, resulting in x=4.
int i = 1;
int i1 = i, i2 = i;   // i1 = i2 = 1
++i1;                 // i1 = 2
++i2;                 // i2 = 2
int x = i1 + i2;      // x = 4
  1. one ++i returns 2 and the other returns 3, resulting in x=5.
int i = 1;
int i1 = ++i;           // i1 = 2
int i2 = ++i;           // i2 = 3
int x = i1 + i2;        // x = 5
  1. both ++i return 3, resulting in x=6.
int i = 1;
int &i1 = i, &i2 = i;
++i1;                   // i = 2
++i2;                   // i = 3
int x = i1 + i2;        // x = 6
dxiv
  • 16,984
  • 2
  • 27
  • 49
  • 2
    this is a better response than what I hoped for, thank you. – cinnamon Jun 04 '20 at 02:07
  • 1
    For option 1, the compiler might have made a note to preincrement `i`. Knowing it can only happen once, it only emits it once. For option 2 the code is translated to machine code literally, like a college compiler class project might do. For option 3, it's like option 1, but it made two copies of the preincrement. Must have used a vector, not a set. :-) – Zan Lynx Jun 04 '20 at 02:12
  • @dxiv sorry, my bad, I mixed up posts – muru Jun 05 '20 at 17:25
19

To me, the second seems most likely.

I am going for option #4: Both ++i happen concurrently.

Newer processors move toward some interesting optimizations and parallel code evaluation, where allowed as here, is another way compilers keep making faster code. I see as a practical implementation, compilers moving toward parallelism.

I could readily see a race condition causing non-deterministic behavior or a bus fault due to same memory contention - all allowed as the coder violated the C++ contract - hence UB.

My question is: what (reasonable) things could a C++ compiler do that would lead to a result of 4 or a result or 6?

It could, but do not count in it.

Don't use ++i + ++i nor expect sensible results.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • If I could accept both this answer and @dxiv's I would. Thank you for the response. – cinnamon Jun 04 '20 at 02:16
  • Re option #4: shouldn't the processor realize the two increments create a data hazard (read after write) and resolve it? – Uri Raz Jun 04 '20 at 08:09
  • 4
    @UriRaz: The processor might not even notice there's a data hazard depending on the compiler's choice. E.g. the compiler might assign `i` to two registers, increment both registers, and write them both back. The processor has no way to resolve that. The fundamental problem is that neither C++ nor modern CPU's are strictly sequential. C++ explicitly has the happens-before and happens-after sequencing, to allow a happens-at-the-same-time by default. – MSalters Jun 04 '20 at 11:07
  • @UriRaz: There *are* CPUs where explicit parallelism is possible and avoiding hazards are the responsibility of software (the compiler, or programmer of hand-written asm). e.g. VLIW ISAs can be like that, where a packet of multiple instructions happens at once. Or first-gen MIPS had load delay slots: reading a load result in the instruction after the load would see the old value (unless it missed in cache, then it would stall until data arrived and you'd see the new value, so you couldn't count on reading the old value.) – Peter Cordes Jun 04 '20 at 13:12
  • 1
    But we know that's not the case for the OP using Visual Studio; most mainstream ISAs including x86 and ARM are defined in terms of a fully sequential model of execution where one machine instruction fully finishes before the next one starts. Superscalar out-of-order must maintain that illusion for a single thread. (Other threads reading shared memory aren't guaranteed to see things in program order, but the cardinal rule of OoO exec is not to break single-threaded execution.) – Peter Cordes Jun 04 '20 at 13:14
  • @Uri Raz "shouldn't the processor realize the two increments create a data hazard" --> No. It could, yet the language imposes no requires for such a check. If you are looking for a language with many safeguards against coder errors, C/C++ is not the right choice. Compiler trust that the programmer knowingly violated "two increments create a data hazard" for whatever reason and does the best it can. – chux - Reinstate Monica Jun 04 '20 at 14:09
  • 1
    This is my favorite answer, cause it is the only one which mentions parallel instruction execution at CPU level. Btw, would be nice to mention in the answer that either due to race conditions cpu thread will be stalled waiting for a mutex unlock on same memory location, so this is very unoptimal in concurrency model. Second - due to same race condition real answer can be `4` or `5`,- depending on the cpu thread execution model/speed, so this is UB at heart. – Agnius Vasiliauskas Jun 05 '20 at 08:50
  • 1
    @AgniusVasiliauskas Perhaps, yet "In practice, why would different compilers compute different values" is looking more for an easy to understand given a simplistic view of processors of today. Yet the range of compilers/processors scenarios if far greater than the few various answers mentioned. Your useful insight is yet another. IMO, parallelism is the future and so this answer focused on those, albeit in an abstract way - as the future is still unfolding. IAC, the post has become popular and easy to grasp answers are best rewarded. – chux - Reinstate Monica Jun 05 '20 at 11:31
  • Agree, I also noticed that couple of analogies and several words broadening "IT intuition" is far more better accepted than full-scale expertise knowledge, which is probably hard to read and is boring for programming new-comers. – Agnius Vasiliauskas Jun 05 '20 at 14:26
16

I think that a simple and straightforward interpretation (without any bid to compiler optimizations or multithreading) would be just:

  1. Increment i
  2. Increment i
  3. Add i + i

With i incremented twice, its value is 3, and when added together, the sum is 6.

For inspection, consider this as a C++ function:

int dblInc ()
{
    int i = 1;
    int x = ++i + ++i;
    return x;   
}

Now, here's the assembly code I get from compiling that function, using an old version of the GNU C++ compiler (win32, gcc version 3.4.2 (mingw-special)). There's no fancy optimizations or multithreading happening here:

__Z6dblIncv:
    push    ebp
    mov ebp, esp
    sub esp, 8
    mov DWORD PTR [ebp-4], 1
    lea eax, [ebp-4]
    inc DWORD PTR [eax]
    lea eax, [ebp-4]
    inc DWORD PTR [eax]
    mov eax, DWORD PTR [ebp-4]
    add eax, DWORD PTR [ebp-4]
    mov DWORD PTR [ebp-8], eax
    mov eax, DWORD PTR [ebp-8]
    leave
    ret

Note that local variable i is sitting on the stack in just one single place: address [ebp-4]. That location is incremented twice (in the 5th-8th lines of the assembly function; including apparently redundant loads of that address into eax). Then on the 9th-10th lines, that value is loaded into eax, and then added into eax (that is, computes the current i + i). Then it's redundantly copied to the stack and back to eax as the return value (which will obviously be 6).

It may be of interest to look at the C++ standard (here, an old one: ISO/IEC 14882:1998(E)) which says for Expressions, section 5.4:

Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.

With the footnote:

The precedence of operators is not directly specified, but it can be derived from the syntax.

Two examples of unspecified behavior are given at that point, both involving the increment operator (one of which is: i = ++i + 1).

Now, if one wished, one could: Make an integer wrapper class (like a Java Integer); overload functions operator+ and operator++ such that they return the intermediate value objects; and thereby write ++iObj + ++iObj and get it to return an object holding 5. (I haven't included full code here for the sake of brevity.)

Personally, I'd by intrigued if there was an example of a well-known compiler that did the job any other way than the sequence seen above. It seems to me like the most straightforward implementation would be to just do two assembly-code incs on the primitive type before the addition operation is performed.

Daniel R. Collins
  • 893
  • 1
  • 8
  • 22
  • 2
    Increment operator does really have a very well defined "return" value – edc65 Jun 05 '20 at 13:46
  • @philipxy: I've edited the answer to take out the passages that you objected to. You may or may not agree with the answer more at this point. – Daniel R. Collins Jun 05 '20 at 18:49
  • 2
    Those are not "Two examples of unspecified behavior", those are two examples of *undefined behavior*, a very different beast, arising from a different passage in the standard. I see C++98 used to say "unspecified" in the text of the footnote's example, contradicting the normative text, but that was fixed later on. – Cubbi Jun 05 '20 at 21:21
  • @Cubbi: Both the text and the footnote in the standard quoted here use the phrase "unspecified", "not directly specified", and it seems to match the term from definitions section 1.3.13. – Daniel R. Collins Jun 06 '20 at 22:58
  • This is improved. Unfortunately, this doesn't reflect what the standard says the given expression means, even though you quote it. Your "simple and straightforward interpretation" is unrelated to it. The standard has nothing to say about object code let alone an object code "stack". – philipxy Jun 07 '20 at 08:47
  • 1
    @philipxy: I see that you've repeated the same comment to many of the answers here. It seems like your main critique is really more about the OP's question itself, whose scope is not just about the abstract standard. – Daniel R. Collins Jun 07 '20 at 14:22
  • My comment says that the answers are poor & why. It has nothing to do with the question except that part of what answers should say is that anything is reasonable including not ever evaluating the expression & the asker should be disabused of misconceptions & moreover the question is a duplicate so shouldn't be answered. (And the question is unresearched so should be downvoted.) If the question were instead about basic principles of compiling (which is maybe what the asker is trying to express via "reasonable") it would be too vague & broad & still unresearched. – philipxy Jun 07 '20 at 21:35
6

The reasonable thing that a compiler can do is Common Subexpression Elimination. This is already a common optimisation in compilers: if a subexpression like (x+1) occurs more than once in a larger expression, it only needs to be calculated once. E.g. in a/(x+1) + b*(x+1) the x+1 sub-expression can be calculated once.

Of course, the compiler has to know which sub-expressions can be optimised that way. Calling rand() twice should give two random numbers. Non-inlined function calls must therefore be exempt from CSE. As you note, there is no rule that says how two occurrences of i++ should be handled, so there's no reason to exempt them from CSE.

The result may indeed be that int x = ++i + ++i; is optimised to int __cse = i++; int x = __cse << 1. (CSE, followed by repeated strength reduction)

MSalters
  • 173,980
  • 10
  • 155
  • 350
  • The standard has nothing to say about object code. This isn't justified by or related to the language definition. – philipxy Jun 07 '20 at 08:49
  • 1
    @philipxy: The standard has nothing to say about any form of Undefined Behavior. That's the premise of the question. – MSalters Jun 08 '20 at 08:46
5

There is no reasonable thing a compiler could do to get a result of 6, but it's possible and legitimate. A result of 4 is entirely reasonable, and I'd consider a result of 5 borderline reasonable. All of them are perfectly legal.

Hey, wait! Isn't it clear what must happen? The addition needs the results of the two increments, so obviously these must happen first. And we go left to right, so... argh! If only it was so simple. Unluckily, that's not the case. We do not go left to right, and that's the problem.

Reading the memory location into two registers (or initializing them both from the same literal, optimizing out the round trip to memory) is a very reasonable thing for the compiler to do. This will effectively have the effect of there covertly being two different variables, each with a value of 2, which will finally be added to a result of 4. This is "reasonable" because it's fast and efficient, and it is in accordance with both the standard and with the code.

Similarly, the memory location could be read once (or the variable initialized from the literal) and incremented once, and a shadow copy in another register could be incremented after that, which would result in 2 and 3 being added together. This is, I would say, borderline reasonable, although perfectly legal. I deem it borderline reasonable because it isn't one or the other. It's neither the "reasonable" optimized way, nor is it the "reasonable" exactly-pedantic way. It's somewhat in the middle.

Incrementing the memory location twice (resulting in a value of 3) and then adding that value to itself for a final result of 6 is legitimate, but not quite reasonable as doing memory round trips isn't precisely efficient. Although on a processor with good store forwarding, it might as well be "reasonable" to do it, since the store should be mostly invisible...
As the compiler "knows" that it's the same location, it might as well choose to increment the value twice within a register, and then add it to itself, too. Either approach would give you the result of 6.

The compiler is, by the wording of the standard, allowed to give you any such result, although I would personally consider 6 pretty much a "fuck you" memo from the Obnoxious Department, as it is a rather unexpected thing (legal or not, trying to always give the least amount of surprises is a good thing to do!). Though, seeing how Undefined Behavior is involved, sadly one cannot really argue about "unexpected", eh.

So, actually, what is the code that you have there, to the compiler? Let's ask clang, which will show us if we ask nicely (invoking with -ast-dump -fsyntax-only):

ast.cpp:4:9: warning: multiple unsequenced modifications to 'i' [-Wunsequenced]
int x = ++i + ++i;
        ^     ~~
(some lines omitted)
`-CompoundStmt 0x2b3e628 <line:2:1, line:5:1>
  |-DeclStmt 0x2b3e4b8 <line:3:1, col:10>
  | `-VarDecl 0x2b3e430 <col:1, col:9> col:5 used i 'int' cinit
  |   `-IntegerLiteral 0x2b3e498 <col:9> 'int' 1
  `-DeclStmt 0x2b3e610 <line:4:1, col:18>
    `-VarDecl 0x2b3e4e8 <col:1, col:17> col:5 x 'int' cinit
      `-BinaryOperator 0x2b3e5f0 <col:9, col:17> 'int' '+'
        |-ImplicitCastExpr 0x2b3e5c0 <col:9, col:11> 'int' <LValueToRValue>
        | `-UnaryOperator 0x2b3e570 <col:9, col:11> 'int' lvalue prefix '++'
        |   `-DeclRefExpr 0x2b3e550 <col:11> 'int' lvalue Var 0x2b3e430 'i' 'int'
        `-ImplicitCastExpr 0x2b3e5d8 <col:15, col:17> 'int' <LValueToRValue>
          `-UnaryOperator 0x2b3e5a8 <col:15, col:17> 'int' lvalue prefix '++'
            `-DeclRefExpr 0x2b3e588 <col:17> 'int' lvalue Var 0x2b3e430 'i' 'int'

As you can see, the same lvalue Var 0x2b3e430 has prefix ++ applied at two locations, and these two are below the same node in the tree, which happens to be a very non-special operator (+) that has nothing special being said about sequencing or such. Why is this important? Well, read on.

Note the warning: "multiple unsequenced modifications to 'i'". Oh oh, that doesn't sound good. What does it mean? [basic.exec] tells us about side effects and sequencing, and it tells us (paragraph 10) that by default, unless explicitly said otherwise, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. Well, darn, that's the case with operator+ -- nothing is being said otherwise, so...

But do we care about sequenced-before, indeterminately-sequenced, or unsequenced? Who wants to know, anyway?

That same paragraph also tells us that unsequenced evaluations may overlap and that when they refer to the same memory location (that's the case!) and that one is not potentially concurrent, then the behavior is undefined. This is where it really gets ugly because that means you know nothing, and you have no guarantees about being "reasonable" whatsoever. The unreasonable thing is actually perfectly allowable and "reasonable".

Damon
  • 67,688
  • 20
  • 135
  • 185
  • The use of "reasonable" was just to preempt anyone from saying "the compiler could do ANYTHING, even emit the single instruction 'set x to 7.'" perhaps I should have clarified. – cinnamon Jun 05 '20 at 18:44
  • @cinnamon Many years ago, when I was young and unexperienced, compiler engineers at Sun told me that that their compiler acted absolutely reasonable producing code for undefined behaviour that I found unreasonable at the time. Lesson learned. – gnasher729 Jun 06 '20 at 19:08
  • The standard has nothing to say about object code. This is fragmented & unclear about how your suggested implementations are justified by or related to the language definition. – philipxy Jun 07 '20 at 08:53
  • @philipxy: The standard defines what is well-formed and well-defined and what is not. In the case of this Q, it defines the behavior as undefined. Beyond being legal, there is also the reasonable assumption of compilers generating efficient code. Yes, you are right, the standard does not require that to be the case. It is nevertheless a reasonable assumption. – Damon Jun 07 '20 at 16:25
  • @Damon: The Standard specifies what actions *all* implementations are required to treat as defined, and what actions implementations are not *required* to treat as defined. Because some tasks require a wider range of semantics than others, the failure of a Standard to define the behavior of some action doesn't mean that an implementation which defines it anyway won't be more suitable for some tasks than one that doesn't, nor that failure to define such behaviors won't render make an implementation less suitable for some purposes than others that do define it. – supercat Dec 31 '20 at 21:45
5

In practice, you are invoking undefined behavior. Anything can happen, not just things that you consider "reasonable", and often things do happen that you don't consider reasonable. Everything is by definition "reasonable".

A very reasonable compilation is that the compiler observes that executing a statement will invoke undefined behavior, therefore the statement cannot be executed, therefore it is translated to an instruction that intentionally crashes your application. That is very reasonable.

Downvoter: GCC strongly disagrees with you.

Aditya Joshi
  • 1,033
  • 7
  • 20
gnasher729
  • 51,477
  • 5
  • 75
  • 98
  • When the Standard characterizes something as "undefined behavior", that means nothing more nor less than that the behavior is *outside the Standard's jurisdiction*. Because the Standard makes no attempt to judge the reasonableness of things outside its jurisdiction, and makes no attempt to forbid all the ways a conforming implementation might be unreasonably useless, the Standard's failure to impose requirements in a particular situation does not imply any judgment that all possible actions are equally "reasonable". – supercat Jun 07 '20 at 20:48
2

I appreciate that you wanted a specific answer. I appreciate that you did not want a reference to, or some restating of, that other question. But most of the time, when it comes to undefined behavior, it really doesn't matter how or why you got the particular result you got. Really. The only useful answer is, "Don't do that."

I understand the curiosity. When you get a surprising result, there's a natural tendency to want to understand how that particular result could have arisen. But these days, optimizing compilers are complicated enough that the particular result you get from a given instance of undefined behavior might as well be random, might as well be inexplicable, isn't really that interesting after all.

Suppose you drive your car down a road at high speed, and then close your eyes and remove your hands from the steering wheel while leaving the accelerator floored. An expected result is that you drift off to the right side of the road and crash. An expected result is that you drift off to the left side of the road and crash. An unexpected result is that your car somehow flies through the air and ends up lodged in a giant roadside donut.

That last result is certainly surprising. It's much more likely to make the nightly news. But what specific combination of factors allowed it to happen? Whatever it was, it was an extremely rare chance, dependent on near-random factors which would be almost impossible to pin down.

In extreme circumstances, it might make sense to try to disentangle the mechanisms behind a catastrophic accidental result. For example, 100 years ago, when railway trains collided, sometimes the passengers were injured but walked away, while sometimes large numbers of them died. It was eventually noticed that wooden cars disintegrated badly in a crash, and oil lamps tended to set the wreckage on fire. Knowing this, there was then an incentive to manufacture railway cars out of steel instead of wood, and to replace gas lights with electric ones, just to make things safer in a crash.

Along the same lines, if you're trying to exploit some undefined behavior in a program you're trying to subvert, or if you're trying to make code less vulnerable to exploitation, it might be useful to figure out exactly how, under which compiler and which other circumstances, ++i + ++i could turn 1 into 6. But if you're an ordinary C programmer, just trying to write code that works, don't worry about it.

As the old joke goes, "Doctor, doctor, it hurts when I do that!" "Well, don't do that, then." Don't write ++i + ++i in your C programs. If someone tells you that it might result, not only in an unexpected result of 2 or an unexpected result of 4, but possibly in an even-more-unexpected result of 6, that doesn't really change anything: you still don't want to write it. If you want to know exactly how you could have gotten 6, and you find the deliciously obscure reason, that doesn't change anything, either: you still don't want to write it. If you want to know exactly how you could have gotten 6, but the reason is so obscure that you're unable to discover it, that still doesn't change anything, because you still don't want to write it.

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
  • The question asks what's "reasonable" without defining it. The asker balks at "anything is allowed" & presumably likely their "reasonable" is what current methods could or typically produce. But that is a tour of compilation theory far beyond the answer to a question on this site & entering the realm of "anything" anyway. Then their accepted answer is merely 1 example of reasonable otherwise unconnected to the gamut of "reasonable". Moreover its "because" begs the question since it merely appeals to other aspects of that 1 compiler that it doesn't connect to the range & nature of "reasonable". – philipxy Sep 05 '22 at 13:47
1

There is a rule:

Between the previous and next sequence point a scalar object must have its stored value modified at most once by the evaluation of an expression, otherwise the behavior is undefined.

Thus even x = 100 is a possible valid result.

For me the most logical result in the example is 6, because we are increasing the value of i twice and them add it to itself. It is difficult to do addition before the calculation values from both sides of "+".

But compiler developers can implement any other logic.

Slavenskij
  • 611
  • 6
  • 13
0

It looks like ++i returns an lvalue but i++ returns an rvalue.
So this code is ok:

int i = 1;
++i = 10;
cout << i << endl;

This one is not:

int i = 1;
i++ = 10;
cout << i << endl;

The above two statements are consistent with VisualC++, GCC7.1.1, CLang and Embarcadero.
That is why your code in VisualC++ and GCC7.1.1 is similar to following one

int i = 1;
... do something there for instance: ++i; ++i; ...
int x = i + i;

When looking at disassembly, it first increments i, rewrites i. When trying to add it does the same thing, increments i and rewrites it. Then adds i to i.
enter image description here
I've noticed CLang and Embarcadero acts differently. So it is not consistent with the first statement, after first ++i it stores the result in an rvalue and then add it to second i++.

armagedescu
  • 1,758
  • 2
  • 20
  • 31
  • The problem with "looks line an lvalue" is that you're talking from the perspective of the C++ standard, not a compiler. – MSalters Jun 04 '20 at 10:58
  • @MSalters The statement is consistent with VisualStudio 2019, GCC7.1.1, clang and Embarcadero, and with the first piece of code. So the specification is consistent. But it work differently for second piece of code. The second piece of code is consistent with VisualStudio 2019 and GCC7.1.1 but not consistent with clang and Embarcadero. – armagedescu Jun 04 '20 at 12:30
  • 3
    Well, the first piece of code in your answer is legal C++, so obviously the implementations are consistent with the specification.Compared to the question, your "do something" ends in a semicolon, making it a full statement. That creates a sequencing which is required by the C++ standard, but not present in the question. – MSalters Jun 04 '20 at 13:11
  • @MSalters I wanted to do it as a equivalent pseudo code. Yet I am not sure how to reformulate it – armagedescu Jun 04 '20 at 13:19
0

I personally would never have expected a compiler to output 6 in your example. There are already good and detailed answers to your question. I will try a short version.

Basically, ++i is a 2-step process in this context:

  1. Increment the value of i
  2. Read the value of i

In the context of ++i + ++i the two sides of the addition may be evaluated in any order according to the standard. This means the two increments are considered independent. Also, there is no dependency between the two terms. The increment and read of i may therefore be interleaved. This gives the potential order:

  1. Increment i for the left operand
  2. Increment i for the right operand
  3. Read back i for the left operand
  4. Read back i for the right operand
  5. Sum the two: yields 6

Now, that I think about this, 6 makes the most sense according to the standard. For a result of 4 we need a CPU which first reads i independently, then increments and writes the value back into the same location; basically a race condition. For a value of 5 we need a compiler which introduces temporaries.

But, the standard says that ++i increments the variable before returning it, i.e. before actually executing the current code line. The sum operator + needs to sum i + i after applying the increments. I would say that C++ needs to work on the variables and not on a value semantic. Hence, to me 6 makes now the most sense as it relies on semantics of the language and not the execution model of CPUs.

Simon
  • 31
  • 6
0
#include <stdio.h>


void a1(void)
{
    int i = 1;
    int x = ++i;
    printf("i=%d\n",i);
    printf("x=%d\n",x);
    x = x + ++i;    // Here
    printf("i=%d\n",i);
    printf("x=%d\n",x);
}


void b2(void)
{
    int i = 1;
    int x = ++i;
    printf("i=%d\n",i);
    printf("x=%d\n",x);
    x = i + ++i;    // Here
    printf("i=%d\n",i);
    printf("x=%d\n",x);
}


void main(void)
{
    a1();
    // b2();
}
John Linq
  • 1
  • 2
  • Welcome on stackoverflow! Can you provide any limitations, assumptions or simplifications in your answer. See more details on how to answer at this link: https://stackoverflow.com/help/how-to-answer – Usama Abdulrehman Jun 15 '20 at 02:32
0

well it depends on the design of the compiler.Therefore the answer will depend on the way the compiler decodes the statements.Using two different variables ++x and ++y instead to create a logic would be a better choice. note:the ouput depends on the version of latest version of language in ms visual studio if its updated.So if the rules have changed so will the output

sam
  • 1
  • 2
0

Try This

int i = 1;
int i1 = i, i2 = i;   // i1 = i2 = 1
++i1;                 // i1 = 2
++i2;                 // i2 = 2
int x = i1 + i2;      // x = 4
Ali Chraghi
  • 613
  • 6
  • 16
0

From this link order of evaluation :

Order of evaluation of the operands of any C operator, including the order of evaluation of function arguments in a function-call expression, and the order of evaluation of the subexpressions within any expression is unspecified (except where noted below). The compiler will evaluate them in any order, and may choose another order when the same expression is evaluated again.

From, the quotes, it is clear that, the order of evaluation is unspecified by C standards. Different compilers implement different orders of evalauation. The compiler is free to evaluate such expressions in any order. That's why different compilers give different output for expression mentioned in the question.

But, if a sequence point is present between the subexpressions Exp1 and Exp2, then both value computation and side effects of Exp1 are sequenced-before every value computation and side effect of Exp2.

Krishna Kanth Yenumula
  • 2,533
  • 2
  • 14
  • 26
  • What does this quote & your statement have to do with answering the question? (Rhetorical.) Anyway the given code has undefined behaviour as explained elsewhere here so your points are not applicable. Also the asker is trying to ask (poorly) about what aspects of their implementation lead to the sort of thing it does given that the code is undefined. Also they won't accept that without clarifying their question anything is "reasonable" for a compiler to do. Also this post adds nothing to posts already here. – philipxy Dec 23 '20 at 08:27
  • Your comment doesn't address any of the issues of mine. Including that the code is undefined & not unspecified. – philipxy Dec 23 '20 at 08:36
  • The quotes do not mention undefined, they mention unspecified. I'm done. – philipxy Dec 23 '20 at 08:42
  • Yes, it is unsepecified, That's why different compiler implment different orders of evaluation. That's why you will get different output for different compilers. – Krishna Kanth Yenumula Dec 23 '20 at 08:47
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/226342/discussion-between-krishna-kanth-yenumula-and-philipxy). – Krishna Kanth Yenumula Dec 23 '20 at 08:53
  • OP question : why would different compilers compute different values. My answer: From quotes, the order of evaluation is unspecified by C standards. So, different compilers implement different orders of evaluations. Some compiler's implemnet right-lto-left order, and some compiler implement left-to-right order of evaluation. That's is the reason for different output. – Krishna Kanth Yenumula Dec 23 '20 at 09:01
-4

In practice, you are invoking undefined behaviour. Anything can happen, not just things that you consider "reasonable", and often things do happen that you don't consider reasonable. Everything is by definition "reasonable".

A very reasonable compilation is that the compiler observes that executing a statement will invoke undefined behaviour, therefore the statement cannot be ever executed, therefore it is translated to an instruction that intentionally crashes your application. That is very reasonable. After all, the compiler knows that this crash can never happen.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
  • 1
    I believe you have misunderstood the question. The question is about general or specific behavior that could lead to specific outcomes (the outcomes of x = 4, 5, or 6). If you don't like my use of the word "reasonable", I direct you to my comment above, which you have replied to: "The use of 'reasonable' was just to preempt anyone from saying 'the compiler could do ANYTHING, even emit the single instruction '"set x to 7."'" If you have better wording for the question that retains the general idea, I'm open to it. Also, it appears you re-posted your answer. – cinnamon Jun 06 '20 at 19:27
  • 2
    suggest deleting one of your two answers since they are both very similar – M.M Jun 10 '20 at 05:25