
I read through several very good answers about undefined behaviour and sequence points (e.g. Undefined behavior and sequence points) and I understand that

   int i = 1;
   a = i + i++; //this is undefined behaviour

is undefined according to the C++ standard. But what is the deeper reasoning behind making it undefined behaviour? Wouldn't it be enough to make it unspecified behaviour? The usual argument is that by having few sequence points, C++ compilers can optimize better for different architectures, but wouldn't leaving it unspecified allow those optimizations as well? In

   a = foo(bar(1), bar(2)); //this is unspecified behaviour

the compiler can also optimize, and it is not undefined behaviour. In the first example it seems clear that a is either 2 or 3, so the semantics seem clear to me. I hope there is a reason why some things are unspecified and others are undefined.
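
To spell out what I mean by "2 or 3", here is a sketch of the two evaluation orders I have in mind, written out as separate, well-defined statements (this is only my mental model of an "unspecified" reading, not something the standard actually guarantees for the original expression):

   #include <iostream>

   int main() {
       int a, i, left, right;

       // Interpretation A: the left operand is read first, then i++ runs.
       i = 1;
       left = i;          // reads 1
       right = i++;       // yields 1, i becomes 2
       a = left + right;  // a == 2

       // Interpretation B: i++ runs first, then the left operand is read.
       i = 1;
       right = i++;       // yields 1, i becomes 2
       left = i;          // reads 2
       a = left + right;  // a == 3

       std::cout << a << '\n';
   }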

Thomas M.
  • Refreshing to see a question on this topic that has clearly searched beforehand :) – Flexo Sep 24 '12 at 20:52
  • The real question is why on earth would you want to write that code? Given that it's insane to want to write that why add to the complexity of your standard by writing any rules about what may/may not happen? – Flexo Sep 24 '12 at 20:54
  • What would be the options for `++++++i++++++`? – David Rodríguez - dribeas Sep 24 '12 at 21:00
  • Generally, the standard defines possible behavior for unspecified behavior. It would be impossible to specify all the possibilities for complex expressions. – Jesse Good Sep 24 '12 at 21:32
  • For one thing, whatever the programmer meant by `a = i + i++;`, there is certainly a clearer way to express it, such as `a = i * 2; i++;`. It's not worth the effort (both by the authors of the language standard and by compiler implementers) to nail down the exact behavior of Bad Code. – Keith Thompson Sep 24 '12 at 22:45
  • @David `i++++` does not compile. – fredoverflow Sep 25 '12 at 09:24

5 Answers


Not all of those optimizations would still be possible. Itanium, for example, could perform the add and the increment in parallel, and it might cough up, say, a hardware exception for trying to do something like this.

But this is purely a micro-optimization, writing the compiler to take advantage of it was extremely difficult, and it's an extremely rare architecture that can do it (none existed at the time, IIRC; it was mostly hypothetical). So the reality is that, as of 2012, there is no reason for it not to be well-defined behaviour, and indeed, C++11 made more of these situations well-defined.
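
For instance, a sketch of one case the C++11 sequencing rules nail down, while the question's expression stays undefined (the example is illustrative, not taken from the standard's text):

   #include <iostream>

   int main() {
       int i = 1;
       i = ++i + 1;   // undefined in C++03, well-defined in C++11: the
                      // increment is sequenced before the addition, and the
                      // assignment's store after that, so i becomes 3

       // a = i + i++;  // the expression from the question stays undefined
                        // even in C++11: the read of i on the left and the
                        // write from i++ are unsequenced

       std::cout << i << '\n';   // prints 3
   }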

Puppy

From the viewpoint of C++, I think the answer is incredibly simple: it was made undefined behavior because C had made it undefined behavior long before, and there was essentially no potential gain from changing that.

That points to what I'd guess was really more the intended question: why did C make this undefined behavior?

I don't think that has quite as simple an answer. One possibility is simple caution -- by the time the C standard was being written, C had already been implemented, deployed and used on lots of machines. A fair number of machines back then were like a lot of code I still see: something originally designed only as a personal experiment that worked well enough to end up designated as "production", without even a token attempt at fixing anything but the most egregious problems. As such, even if nobody knew of hardware this would break, nobody could be really sure such hardware didn't exist either, so it was safest to just call it UB and be done with it.

Another possibility is that it went a bit beyond simple caution. Even though we can feel fairly safe with modern hardware, there may have been hardware at the time that people really knew would have major problems with this, and (especially if vendors associated with that hardware were represented on the committee) allowing C to run on that hardware was considered important.

Yet another possibility would be that even though nobody knew of (or even feared the possibility of) some existing implementation that this could break, they foresaw the future possibility of something it would break, so undefined behavior was seen as a way of future-proofing the language, at least to some limited degree.

A final possibility is that whoever was writing that part of the standard moved on to other things as soon as they came up with a set of rules that seemed acceptable, even though they could have come up with other rules that at least some might have liked better.

If I had to guess, I'd say it was probably a combination of the third and fourth possibilities I've given -- the committee was aware of developments in parallel computing without knowing how they would work out in the end, so for whoever wrote this, maximizing the implementation's latitude seemed like the easiest/simplest route to gaining consensus so they could finish and move on to bigger and better things.

Jerry Coffin

There's a huge difference between undefined behavior and unspecified behavior. Unspecified behavior is well-formed (i.e., legal) but the standard leaves the compiler vendor some latitude as to implementation. Undefined behavior is an atrocity that appears to be syntactically correct. The primary reason for deeming behavior to be "undefined" rather than flat-out illegal (something the compiler must reject) is that sometimes that undefined behavior can be very hard to diagnose.
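
A minimal sketch of the "hard to diagnose" point (bump is a made-up function, used only for illustration): whether the same object is modified twice here depends on run-time values, so no compiler could be required to reject it.

   // Well-defined when p and q point to different ints; undefined behaviour
   // (under the C++03/C++11 rules in force here) when they point to the same
   // int, because the increment of *q and the store to *p are then two
   // unsequenced modifications of one object.
   void bump(int* p, int* q) {
       *p = (*q)++;
   }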

David Hammen
  • I know the difference between undefined and unspecified behaviour, I was just wondering why the a=i++ +i; violation could not have been placed in the unspecified category. – Thomas M. Sep 24 '12 at 21:03
  • @ThomasM. - The only difference between `i++ + i` and `i++ + function_whose_output_depends_critically_on_i(i)` is that you can pretend that the former might make sense; the latter is nonsense no matter how you look at it. They are the same problem, they are both nonsense. – David Hammen Sep 24 '12 at 21:11
   a = foo(bar(1), bar(2)); // this is unspecified behaviour

The two function calls can be made in either order, but they remain two different function calls. The machine instructions are not allowed to overlap even if the calls are inlined. In reality they do overlap quite a bit, but the optimizer is restricted to producing code that behaves strictly as if the function calls were separate.

   a = i + i++; // this is undefined behaviour

With a scalar i, there is no requirement for separation: the CPU instructions that fetch i, add, and post-increment mix freely, and the optimizer is allowed to pretend it doesn't know that the i on the left and the i on the right are the same object. There's no telling what kind of broken assembly it may produce when this precondition is violated. Thus, undefined.
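
A sketch of that difference (bar is given a body here purely for illustration):

   #include <iostream>

   int bar(int x) {
       std::cout << "bar(" << x << ")\n";  // each call runs to completion;
       return x;                           // the two bodies never interleave
   }

   int foo(int x, int y) { return x + y; }

   int main() {
       // Unspecified order: the program prints either bar(1) then bar(2),
       // or bar(2) then bar(1), but always exactly those two complete
       // calls, and a is 3 either way.
       int a = foo(bar(1), bar(2));
       std::cout << "a = " << a << '\n';
   }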

Cubbi

MIPS I was a reasonable implementation with load delay slots. Executing a load was not instant; the result only became visible after the next instruction had started. For compilers this was no big deal: just put an unrelated instruction in the next slot.

Of course, the compiler had to know what "unrelated" meant. With the C rule against modifying a single variable twice without an intervening sequence point, the compiler had far more freedom in finding an instruction that was guaranteed to be unrelated. If two operations appeared in a single statement, they had to operate on different variables and therefore be unrelated.
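
A rough sketch of the idea (the MIPS instructions in the comments are illustrative hand-scheduling, not actual compiler output): because the two operations in the last statement are known to touch different objects, the unrelated add can sit in the load's delay slot.

   int f(int a, int *p) {
       int b = *p;     //  lw    $t0, 0($a1)   ; load *p; result not usable
       a = a + 1;      //  addiu $a0, $a0, 1   ;   in the very next slot, so
       return a + b;   //  addu  $v0, $a0, $t0 ;   the unrelated add fills it
   }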

MSalters