86

This is an example to illustrate my question, which involves some much more complicated code that I can't post here.

#include <stdio.h>
int main()
{
    int a = 0;
    for (int i = 0; i < 3; i++)
    {
        printf("Hello\n");
        a = a + 1000000000;
    }
}

This program contains undefined behavior on my platform because `a` will overflow on the third iteration of the loop.

Does that make the whole program have undefined behavior, or only after the overflow actually happens? Could the compiler potentially work out that `a` will overflow, declare the whole loop undefined, and not bother to run the `printf`s even though they all happen before the overflow?

(Tagged both C and C++ even though they are different languages, because I'd be interested in answers for both if they differ.)

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
jcoder
  • 29,554
  • 19
  • 87
  • 130
  • 7
    Wonder if the compiler could work out that `a` isn't used (except for calculating itself) and simply remove `a` – Support Ukraine Oct 07 '16 at 10:31
  • 12
    You might enjoy [My Little Optimizer: Undefined Behaviour is Magic](https://www.youtube.com/watch?v=g7entxbQOCc) from CppCon this year. It's all about what optimizations compilers can carry out based on undefined behaviour. – TartanLlama Oct 07 '16 at 10:39
  • 2
    http://stackoverflow.com/questions/6664471/observable-behaviour-and-compiler-freedom-to-eliminate-transform-pieces-c-co/6665635#6665635 – Bo Persson Oct 07 '16 at 11:42
  • 3
    See also https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633 – Eric Lippert Oct 07 '16 at 15:59
  • 2
    ["A sufficiently advanced compiler is indistinguishable from an adversary."](http://blog.regehr.org/archives/970) – Kyle Strand Oct 07 '16 at 18:33
  • 1
    This is almost a duplicate: https://stackoverflow.com/questions/23153445/can-branches-with-undefined-behavior-be-assumed-unreachable-and-optimized-as-dea – usr Oct 07 '16 at 21:39
  • [This question](http://stackoverflow.com/questions/27563431/can-guaranteed-ub-be-rejected-at-compile-time) covers the same issues (in C) – M.M Oct 07 '16 at 23:54
  • A common example of UB being rejected at compile-time is the case of calling a function which was declared but not defined – M.M Oct 07 '16 at 23:55
  • Undefined behaviour means it's undefined. The compiler is free to do anything. For example, it can invert its understanding of the standard and treat each defined behaviour as undefined, and vice versa. – MatthewRock Oct 08 '16 at 21:13
  • @jcoder: I believe the answer is wrong, but a common misconception. Take a look at mine so you see why. – user541686 Oct 09 '16 at 05:25
  • Of course, the UB is only exhibited if int is smaller than signed 32-bit... – Andrew Oct 15 '16 at 06:13
  • If you pass a null pointer into a function that can't receive a null, e.g. memcpy, the compiler can remove all null checks on that pointer ANYWHERE, until it's reassigned, because you have told it that it can't be null. The best answer to this question is "Don't use UB anywhere" because the behaviour can be both unexpected and random. –  Oct 22 '16 at 14:51
  • use [-fwrapv](https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Code-Gen-Options.html) with gcc and clang, by the way. Not sure why this is still handled as undefined behavior by the C standard. It's 2016. All of today's devices use 2's complement arithmetic. – Jason S Oct 22 '16 at 23:27
  • I added an [answer that also covers C](https://stackoverflow.com/a/52262732/1708801), since no one really did and this is tagged with both. – Shafik Yaghmour Sep 10 '18 at 19:11

12 Answers

108

If you're interested in a purely theoretical answer, the C++ standard allows undefined behaviour to "time travel":

[intro.execution]/5: A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)

As such, if your program contains undefined behaviour, then the behaviour of your whole program is undefined.
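
To make the "time travel" concrete, here is a minimal sketch (illustrative only; no claim that any particular compiler does this). Because the signed overflow is reached on every execution, the standard places no requirement even on the `printf` that precedes it:

#include <stdio.h>

int main(void)
{
    int a = 2000000000;
    printf("Hello\n");  /* observable behaviour... */
    a += 1000000000;    /* ...but signed overflow (UB) follows on every
                           execution, so not even the printf above is
                           guaranteed to happen */
}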

TartanLlama
  • 63,752
  • 13
  • 157
  • 193
  • 4
    @KeithThompson: But then, the `sneeze()` function itself is undefined on anything of the class `Demon` (of which the nasal variety is a subclass), making the whole thing circular anyways. – Sebastian Lenartowicz Oct 07 '16 at 21:25
  • 1
    But printf might not return, so the first two rounds are defined, because until they're done it is not clear that there will ever be UB. See https://stackoverflow.com/questions/23153445/can-branches-with-undefined-behavior-be-assumed-unreachable-and-optimized-as-dea – usr Oct 07 '16 at 21:32
  • 1
    This is why a compiler is technically within its rights to emit "nop" for the Linux kernel (because the bootstrap code relies on undefined behavior): http://blog.regehr.org/archives/761 – Crashworks Oct 07 '16 at 21:59
  • 3
    @Crashworks And that is why Linux is written in, and compiled as, *unportable* C. (i.e. a superset of C that requires a particular compiler with particular options, such as -fno-strict-aliasing) – user253751 Oct 08 '16 at 07:16
  • 3
    @usr I expect it's defined if `printf` does not return, but if `printf` is going to return, then the undefined behaviour can cause problems before `printf` is called. Hence, time travel. `printf("Hello\n");` and then the next line compiles as `undoPrintf(); launchNuclearMissiles();` – user253751 Oct 08 '16 at 07:17
  • @TartanLlama: Do you know of a *single* compiler that would e.g. optimize out statements that have side effects and which precede unconditional undefined behavior? – user541686 Oct 10 '16 at 20:47
  • @Mehrdad None that I can think of, hence the "purely theoretical" part of my answer. I know that LLVM's CFG simplification just returns if anything could affect the unreachability of a statement with UB. See [here](https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/SimplifyCFG.cpp#L4006) if you're interested. – TartanLlama Oct 11 '16 at 09:54
  • @TartanLlama: So here's the crucial question. If none of them do such a thing, *why* don't they? Is your best justification that they just haven't gotten around to implementing it? Because, to me, this suggests that they may not be interpreting the standard the same way people here are at all. – user541686 Oct 11 '16 at 10:02
  • @Mehrdad Probably because it's a lot of effort to work out if a given function could affect the reachability of some UB and it likely doesn't buy you a lot of performance to do so. Deleting the rest of a basic block after some unconditional UB is easy, doing it beforehand is less so. – TartanLlama Oct 11 '16 at 10:43
  • As the text says, it even renders the whole program behavior undefined if just an unspecified behavior could possibly lead to undefined behavior, even if the implementation would never choose/exhibit such unspecified behavior. – Johannes Schaub - litb Oct 21 '16 at 21:43
  • @Mehrdad: On many implementations, certain library functions or volatile accesses may cause signals (or something else like them) to be raised, such that code would never reach succeeding statements. If an implementation specifies that pressing control-C while code is blocked at a getch() will raise a SIGINT, for example, then behavior should be defined if code after the getch() would invoke UB, but a SIGINT prevents execution from getting there. – supercat Oct 24 '16 at 19:08
  • @Mehrdad: Unless the entity that controls a compiler also controls everything else about the execution environment, there may be no way for the compiler to know all of the cases where a system call or volatile access might wrest control away from the caller and not return it; if an execution environment specifies that control-C will prevent getch() from ever returning, it would be rather rude of a compiler to behave as though it was going to return even in cases where the operator would have typed control-C. – supercat Oct 24 '16 at 19:12
  • @supercat: *"Unless the entity that controls a compiler also controls everything else about the execution environment, there may be no way for the compiler to know all of the cases where a system call or volatile access might wrest control away from the caller and not return it."* **Excellent!** Now would you kindly take a few minutes to read my (downvoted-to-hell) answer below and let me know if you agree with it? It seems that you are saying exactly the same thing I have been saying, yet no one seems to be believing me... – user541686 Oct 24 '16 at 19:28
  • @Mehrdad: I think a fundamental problem is that C has split into two languages--one of which is suitable for systems programming and one of which eagerly and needlessly jettisons features necessary for systems programming in an effort to make the language more suitable for high-end computing [I say needless because the level of performance that could be achieved by adding directives to waive guarantees that a particular program doesn't need would exceed anything that could be achieved by a conforming compiler in the absence of such directives, but adding support for such directives... – supercat Oct 24 '16 at 19:53
  • ...would not require breaking any existing code]. The zeal with which some compiler writers pursue UB-based optimizations had led to an apparent lack of interest in other kinds of optimizations which could be easier, safer, and more effective. – supercat Oct 24 '16 at 19:54
  • I added an answer [which covers C more specifically](https://stackoverflow.com/a/52262732/1708801) because a duplicate was for C but no answers really covered C properly. – Shafik Yaghmour Sep 10 '18 at 23:16
32

First, let me correct the title of this question:

Undefined Behavior is not (specifically) confined to the realm of execution.

Undefined Behavior affects all steps: compiling, linking, loading and executing.

Some examples to cement this; bear in mind that no section is exhaustive:

  • the compiler can assume that portions of code that contain Undefined Behavior are never executed, and thus assume the execution paths that would lead to them are dead code. See What every C programmer should know about undefined behavior by none other than Chris Lattner, and the short sketch just after this list.
  • the linker can assume that in the presence of multiple definitions of a weak symbol (recognized by name), all definitions are identical thanks to the One Definition Rule
  • the loader (in case you use dynamic libraries) can assume the same, thus picking the first symbol it finds; this is usually (ab)used for intercepting calls using LD_PRELOAD tricks on Unixes
  • the execution might fail (SIGSEGV) should you use dangling pointers
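
As a concrete sketch of the first point (illustrative; whether a given compiler actually performs this depends on version and flags, though GCC has been reported to compile similar code this way):

#include <stdio.h>

int table[4];

/* The read of table[4] at i == 4 is Undefined Behavior, so an optimizer
   may assume the loop can only be left via "return 1", and compile the
   whole function to unconditionally return 1. */
int contains(int v)
{
    for (int i = 0; i <= 4; i++)  /* off-by-one: valid indices are 0..3 */
        if (table[i] == v)
            return 1;
    return 0;
}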

This is what is so scary about Undefined Behavior: it is nigh impossible to predict, ahead of time, what exact behavior will occur, and this prediction has to be revisited at each update of the toolchain, underlying OS, ...


I recommend watching this video by Michael Spencer (LLVM Developer): CppCon 2016: My Little Optimizer: Undefined Behavior is Magic.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • 3
    This is what worries me. In my real code, it's complex, but I *might* have a case where it will always overflow. And I don't really care about that, but I worry that "correct" code will also be affected by this. Obviously I need to fix it, but fixing requires understanding :) – jcoder Oct 07 '16 at 13:56
  • 8
    @jcoder: There's one important escape here. The compiler isn't allowed to guess at input data. As long as there is at least one input for which Undefined Behavior does not occur, the compiler must ensure that this particular input still produces the right output. All the scary talk about dangerous optimizations only apply to _unavoidable_ UB. Practically speaking, if you would have used `argc` as the loop count, the case `argc=1` does not produce UB and the compiler would be forced to handle that. – MSalters Oct 07 '16 at 14:25
  • @jcoder: In this case, this is not dead code. The compiler, however, could be smart enough to deduce that `i` cannot be incremented more than `N` times and therefore that its value is bounded. – Matthieu M. Oct 07 '16 at 14:36
  • 4
    @jcoder: If `f(good);` does some thing X and `f(bad);` invokes undefined behavior, then a program that just invokes `f(good);` is guaranteed to do X, but `f(good); f(bad);` is not guaranteed to do X. –  Oct 08 '16 at 07:50
  • @jcoder: I added a link to a recent video from a LLVM developer talking about undefined behavior. It gives some more examples. – Matthieu M. Oct 08 '16 at 10:49
  • 4
    @Hurkyl more interestingly, if your code is `if(foo) f(good); else f(bad);`, a smart compiler will throw away the comparison and produce an unconditional `f(good)`. – John Dvorak Oct 09 '16 at 12:30
  • @MSalters "As long as there is at least one input for which Undefined Behavior does not occur, the compiler must ensure that this particular input still produces the right output" Where can I read up on this? – phant0m Dec 20 '17 at 23:45
28

An aggressively optimising C or C++ compiler targeting a 16-bit int will know that the behaviour on adding 1000000000 to an int type is undefined.

Either standard permits the compiler to do anything it wants, which could include deleting the entire program and leaving int main(){}.

But what about larger ints? I don't know of a compiler that does this yet (and I'm not an expert in C and C++ compiler design by any means), but I imagine that some day a compiler targeting a 32-bit int or wider will figure out that `a` must eventually overflow, so once again it can optimise the output to int main(){}. The point I'm trying to make here is that as compiler optimisations become progressively more aggressive, more and more undefined-behaviour constructs manifest themselves in unexpected ways.

Note that an infinite loop would not in itself be undefined here, since you are writing to standard output in the loop body.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • 3
    Is it permitted by the standard to do anything it wants even before the undefined behaviour manifests? Where is this stated? – jimifiki Oct 07 '16 at 10:20
  • 4
    why 16 bit? I guess OP is looking for a 32 bit signed overflow. – Support Ukraine Oct 07 '16 at 10:24
  • @4386427: could have sworn the question mentioned a 16 bit int. But the fact that it doesn't makes the answer more interesting. – Bathsheba Oct 07 '16 at 10:27
  • 8
    @jimifiki In the standard. C++14 (N4140) 1.3.24 "undefined behaviour = behavior for which this International Standard imposes no requirements." Plus a lengthy note which elaborates. But the point is that it's not the behaviour of a "statement" that is undefined, it's the behaviour of the *program.* That means that as long as UB is triggered by a rule in the standard (or by absence of a rule), the standard ceases to apply to the program *as a whole.* So *any* part of the program can behave however it wants. – Angew is no longer proud of SO Oct 07 '16 at 10:46
  • I intended my loop to terminate but it's interesting either way :) – jcoder Oct 07 '16 at 11:01
  • 5
    The first statement is wrong. If `int` is 16-bit, the addition will take place in `long` (because the literal operand has type `long`) where it's well-defined, then be converted by an implementation-defined conversion back to `int`. – R.. GitHub STOP HELPING ICE Oct 07 '16 at 14:42
  • printf might not return, so the first two rounds are defined, because until they're done it is not clear that there will ever be UB. See https://stackoverflow.com/questions/23153445/can-branches-with-undefined-behavior-be-assumed-unreachable-and-optimized-as-dea – usr Oct 07 '16 at 21:35
  • 2
    @usr the behaviour of `printf` is defined by the standard; it always returns – M.M Oct 09 '16 at 01:37
  • @usr Since `printf` will in fact return, there's no reason to think the compiler couldn't know that. And if the compiler knows `printf` will return, it knows there will be UB. That permits it to assume the code will never execute. So it could remove the loop. (Among other things.) – David Schwartz Oct 09 '16 at 12:15
  • I agree. I did not know printf is specified to return. Maybe it's more interesting to consider an arbitrary function that might not return. That's more real-world. – usr Oct 09 '16 at 12:25
12

Technically, under the C++ standard, if a program contains undefined behavior, the behavior of the entire program, even at compile time (before the program is even executed), is undefined.

In practice, because the compiler may assume (as part of an optimization) that the overflow will not occur, at least the behavior of the program on the third iteration of the loop (assuming a 32-bit machine) will be undefined, though it is likely that you will get correct results before the third iteration. However, since the behavior of the entire program is technically undefined, there's nothing stopping the program from generating completely incorrect output (including no output), crashing at runtime at any point during execution, or even failing to compile altogether (as undefined behavior extends to compile time).

Undefined behavior gives the compiler more room to optimize because it removes certain constraints on what the code must do. As a consequence, programs that rely on assumptions involving undefined behavior are not guaranteed to work as expected, and you should not rely on any particular behavior that is considered undefined per the C++ standard.
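
For instance (a minimal sketch; whether the fold actually happens depends on the compiler and optimization level), the licence to assume signed overflow never occurs lets a compiler fold a comparison to a constant:

#include <stdio.h>
#include <limits.h>

/* Because signed overflow is UB, a compiler may assume x + 1 never
   wraps, and fold this comparison to 1, even for x == INT_MAX. */
int plus_one_is_bigger(int x)
{
    return x + 1 > x;
}

int main(void)
{
    /* Calling with INT_MAX makes x + 1 overflow (UB), which is exactly
       why the compiler is allowed to fold the comparison to 1. */
    printf("%d\n", plus_one_is_bigger(INT_MAX)); /* often 1 when optimized */
}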

Community
  • 1
  • 1
bwDraco
  • 2,514
  • 2
  • 33
  • 38
  • What if the UB part is within an `if(false) {}` scope? Does that poison the entire program, due to compiler assuming all branches contain ~well-defined portions of logic, and thus operating on wrong assumptions? – mlvljr Oct 30 '16 at 20:40
  • 1
    The standard imposes no requirements whatsoever on undefined behavior, so *in theory*, yes, it does poison the whole program. However, *in practice*, any optimizing compiler will likely just remove the dead code, so it probably would have no effect on execution. You still shouldn't rely on this behavior, though. – bwDraco Oct 30 '16 at 20:44
  • Good to know, thanx :) – mlvljr Oct 30 '16 at 20:46
10

To understand why undefined behavior can 'time travel', as @TartanLlama aptly put it, let's take a look at the 'as-if' rule:

1.9 Program execution

1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

With this, we could view the program as a 'black box' with an input and an output. The input could be user-input, files, and many other things. The output is the 'observable behavior' mentioned in the standard.

The standard only defines a mapping between the input and the output, nothing else. It does this by describing an 'example black box', but explicitly says any other black box with the same mapping is equally valid. This means the content of the black box is irrelevant.

With this in mind, it would not make sense to say that undefined behavior occurs at a certain moment. In the sample implementation of the black box, we could say where and when it happens, but the actual black box could be something completely different, so we can't say where and when it happens anymore. Theoretically, a compiler could for example decide to enumerate all the possible inputs, and pre-compute the resulting outputs. Then the undefined behavior would have happened during compilation.

Undefined behavior is the nonexistence of a mapping between input and output. A program can have undefined behavior for some inputs, but defined behavior for others. In that case the mapping between input and output is simply incomplete; there are inputs for which no mapping to output exists.
The program in the question has undefined behavior for any input, so the mapping is empty.
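
A minimal sketch of such a partial mapping (illustrative only): the program below has a defined input-to-output mapping for every in-range input except INT_MAX, for which no mapping exists at all.

#include <stdio.h>

int main(void)
{
    int x;
    if (scanf("%d", &x) != 1)
        return 1;
    /* Defined for every x < INT_MAX; for x == INT_MAX the addition
       overflows, so the standard defines no output whatsoever. */
    printf("%d\n", x + 1);
}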

Community
  • 1
  • 1
alain
  • 11,939
  • 2
  • 31
  • 51
6

Assuming int is 32-bit, undefined behavior happens at the third iteration. So if, for example, the loop was only conditionally reachable, or could conditionally be terminated before the third iteration, there would be no undefined behavior unless the third iteration is actually reached. However, in the event of undefined behavior, all output of the program is undefined, including output which is "in the past" relative to the invocation of undefined behavior. For example, in your case, this means there is no guarantee of seeing 3 "Hello" messages in the output.
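
As a minimal sketch of that distinction (illustrative; `argc` stands in for any runtime input), the variant below has undefined behaviour only on runs where argc >= 3, and must behave normally when argc is smaller:

#include <stdio.h>

int main(int argc, char **argv)
{
    int a = 0;
    for (int i = 0; i < argc; i++)
    {
        printf("Hello\n");
        a = a + 1000000000;  /* the third addition overflows a 32-bit int */
    }
}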

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
6

TartanLlama's answer is correct. The undefined behavior can happen at any time, even during compile time. This may seem absurd, but it's a key feature that permits compilers to do what they need to do. It's not always easy to be a compiler. You have to do exactly what the spec says, every time. However, sometimes it can be monstrously difficult to prove that a particular behavior is occurring. If you remember the halting problem, it's rather trivial to devise software for which you cannot prove whether it completes or enters an infinite loop when fed a particular input.

We could make compilers pessimistic, constantly compiling in fear that the next instruction might be one of these halting-problem-like issues, but that isn't reasonable. Instead we give the compiler a pass: on these "undefined behavior" topics, it is freed from any responsibility. Undefined behavior consists of all the behaviors which are so subtly nefarious that we have trouble separating them from the really-nasty-nefarious halting problems and whatnot.

There is an example which I love to post, though I admit I've lost the source, so I have to paraphrase. It was from a particular version of MySQL. In MySQL, they had a circular buffer which was filled with user-provided data. They, of course, wanted to make sure that the data didn't overflow the buffer, so they had a check:

if (currentPtr + numberOfNewChars > endOfBufferPtr) { doOverflowLogic(); }

It looks sane enough. However, what if numberOfNewChars is really big, and overflows? Then it wraps around and becomes a pointer smaller than endOfBufferPtr, so the overflow logic would never get called. So they added a second check, before that one:

if (currentPtr + numberOfNewChars < currentPtr) { detectWrapAround(); }

It looks like you took care of the buffer overflow error, right? However, a bug was submitted stating that this buffer overflowed on a particular version of Debian! Careful investigation showed that this version of Debian was the first to use a particularly bleeding-edge version of gcc. On this version of gcc, the compiler recognized that currentPtr + numberOfNewChars can never be a smaller pointer than currentPtr because overflow for pointers is undefined behavior! That was sufficient for gcc to optimize out the entire check, and suddenly you were not protected against buffer overflows even though you wrote the code to check it!

This was spec behavior. Everything was legal (though from what I heard, gcc rolled back this change in the next version). It's not what I would consider intuitive behavior, but if you stretch your imagination a bit, it's easy to see how a slight variant of this situation could become a halting problem for the compiler. Because of this, the spec writers made it "Undefined Behavior" and stated that the compiler could do absolutely anything it pleased.
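
For reference, a sketch of a check that sidesteps the trap (the names follow the paraphrase above; it assumes currentPtr always points into the buffer and numberOfNewChars is unsigned): compare the space remaining instead of forming a possibly-out-of-range pointer.

/* Well-defined: endOfBufferPtr - currentPtr is ordinary pointer
   subtraction within one buffer, so no overflowing pointer arithmetic
   is ever performed. */
if (numberOfNewChars > (size_t)(endOfBufferPtr - currentPtr)) {
    doOverflowLogic();
}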

Cort Ammon
  • 10,221
  • 31
  • 45
  • I don't consider especially astonishing compilers which sometimes behave as though signed arithmetic is performed on types whose range extends beyond "int", especially considering that even when doing straightforward code generation on x86 there are times when doing so is more efficient than truncating intermediate results. What's more astonishing is when overflow affects *other* calculations, which can happen in gcc even if code stores the product of two uint16_t values into a uint32_t--an operation which should have no plausible reason to act surprising in a non-sanitizing build. – supercat Oct 07 '16 at 21:59
  • Of course, the correct check would be `if(numberOfNewChars > endOfBufferPtr - currentPtr)`; provided numberOfNewChars can never be negative and currentPtr always points to somewhere within the buffer, you don't even need the ridiculous "wraparound" check. (I don't think the code you provided has any hope of working in a circular buffer - you've left out whatever is necessary for that in the paraphrase, so I'm ignoring that case as well) – Random832 Oct 08 '16 at 06:47
  • @Random832 I did leave out a ton. I've tried to quote the larger context, but since I lost my source, I've found paraphrasing the context got me in more trouble so I leave it out. I really need to find that blasted bug report so I can quote it properly. It really is a powerful example of how you can think you wrote code one way, and have it compile completely differently. – Cort Ammon Oct 08 '16 at 14:54
  • This is my biggest issue with undefined behaviour. It makes it sometimes impossible to write correct code, and when the compiler detects it, by default it does not tell you it's triggered undefined behaviour. In this case the user simply wants to do arithmetic - pointer or not - and all their hard work to write secure code was undone. There should at least be a way to annotate a section of code to say - no fancy optimizations here. C/C++ is used in too many critical areas to allow this hazardous situation to continue in favour of optimization – John McGrath Oct 22 '16 at 17:22
4

Undefined behavior is, by definition, a grey area. You simply can't predict what it will or won't do -- that's what "undefined behavior" means.

Since time immemorial, programmers have always tried to salvage remnants of definedness from an undefined situation. They've got some code they really want to use, but which turns out to be undefined, so they try to argue: "I know it's undefined, but surely it will, at worst, do this or this; it will never do that." And sometimes these arguments are more or less right -- but often, they're wrong. And as the compilers get smarter and smarter (or, some people might say, sneakier and sneakier), the boundaries of the question keep changing.

So really, if you want to write code that's guaranteed to work, and that will keep working for a long time, there's only one choice: avoid ye the undefined behavior at all costs. Verily, if you dabble in it, it will come back to haunt you.

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
  • and yet, here's the thing... compilers can use undefined behavior to optimize but THEY GENERALLY DON'T TELL YOU. So if we have this awesome tool that you have to avoid doing X at all costs, why can't the compiler give you a warning so you can fix it? – Jason S Oct 22 '16 at 23:32
4

Beyond the theoretical answers, a practical observation would be that for a long time compilers have applied various transforms upon loops to reduce the amount of work done within them. For example, given:

for (int i=0; i<n; i++)
  foo[i] = i*scale;

a compiler might transform that into:

int temp = 0;
for (int i=0; i<n; i++)
{
  foo[i] = temp;
  temp+=scale;
}

Thus saving a multiplication on every loop iteration. An additional form of optimization, which compilers adopted with varying degrees of aggressiveness, would turn that into:

if (n > 0)
{
  int temp1 = n*scale;
  int *temp2 = foo + n;
  do
  {
    temp1 -= scale;
    *--temp2 = temp1;  /* fills foo[n-1] down to foo[0] */
  } while(temp1);
}

Even on machines with silent wraparound on overflow, that could malfunction if there was some number less than n which, when multiplied by scale, would yield 0. It could also turn into an endless loop if scale was read from memory more than once and something changed its value unexpectedly (in any case where "scale" could change mid-loop without invoking UB, a compiler would not be allowed to perform the optimization).

While most such optimizations would not have any trouble in cases where two short unsigned types are multiplied to yield a value which is between INT_MAX+1 and UINT_MAX, gcc has some cases where such a multiplication within a loop may cause the loop to exit early. I haven't noticed such behaviors stemming from comparison instructions in generated code, but it is observable in cases where the compiler uses the overflow to infer that a loop can execute at most 4 times; by default it does not generate warnings in cases where some inputs would cause UB and others would not, even if its inferences cause the upper bound of the loop to be ignored.

supercat
  • 77,689
  • 9
  • 166
  • 211
1

One thing your example doesn't consider is optimisation. `a` is set in the loop but never used, and an optimiser could work this out. As such, it is legitimate for the optimiser to discard `a` completely, and in that case all the undefined behaviour vanishes like a boojum's victim.

However, of course, this itself is not guaranteed, because whether the optimiser performs that transformation is not something the standard specifies. :)
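
A sketch of that hypothetical optimiser output (illustrative; no guarantee any compiler produces exactly this):

#include <stdio.h>

int main(void)
{
    /* 'a' and its additions removed as dead code; the overflow, and
       with it the undefined behaviour, disappears from this build. */
    for (int i = 0; i < 3; i++)
        printf("Hello\n");
}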

Graham
  • 1,655
  • 9
  • 19
  • 1
    There's no reason to consider optimization when determining whether the behavior is undefined. – Keith Thompson Oct 07 '16 at 19:03
  • 2
    The fact that the program behaves as one might assume it should does not mean the undefined behaviour "vanishes". The behaviour is still undefined and you are simply relying on luck. The very fact that the program's behaviour can change based on compiler options is a strong indicator that the behaviour is undefined. – Jordan Melo Oct 07 '16 at 19:12
  • @JordanMelo Since many of the previous answers discussed optimisation (and the OP specifically asked about that), I've mentioned a feature of optimisation which no previous answer had covered. I also pointed out that even though optimisation *could* remove it, reliance on optimisation to work in any particular way is again undefined. I'm certainly not recommending it! :) – Graham Oct 10 '16 at 11:33
  • @KeithThompson Sure, but the OP specifically asked about optimisation and its effect on the undefined behaviour he would see on his platform. That specific behaviour could vanish, depending on optimisation. As I said in my answer though, the undefinedness would not. – Graham Oct 10 '16 at 11:36
0

Since this question is dual-tagged C and C++, I will try to address both. C and C++ take different approaches here.

In C the implementation must be able to prove that the undefined behavior will be invoked in order to treat the whole program as if it had undefined behavior. In the OP's example it would seem trivial for the compiler to prove that, and therefore it is as if the whole program were undefined.

We can see this from Defect Report 109 which at its crux asks:

If however the C Standard recognizes the separate existence of "undefined values" (whose mere creation does not involve wholly "undefined behavior") then a person doing compiler testing could write a test case such as the following, and he/she could also expect (or possibly demand) that a conforming implementation should, at the very least, compile this code (and possibly also allow it to execute) without "failure."

int array1[5];
int array2[5];
int *p1 = &array1[0];
int *p2 = &array2[0];

int foo()
{
    int i;
    i = (p1 > p2); /* Must this be "successfully translated"? */
    1/0;           /* Must this be "successfully translated"? */
    return 0;
}

So the bottom line question is this: Must the above code be "successfully translated" (whatever that means)? (See the footnote attached to subclause 5.1.1.3.)

and the response was:

The C Standard uses the term "indeterminately valued" not "undefined value." Use of an indeterminate valued object results in undefined behavior. The footnote to subclause 5.1.1.3 points out that an implementation is free to produce any number of diagnostics as long as a valid program is still correctly translated. If an expression whose evaluation would result in undefined behavior appears in a context where a constant expression is required, the containing program is not strictly conforming. Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming. A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.
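
A minimal sketch of what that response guarantees (illustrative): because foo below might never be called, a conforming C implementation must still successfully translate this program, even though executing foo would invoke undefined behavior.

int foo(void)
{
    return 1 / 0;  /* UB if evaluated, but foo is never called */
}

int main(void)
{
    return 0;
}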

In C++ the approach seems more relaxed and would suggest a program has undefined behavior regardless of whether the implementation can prove it statically or not.

We have [intro.abstract]p5 which says:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

melpomene
  • 84,125
  • 8
  • 85
  • 148
Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • The fact that executing a function would invoke UB can only affect the way a program behaves when given some particular input if at least one possible execution of the program when given that input would invoke UB. The fact that invoking a function would invoke UB does not prevent a program from having defined behavior when it is fed input that would not allow the function to be invoked. – supercat Sep 10 '18 at 21:08
  • @supercat I believe that is what my answer is saying, for C at least. – Shafik Yaghmour Sep 10 '18 at 23:15
  • I think the same applies for the quoted text re C++, since the phrase "Any such execution" refers to ways the program could execute with a particular given input. If a particular input couldn't result in a function executing, I see nothing in the quoted text to say that anything in such a function would result in UB. – supercat Sep 11 '18 at 05:01
-2

The top answer is based on a wrong (but common) misconception:

Undefined behavior is a run-time property*. It CANNOT "time-travel"!

Certain operations are defined (by the standard) to have side-effects and cannot be optimized away. Operations that do I/O or that access volatile variables fall in this category.

However, there is a caveat: UB can be any behavior, including behavior that undoes previous operations. This can have similar consequences, in some cases, to optimizing out earlier code.

In fact, this is consistent with the quote in the top answer (emphasis mine):

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.
However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

Yes, this quote does say "not even with regard to operations preceding the first undefined operation", but notice that this is specifically about code that is being executed, not merely compiled.
After all, undefined behavior that isn't actually reached doesn't do anything, and for the line containing UB to be actually reached, code that precedes it must execute first!

So yes, once UB is executed, any effects of previous operations become undefined. But until that happens, the execution of the program is well-defined.

Note, however, that all executions of the program that result in this happening can be optimized to equivalent programs, including any that perform previous operations but then un-do their effects. Consequently, preceding code may be optimized away whenever doing so would be equivalent to their effects being undone; otherwise, it can't. See below for an example.

*Note: This is not inconsistent with UB occurring at compile time. If the compiler can indeed prove that UB code will always be executed for all inputs, then UB can extend to compile time. However, this requires knowing that all previous code eventually returns, which is a strong requirement. Again, see below for an example/explanation.


To make this concrete, note that the following code must print foo and wait for your input regardless of any undefined behavior that follows it:

printf("foo");
getchar();
*(char*)1 = 1;

However, also note that there is no guarantee that foo will remain on the screen after the UB occurs, or that the character you typed will no longer be in the input buffer; both of these operations can be "undone", which has a similar effect to UB "time-travel".

If the getchar() line wasn't there, it would be legal for the lines to be optimized away if and only if that would be indistinguishable from outputting foo and then "un-doing" it.

Whether or not the two would be indistinguishable would depend entirely on the implementation (i.e. on your compiler and standard library). For example, can your printf block your thread here while waiting for another program to read the output? Or will it return immediately?

  • If it can block here, then another program can refuse to read its full output, and it may never return, and consequently UB may never actually occur.

  • If it can return immediately here, then we know it must return, and therefore optimizing it out is entirely indistinguishable from executing it and then un-doing its effects.

Of course, since the compiler knows what behavior is permissible for its particular version of printf, it can optimize accordingly, and consequently printf may get optimized out in some cases and not others. But, again, the justification is that this would be indistinguishable from the UB un-doing previous operations, not that the previous code is "poisoned" because of UB.

user541686
  • 205,094
  • 128
  • 528
  • 886
  • 1
    You are totally misreading the standard. It says the behavior when executing *the program* is undefined. Period. This answer is 100% wrong. The standard is very clear -- running a program with input that produces UB at any point in the naive flow of execution is undefined. – David Schwartz Oct 09 '16 at 10:37
  • @DavidSchwartz: If you follow your interpretation to its logical conclusions you should realize it doesn't make logical sense. The input is *not* something that is fully determined when the program starts. The input to the program (even its mere *presence*) at any given line is allowed to depend on *all* of the side effects of the program until that line. Therefore, the program **cannot** avoid producing the side effects that come before the UB line, because that requires interaction with its environment and therefore affects whether the UB line will be reached or not in the first place. – user541686 Oct 09 '16 at 10:55
  • That's a non sequitur. We are talking about what the standard guarantees, not what won't break because you can't think of any way it could break. (And I can think of ways it can break. You are just less imaginative than I am.) – David Schwartz Oct 09 '16 at 10:59
  • @DavidSchwartz: It's not though. The program has no way of knowing whether the input is known ahead of time or whether the input is based on the program's preceding outputs. The input is beyond the program's control. Since in the latter case it is impossible for the program to avoid code that precedes UB, the same must hold true in the former case. – user541686 Oct 09 '16 at 11:02
  • 3
    That doesn't matter. Really. Again, you just lack imagination. For example, if the compiler can tell that no compliant code could tell the difference, it could move the code that's UB such that the part that's UB executes prior to the outputs that you naively expect to be "preceding". – David Schwartz Oct 09 '16 at 11:03
  • @DavidSchwartz: What do you mean it doesn't matter? All you're doing is just asserting that I'm wrong with no logical reasoning behind your position. I'm logically **proving** to you that what you're saying requires an impossibility, and you're telling me I'm wrong because it "doesn't matter"? – user541686 Oct 09 '16 at 11:05
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/125275/discussion-between-david-schwartz-and-mehrdad). – David Schwartz Oct 09 '16 at 11:06
  • 2
    @Mehrdad: Perhaps a better means of saying things would be to say that UB cannot time travel back past the last point where something could have happened in the real world that would have made behavior defined. If an implementation could determine by examining input buffers that there was no way any of the next 1000 calls to getchar() could block, and it could also determine that UB would occur after the 1000th call, it would not be required to perform any of the calls. If, however, an implementation were to specify that execution will not pass a getchar() until all preceding output had... – supercat Oct 24 '16 at 19:43
  • 2
    ...been delivered to a 300-baud terminal, and that any control-C which occurs before that will cause getchar() to raise a signal even if there were other characters in the buffer preceding it, then such an implementation could not move any UB past the last output preceding a getchar(). What's tricky is knowing in what cases a compiler should be expected to pass through to the programmer any behavioral guarantees a library implementation might offer beyond those mandated by the Standard. – supercat Oct 24 '16 at 19:47
  • @supercat: Thanks! Yes, I completely agree. – user541686 Oct 24 '16 at 19:52