8

One example from this MSDN blog article made me stop and think:

It says that this function:

void unwitting(bool door_is_open)
{
    if (door_is_open) {
        walk_on_in();
    } else {
        ring_bell();

        // wait for the door to open using the fallback value
        fallback = value_or_fallback(nullptr);
        wait_for_door_to_open(fallback);
    }
}

Can be optimized into this one:

void unwitting(bool door_is_open)
{
    walk_on_in();
}

Because calling value_or_fallback(nullptr) is undefined behavior (this is proven earlier in the article).

Now what I don’t understand is this: the program only enters undefined behavior when it actually reaches that line at run time. Shouldn’t the happens-before / happens-after concept apply here, in the sense that all observable effects of the code preceding that call have to be resolved before the runtime enters UB?
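
For reference, the article's value_or_fallback is roughly the following (a sketch from memory; the exact wording in the article may differ): a function that logs *p before the null check, so calling it with nullptr dereferences a null pointer.

#include <cstdio>

// Sketch of the article's value_or_fallback (reconstructed from memory):
// the logging line dereferences p even when p == nullptr, so calling
// value_or_fallback(nullptr) is undefined behavior.
int value_or_fallback(int *p)
{
    std::printf("The value of *p is %d\n", *p);
    return p ? *p : 42;
}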

Andreas Wenzel
  • 22,760
  • 4
  • 24
  • 39
qdii
  • 12,505
  • 10
  • 59
  • 116
  • 5
    LOL, I was used to UB ordering pizza or sending a resignation letter to the boss; this _time travel_ sounds cool :D – P0W Jul 02 '14 at 09:25
  • 11
    Time travel is a perfectly valid form of "undefined behaviour". – Kerrek SB Jul 02 '14 at 09:25
  • 2
    The answer to your question is the blog post you are quoting from: if undefined behavior happens at any point along an execution, the entire execution is undefined, and “anything can happen” applies to the entire execution. The other definition would prevent reordering of most instructions across “observable effects”, since most instructions can have undefined behavior. – Pascal Cuoq Jul 02 '14 at 09:32
  • @P0W wouldn't it be great if, in the next version of gcc, dereferencing a null pointer sent a resignation letter 50% of the time and ordered pizza the other 50% – M.M Jul 02 '14 at 09:50
  • 1
    The standard explicitly allows time travel under UB. Compilers use this permission to exploit optimisation opportunities that would be unavailable otherwise. – n. m. could be an AI Jul 02 '14 at 09:50
  • 3
    Relevant passage from the standard : 1.9/5 "if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (**not even with regard to operations preceding the first undefined operation**)." – n. m. could be an AI Jul 02 '14 at 12:18
  • @n.m.: I wonder how many of the people who approved that language expected it to negate the laws of causality as well as time? There is usefulness in allowing a compiler to hoist loop-invariant code without having to prove it won't cause an arithmetic overflow, even though that could seem to cause "time-travel", but some kinds of inferences used in dead-code removal seem like they'd only be legitimate in a non-causal, non-sequential universe. In a causal, sequential universe I would posit that an authorization to assume X should imply that any action which would be reasonable if X were true... – supercat May 08 '15 at 22:26
  • ...should be deemed reasonable, *but only for those actions whose reasonableness "when X is true" wouldn't depend upon the falsehood of X*. In a non-causal universe, if X is false, then for any Q the statement "X implies Q" will be vacuously true, but I don't think that such logic would be consistent with causality. – supercat May 08 '15 at 22:33
  • 1
    I'm not sure what laws of causality have to do with any of this. No actual physical time travel is permitted by the standard, it's a metaphor. – n. m. could be an AI May 09 '15 at 19:35

3 Answers

14

There is a flaw in the reasoning.

When a compiler writer says: we use Undefined Behavior to optimize a program, there are two different interpretations:

  • most people hear: we identify Undefined Behavior and decide we can do whatever we want (*)
  • the compiler writer meant: we assume Undefined Behavior does not occur

Thus, in your case:

  • dereferencing a nullptr is Undefined Behavior
  • thus executing value_or_fallback(nullptr) is Undefined Behavior
  • thus executing the else branch is Undefined Behavior
  • thus door_is_open being false is Undefined Behavior

And since Undefined Behavior does not occur (the programmer swears she will follow the terms of use), door_is_open is necessarily true and the compiler can elide the else branch.
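
To make this deduction concrete, here is a hand-written analogue (a hypothetical sketch, not actual compiler output) that states the same assumption explicitly via the GCC/Clang builtin __builtin_unreachable (C++23 offers std::unreachable()):

void walk_on_in(); // declared elsewhere, as in the question

void unwitting_equivalent(bool door_is_open)
{
    // Telling the optimizer that the else branch cannot be reached has the
    // same effect as the UB-based inference above: door_is_open must be true.
    if (!door_is_open)
        __builtin_unreachable();
    walk_on_in();
}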

(*) I am slightly annoyed that Raymond Chen actually formulated it this way...

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • When standard writers say "undefined behavior" to the language users, they also say: "it will not break your code base, you will use the same stuff, it just says that on one platform it may be this, and on another it may be that". Then the compiler writers come and say: "we don't care about the tons of code you wrote, we have the standard and our own tons of code to maintain"... – 18446744073709551615 Jul 02 '14 at 11:59
  • 2
    @18446744073709551615 "on one platform it may be this, and on another it may be that". They most definitely do NOT say that. You are welcome to try and find a quote. – n. m. could be an AI Jul 02 '14 at 12:12
  • @n.m. That would be the definition of unspecified behavior, right? – qdii Jul 02 '14 at 14:15
  • 2
    @qdii:unspecified behavior or implementation-defined behavior yes. Undefined Behavior is simply forbidden. – Matthieu M. Jul 02 '14 at 14:21
  • @MatthieuM.: The idea that Undefined Behavior was ever meant to be "forbidden" is bizarre historical revisionism. C was designed as a *systems programming language*, which means that much of its *reason for existence* lay in the ability of implementations to let programs manipulate low-level aspects of the underlying platforms in ways which were defined by the documentation of those platforms and implementations, rather than the documentation of the language as a whole. The things which invoke "Undefined Behavior" are mostly things where some plausible implementations might trigger... – supercat May 08 '15 at 22:46
  • ...traps which might conceivably be useful but whose behavior would be outside the jurisdiction of the standard, while implementations might do other useful things. The C standard does not specify what will happen if code uses relational operators on unrelated pointers (it's Undefined Behavior) but an efficient implementation of Unix (the purpose for which C was created) would have been *impossible* under the constraints imposed by modern compilers. – supercat May 08 '15 at 23:12
  • @supercat: What you are describing sounds more like unspecified behavior (it may happen, the Standard does not say what occurs and the implementation may not) or implementation defined behavior (it may happen, the implementation documentation should define what occurs) – Matthieu M. May 09 '15 at 11:59
  • @MatthieuM.: Unspecified Behavior requires a selection from among alternatives *listed in the standard*. Implementation-Defined behavior requires that the implementation specify a *specific* outcome. From a purely-requirements-driven standpoint, saying that an implementation must list things that can happen, but that it could list Undefined Behavior among them [e.g. because a divide-by-zero error that happened simultaneous with a peripheral interrupt might jump to a random address] is semantically equivalent to saying that the Standard imposes no requirements beyond what an implementation... – supercat May 09 '15 at 15:31
  • ...specifies for itself. One of the reasons that C caught on as a language was that it was able to do many things that other languages couldn't do without externally-linked code written in assembly language. On a typical MS-DOS machine using almost any MS-DOS compiler, `memcpy((char far*)0xB8000000,"R\4A\6I\16N\2B\3O\1W\5",14);` will show the word "RAINBOW" colorfully at the upper-left of the screen. Unless a pointer-to-number cast has previously yielded the value 0xB8000000 the program would certainly engage in Undefined Behavior, but that doesn't mean it wasn't useful. – supercat May 09 '15 at 15:38
  • @MatthieuM.: C was written as a language to allow operating systems to be written using a minimal amount of assembly code. The idea was that the portions of an OS that did not rely on machine-specific quirks would be portable, and those that did rely on machine-specific quirks wouldn't be portable to machines without those quirks *but could be written in C anyhow*. The C standard says that relational operators between unrelated pointers is UB because while most platforms can compare unrelated pointers just as cheaply as pointers within an object, there are some platforms were comparing... – supercat May 09 '15 at 15:49
  • ...arbitrary unrelated pointers so as to form a ranking would be much more expensive than comparing related pointers, and some programmers might prefer to have unrelated-pointer comparisons trap rather than yield inconsistent results (and since trapping is outside the scope of the C standard, that's UB). Someone who was trying to write an OS would need to use an implementation that could rank arbitrary pointers, but someone who didn't need that feature could choose an implementation which could perform pointer comparisons faster. – supercat May 09 '15 at 15:54
  • @MatthieuM.: What I would like to see as the next evolution of C would be for the standard to provide "machine-readable" ways for programs to express their requirements, and for compilers to have more machine-readable ways of expressing their behaviors. Many compilers support a wide range of configuration options via command-line switches, but there's no standard. I would like to see standardized directives that would mean things like "either enable an option for two's-complement wrapping `int` arithmetic or refuse compilation". In many cases, programmers will be able to accept... – supercat May 09 '15 at 16:08
  • ...a wide variety of alternative behaviors but not Undefined Behavior. For example, if a programmer could say that in some part of the code it would be okay for the result of adding 1 to a variable holding MAX_INT to behave as any value congruent to (MAX_INT+1) mod (UINT_MAX+1), such a constraint would allow the compiler to generate more much more efficient code than if the programmer had to guard against overflow or the compiler had to force wrap-around. I'd also like to see explicit directives for programmers to help compilers' inferences and would expect such directives could... – supercat May 09 '15 at 16:18
  • ...reduce compile times while improving code quality and safety. A tool that could analyze a program for 20 minutes and identify places where adding inference directives could improve things would be more helpful than a compiler which would spend 60 seconds on every build trying to find inference opportunities, since most programs get built more than 20 times, and programmers who were told "Handling for a certain corner case makes this method slower than it would need to be" could determine whether that corner case mattered and add a directive to either say "omit it" or "yes it's needed". – supercat May 09 '15 at 16:21
  • @supercat: wow, that's a blog post! More seriously, I agree that it's been common for systems developers to rely on specific behaviors of their compilers (gcc's extensions for linux are noteworthy) and get annoyed when the same compiler would pull the carpet under their feet in a subsequent version (on the ground it's UB by the specs anyway); I can recall reading several of Linus' rants on this point. I personally find that this is a strength only to a certain point, and it might have been necessary but today it seems to be a liability (security vulnerabilities...). – Matthieu M. May 09 '15 at 16:42
  • In this sense, I prefer how the Rust language seems to approach the question: avoid undefined behavior, completely, and add specific functionality (or reach out to other languages) as necessary. For example, last's year discussion on underflow/overflow which resulted in them yielding an unspecified result (unchecked in Release mode as it's too costly, and panicking in Debug mode). – Matthieu M. May 09 '15 at 16:45
  • @MatthieuM.: I'd like to see a C-derived language include distinct types for integers which are guaranteed to wrap, integers which are guaranteed to either yield a correct result or trap, and integers whose behavior is controlled by context and build settings (likely "do whatever" in most cases, but with the ability to enable traps when tracking down problems). Values where overflow would be a security issue could be declared as "always trap", and things like hash-code computations which need wrapping integer semantics could specify "always wrap"; always-wrap types would be non-promotable. – supercat May 09 '15 at 16:56
  • @MatthieuM.: A lot of code today assumes that the product of any two `uint32_t` values will yield a `uint32_t` value, but such an assumption would cease to hold if `int` were 64 bits. Since processors would be able to handle 64-bit math more efficiently if they did not have to handle 32-bit math as efficiently as 64-bit math, `int` really should move toward being 64 bits, but fixing all the code broken by that would be hard. If distinct types `uwrap32`, `utnum32`, and `unum32` [wrapping, overflow-trapped, and default-numeric-behavior] existed, however, most uses of `uint32_t` could easily... – supercat May 09 '15 at 17:02
  • ...be replaced by one of those, and it would generally be obvious which one was required. Care to chat? – supercat May 09 '15 at 17:02
  • Unfortunately, being in a different timezone, it's a bit late here; I will just point out that Rust's current crop of integers is close to what you wish for as the "regular" ones (`i32` and ilk) have unspecified behavior on overflow (`panic!()` in Debug) and have `wrapping_add` (and such) methods when wrapping is desired (though more easily used with a `wrapping!(a + b - 5)` macro which transforms the regular operators into wrapping function calls). – Matthieu M. May 09 '15 at 19:00
  • @MatthieuM.: That would be a good approach for a new language. I think having new types might make it easier to port existing code, though. In any case, my main point is that C used to be not a single language, but rather a name given to a huge variety of dialects that shared a common core. It might make sense to try to come up with a single unified language standard, since platforms are much less varied than they used to be and compilers don't have the same memory constraints, but any such standard should seek to include the common extensions, rather than paring them all out. – supercat May 10 '15 at 02:25
  • @supercat I know I'm super late with it, but as for your DOS memcpy example, it's not undefined behaviour, but implementation-defined behaviour, so it should be fine. – Karol S Nov 08 '17 at 10:03
  • @KarolS: An implementation *could* specify its integer-to-pointer mapping such that the code would work as described, but from the Standard's point of view, it could just as legitimately map integers to pointers in some other fashion. On an implementation that specifies the mapping in the common 8086 fashion and which specifies that accesses via such casts will be treated as `volatile`, behavior would be defined on systems with CGA cards, but on an implementation that specifies that it uses 16-bit silent-wraparound two's-complement integer semantics, the behavior of... – supercat Nov 08 '17 at 13:31
  • `unsigned int x=uchar1*255;` would be defined as yielding arithmetically-correct results for all unsigned 8-bit character values uchar1 instead of merely values 0-128 even though the Standard calls it UB. – supercat Nov 08 '17 at 13:40
2

It's true that undefined behaviour may happen only at runtime (e.g. dereferencing a pointer which happens to be null). Other times, a program may statically be "ill-formed, no diagnostic required" (e.g. if you add an explicit specialization for a template after it has already been used), which has the same effect, though: You cannot argue from within the language how your program will behave.

Compilers can use UB to "optimize" code generation aggressively. In your case, the compiler sees that the second branch will cause UB (I assume that this is known statically, even though you didn't spell it out), and so it can assume further that that branch is never taken, since that's indistinguishable: If you did enter the second branch, then the behaviour would be undefined, and that includes behaving like you entered the first branch. So the compiler can simply consider the entire code path that leads to UB as dead and remove it.
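
A classic illustration of how far this backward reasoning can reach is an out-of-bounds read in a loop (a variant of an example from the same article, sketched from memory):

int table[4];

bool exists_in_table(int v)
{
    // Off-by-one bound: i == 4 reads table[4], which is UB. Every execution
    // therefore either returns true before reaching it or runs into UB, so a
    // compiler may legally compile the whole function down to "return true;".
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v)
            return true;
    }
    return false;
}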

There's no way for you to prove that something is wrong.
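
That said, tooling outside the language can observe many of these cases at run time; as the comments below mention, UBSan instruments the executable. A minimal sketch, assuming GCC or Clang with the sanitizer available:

// Build with: g++ -g -fsanitize=undefined ub.cpp   (or clang++ ...)
// Running it should produce a runtime report for the null dereference
// instead of letting the optimizer silently reason from it.
#include <cstdio>

int value_or_fallback(int *p)
{
    std::printf("The value of *p is %d\n", *p); // UB when p == nullptr
    return p ? *p : 42;
}

int main()
{
    return value_or_fallback(nullptr);
}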

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • 2
    I think UB can also cause the compiler to fail (e.g. if the compiler determines that executing the program invariably leads to UB, the compiler can refuse to generate an executable) – M.M Jul 02 '14 at 09:48
  • @MattMcNabb do you know if any compiler implements that? or if it can be turned on by adding a parameter? – qdii Jul 02 '14 at 09:59
  • No idea, sorry. I've seen that argument used to justify why a compiler fails to compile some particular weird program, however it could be a special case of a compiler bug – M.M Jul 02 '14 at 10:00
  • `There's no way for you to prove that something is wrong.` I'm not sure what you meant here. If you meant that there's no way to detect UB at runtime, then sure there is: use UBSan, a.k.a. `-fsanitize=undefined`, which instruments the executable to check for and report a whole pile of horrible UB cases (or, er, so I'm told by a different programmer, who wasn't me, and wasn't compiling and running my code). @M.M I believe that can happen when invoking detectable UB in `constexpr` functions whose result is required to be evaluated at compile-time due to being called from a `constexpr` expression – underscore_d Aug 12 '16 at 10:44
  • @underscore_d: I meant it in the sense of C++ itself: A program cannot, expressed in itself, detect that UB is happening. Of course the *external* environment can do whatever it likes and provide analysis tools. For instance, a human or Stack Overflow could tell you there's UB. – Kerrek SB Aug 12 '16 at 10:59
0

That's what happens when people try to translate common sense to a spec and then interpret the spec without the common sense. In my personal opinion this is completely wrong, but this is what is being done in the course of language standardization.

In my personal opinion, a compiler should not optimize code with undefined behavior. But the current post-modern compilers just optimize it out. And the standard permits both.

The logic behind the particular misbehavior that you mentioned is that the compiler operates on branches: if something is undefined in a branch, it marks the whole branch as having undefined behavior; and if a branch has undefined behavior, it may be replaced by anything.

The worst thing about all this is that new versions of the compiler may break (and do break) existing code -- either by refusing to compile it or by compiling it to nonsense. And "existing code" is typically a really large amount of code.
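
A concrete instance of that kind of breakage (a well-known pattern, shown here as a hypothetical minimal example rather than code from any particular project) is a null check placed after a dereference:

int deref_then_check(int *p)
{
    int v = *p;           // UB if p is null
    if (p == nullptr)     // a modern compiler may delete this check entirely,
        return 0;         // since after the dereference it assumes p != nullptr
    return v;
}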

18446744073709551615
  • 16,368
  • 4
  • 94
  • 127
  • On the other hand, it's a great way to get people to fix their code! – Kerrek SB Jul 02 '14 at 11:49
  • 1
    @KerrekSB _it's a great way to get people to fix their code_ -- an error message would be more appropriate. – 18446744073709551615 Jul 02 '14 at 12:01
  • 5
    "new versions of the compiler may break (and do break) existing code". They only break code that is already broken. The blame lies with people who write broken code, not with compilers or standard writers. – n. m. could be an AI Jul 02 '14 at 12:29
  • 1
    @18446744073709551615: That exists. It's Clang's UBSan. – Kerrek SB Jul 02 '14 at 12:45
  • @n.m. You see, most of the development time is spent in debugging. (A manager who is far from software could ask: _Debugging? Why on Earth do you write code that does not work at once?_) And now there are bugs that are not discovered while debugging. And even testing by the whole QA department will not help. – 18446744073709551615 Jul 04 '14 at 14:24
  • 1
    "now there are bugs that are not discovered while debugging". What do you mean by "now"? Do you want to say that until now all, or significantly more, bugs were discovered while debugging? What exactly does this optimization technique change, compared to other optimization techniques? – n. m. could be an AI Jul 04 '14 at 14:50
  • If code checks whether two pointers overlap by performing `p2 >= p1 && p2 < p1+p1size` rather than `int overlap=0,i; for (i=0; i – supercat May 09 '15 at 16:29
  • ...almost certainly be able to produce the necessary code from the first with a lot less compilation time/effort than a hyper-modern compiler would require to produce the same code from the second. If the C Standards Committee were to identify and fix the places where the Standard is deficient and programmers have to rely upon implementations to fill in the gaps, there would be less legitimate need to use behaviors not specified by the Standard, but if the Standard doesn't allow programmers to do what they need to do efficiently, programmers shouldn't be held hostage to it. – supercat May 09 '15 at 16:38
  • @KerrekSB and UBSan is now available in `g++` too! I didn't see this before suggesting it on your answer :) – underscore_d Aug 12 '16 at 11:24
  • @KerrekSB Yeah, I got that from your reply on the other answer. I have no idea why you're repeating it here too, since all I did was add to your citation of UBSan by pointing out that it is now available on a 2nd compiler. – underscore_d Aug 12 '16 at 17:06