
The classic apocryphal example of "undefined behavior" is, of course, "nasal demons" — a physical impossibility, regardless of what the C and C++ standards permit.

Because the C and C++ communities tend to put such an emphasis on the unpredictability of undefined behavior and the idea that the compiler is allowed to cause the program to do literally anything when undefined behavior is encountered, I had assumed that the standard puts no restrictions whatsoever on the behavior of, well, undefined behavior.

But the relevant quote in the C++ standard seems to be:

[C++14: defns.undefined]: [..] Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [..]

This actually specifies a small set of possible options:

  • Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons).
  • Behaving in a documented manner characteristic of the environment -- this actually sounds relatively benign. (I certainly haven't heard of any documented cases of nasal demons.)
  • Terminating translation or execution -- with a diagnostic, no less. Would that all UB would behave so nicely.

I assume that in most cases, compilers choose to ignore the undefined behavior; for example, when reading uninitialized memory, it would presumably be an anti-optimization to insert any code to ensure consistent behavior. I suppose that the stranger types of undefined behavior (such as "time travel") would fall under the second category--but this requires that such behaviors be documented and "characteristic of the environment" (so I guess nasal demons are only produced by infernal computers?).
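
For example, here is roughly what I mean by "ignoring the situation" (a minimal sketch; the value printed is, of course, anybody's guess):

#include <stdio.h>

int main(void)
{
    int x;  /* never initialized: reading it is undefined behavior */
    /* "Ignoring the situation": a typical compiler just emits a plain load
     * of whatever happens to be in that stack slot or register, without
     * inserting any extra code; the printed value is simply unpredictable. */
    printf("%d\n", x);
    return 0;
}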

Am I misunderstanding the definition? Are these intended as mere examples of what could constitute undefined behavior, rather than a comprehensive list of options? Is the claim that "anything can happen" meant merely as an unexpected side-effect of ignoring the situation?

Two minor points of clarification:

  • I thought it was clear from the original question, and I think to most people it was, but I'll spell it out anyway: I do realize that "nasal demons" is tongue-in-cheek.
  • Please do not write an(other) answer explaining that UB allows for platform-specific compiler optimizations, unless you also explain how it allows for optimizations that implementation-defined behavior wouldn't allow.

This question was not intended as a forum for discussion about the (de)merits of undefined behavior, but that's sort of what it became. In any case, this thread about a hypothetical C-compiler with no undefined behavior may be of additional interest to those who think this is an important topic.

John Kugelman
Kyle Strand
  • 3
    It really has to do with operating system differences. For example, is memory initialized to zero? Is there a stack guard active? Does it use address randomization? The spec is silent because different behaviors are possible. Including a grue. – Elliott Frisch Aug 21 '15 at 04:57
  • 15
    Undefined behavior is always a joke until [someone gets incinerated](https://www.youtube.com/watch?v=z3z1Xi_5m80) – Paul Aug 21 '15 at 05:11
  • 5
    Instead of "nasal demons", I like to say that undefined behaviour can call your ex. – Brian Bi Aug 21 '15 at 05:16
  • 1
    I occasionally blame the empty milk carton in the refrigerator on UB. – Jason Aug 21 '15 at 05:32
  • 8
    "Permissible undefined behavior ranges from ignoring the situation completely with **unpredictable results**" I think that pretty much covers everything under the sun. – juanchopanza Aug 21 '15 at 05:36
  • 1
    @juanchopanza As mentioned in the question, just because the results are unpredictable doesn't mean the compiler can do *anything*. – Kyle Strand Aug 21 '15 at 05:43
  • 2
    @KyleStrand In this case, it isn't what the compiler does, but what happens when the program runs. So, anything can happen that can happen with the computer the program is running on. – juanchopanza Aug 21 '15 at 05:50
  • @juanchopanza I understand that. But, for instance, an old version of GCC supposedly launched NetHack when (some) UB was encountered. That's *not* "ignoring the situation." – Kyle Strand Aug 21 '15 at 05:52
  • 9
    Just as a point of general English usage, if someone says *"Our travel agency offers holidays from Australia to Turkey to Canada"* - it doesn't mean those are the only countries available; there's no implication that the list is exhaustive. – Tony Delroy Aug 21 '15 at 05:53
  • @KyleStrand That is most likely ignoring the situation and then something unpredictable (as far as the C++ standard is concerned) happening. Unless that behaviour is documented, which I doubt. – juanchopanza Aug 21 '15 at 05:56
  • 3
    @juanchopanza The bit I quoted says ignoring the situation *with* unpredictable results, not ignoring it *and then* doing something unpredictable. – Kyle Strand Aug 21 '15 at 06:11
  • Here is my favorite link regarding UB. It's specifically about race cases, but I love it because it shows the sorts of things a compiler might actually choose to do when facing UB. It's also really funny! https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong – Cort Ammon Aug 21 '15 at 06:13
  • 1
    Also, since launching NetHack was almost certainly an intentional Easter egg, it's by definition not an "unpredictable" result of UB. But it doesn't particularly matter, since the list is not exhaustive. – Kyle Strand Aug 21 '15 at 06:13
  • @TonyD I suppose. It seemed much more specific when I first read it. – Kyle Strand Aug 21 '15 at 06:15
  • @KyleStrand Maybe you should explain why you think "ignoring the situation with unpredictable results" is different to "ignoring it and then doing something unpredictable". – juanchopanza Aug 21 '15 at 06:18
  • 2
    @juanchopanza The former implies that any "unpredictable" results are strictly caused by the unpredictable state of the system when the situation occurs and the program's non-action of ignoring the situation. The latter implies that the program actually performs some sort of *extra* action *because of* the original situation. – Kyle Strand Aug 21 '15 at 06:21
  • I don't think the former implies any such thing. – juanchopanza Aug 21 '15 at 06:26
  • Always keep a can of RAID handy by your computer, just in case.... – Martin James Aug 21 '15 at 09:16
  • 1
    UB can be [worse](http://stackoverflow.com/a/25636788/841108) than nasal daemons. – Basile Starynkevitch Aug 21 '15 at 13:34
  • Personally I prefer "make your cat pregnant" and "turn your mustache blue" over "nasal demons." – Ixrec Aug 21 '15 at 13:37
  • Or sending an email to the moon. – Peter Mortensen Aug 21 '15 at 14:28
  • Or [murdering your cat](http://stackoverflow.com/a/31815792/560648). – Lightness Races in Orbit Aug 21 '15 at 17:15
  • 2
    You're only referring to **undefined behavior at compile-time**, which is only part of the story. Consider also undefined behavior at runtime, where e.g. a stray memory access could write a memory-mapped address which is connected to circuitry enabling a landmine under your chair. – smci Aug 21 '15 at 19:12
  • 1
    @smci No, my question is about UB in general. – Kyle Strand Aug 21 '15 at 19:14
  • ...well when you ask "permit *anything* to happen", compile-time is less than half the story. The real side-effects happen at runtime, when the executable is running on a particular OS(/VM), as a process, on a particular architecture (e.g. x86), presumably with other processes and data also... the standard doesn't speak to those possibilities. – smci Aug 21 '15 at 19:45
  • 1
    @smci .....? The question is *still* about UB in general. There's nothing in it to indicate otherwise. – Kyle Strand Aug 21 '15 at 19:48
  • 1
    ...but then it makes no more sense to reference the C/C++ standard than a Windows or Unix manual, or a CPU spec, or some article on undefined or malicious behavior, or a spec for the other processes which might be running (e.g. browser) and how to exploit them. – smci Aug 21 '15 at 19:52
  • 1
    @smci I don't understand your objection. The standard defines what code the compiler is allowed to generate; the generated code controls the range of runtime possibilities, excluding extraordinary mechanical circumstances such as a faulty chip. My question is about what the compiler is allowed to do in cases defined by the standard as UB--regardless of whether the *effects* of that decision are seen at compile-time or run-time. – Kyle Strand Aug 21 '15 at 19:56
  • ...and you can't know whether "anything" can happen without knowing the runtime behavior of said code (what if you emit bad/undefined/malicious code but the OS/VM/CPU catches it? Well did "anything" actually happen or not?). Or are you asking *"Is it possible to make the compiler emit any specific desired object code we might want to emit?"* Basically your phrase "permit *anything* to happen" is way too vague as to what is "happening" and in which phase. – smci Aug 21 '15 at 20:13
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87616/discussion-between-kyle-strand-and-smci). – Kyle Strand Aug 21 '15 at 20:13
  • 1
    If your CPAP machine has a software bug, nasal demons wouldn't be out of the question... – Mark Ransom Aug 22 '15 at 00:52
  • Doesn't the ***as if*** rule allow the compiler to insert whatever code it wants so long as it preserves the observable behavior? And if the code results in undefined behavior, doesn't even the *ignoring the situation* option still give the compiler permission to do whatever it wants? – jxh Aug 22 '15 at 02:04
  • 1
    @jxh: The Standard does allow that. I, and many other people, would consider the Standard defective in failing to describe normative behaviors for cases where implementations are able to offer stronger guarantees than the Standard would require, at cost far below that necessary for applications to deal with the lack of such guarantees. – supercat Aug 22 '15 at 04:14
  • @jxh If (as I had thought) the standard did indeed require compilers to pick one of these options for each instance of UB, then I'm not sure the "as if" rule would make much difference; launching NetHack, for instance, is perfectly legitimate UB as it's actually defined, but that couldn't be defended as a case of the compiler behaving "as if" it had simply ignored the situation. This is sort of irrelevant, though, since I misunderstood the standard. – Kyle Strand Aug 22 '15 at 04:41
  • @MarkRansom Really? Define "demon". (That was a joke, right?) – Kyle Strand Aug 22 '15 at 04:43
  • @MarkRansom may have been implying something coming out of your nose as it kills you. – jxh Aug 22 '15 at 05:14
  • Consider the hypothetical machine where NULL pointer dereference resulted in loading a ROM that happened to be programmed with NetHack. Then, a compiler emulates the behavior for when the code is ported to a different platform. – jxh Aug 22 '15 at 05:44
  • Now someone just needs to write a compiler that triggers a kernel panic on undefined behavior. – alexia Aug 22 '15 at 17:53
  • @nyuszika7h: That would be benign compared with the kinds of optimizers some compiler writers favor. On a Harvard-architecture machine which runs code from ROM, one would think the worst imaginable consequence of Undefined Behavior would be to overwrite all RAM and registers with the most vexatious combination of values possible, but hyper-modern UB goes beyond that. Given `if (should_launch_missiles()) { arm_missiles(); if (should_really_launch_missiles()) launch_missiles();} disarm_missiles();`, a compiler which can determine that UB will occur following that code, and that... – supercat Aug 22 '15 at 19:39
  • ...`disarm_missiles()` will always return without exiting or hanging, could replace it with `should_launch_missiles(); arm_missiles(); should_really_launch_missiles(); launch_missiles();`, on the basis that if either test returns false behavior will be undefined. If the UB after the original code were to randomize RAM and registers, it might cause execution to go directly to `launch_missiles()`, but an external hardware interlock would prevent launch when not armed. If execution had gone to `arm_missiles()`, the `should_really_launch_missiles()` check would likewise have prevented launch. – supercat Aug 22 '15 at 19:45
  • https://en.cppreference.com/w/cpp/language/ub – Jesper Juhl Mar 29 '23 at 12:50

9 Answers

84

Yes, it permits anything to happen. The note is just giving examples. The definition is pretty clear:

Undefined behavior: behavior for which this International Standard imposes no requirements.


Frequent point of confusion:

You should understand that "no requirement" also means the implementation is NOT required to leave the behavior undefined or do something bizarre/nondeterministic!

The implementation is perfectly allowed by the C++ standard to document some sane behavior and behave accordingly.1 So, if your compiler claims to wrap around on signed overflow, logic (sanity?) would dictate that you're welcome to rely on that behavior on that compiler. Just don't expect another compiler to behave the same way if it doesn't claim to.

1Heck, it's even allowed to document one thing and do another. That'd be stupid, and it'd probably make you toss it into the trash—why would you trust a compiler whose documentation lies to you?—but it's not against the C++ standard.
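
For example, relying on such a documented guarantee might look like this (a sketch; -fwrapv is the GCC/Clang option that documents two's-complement wraparound for signed arithmetic, and the guarantee only holds on implementations that actually make that promise):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int i = INT_MAX;
    /* Undefined behavior as far as the standard is concerned, but a compiler
     * invoked in a mode that documents wraparound (e.g. gcc -fwrapv) promises
     * that i becomes INT_MIN here. */
    i++;
    printf("%d\n", i);
    return 0;
}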

user541686
  • 3
    It's interesting, however, to compare the normative examples which presumably reflected the intended meaning of the phrase, with the behaviors of modern compilers. I've seen no evidence whatsoever that the authors of the Standard intended that compilers would use Undefined Behavior to determine what inputs a program would or would not receive. – supercat Aug 21 '15 at 06:31
  • 17
    @supercat Examples and notes are not normative. – T.C. Aug 21 '15 at 06:41
  • 7
    @supercat: It was quite obvious that the intent was essentially to "determine what inputs a program would not receive" - it's just that compilers were not so advanced at the time. For example, the whole point of `x< – R.. GitHub STOP HELPING ICE Aug 21 '15 at 14:51
  • To show that everything could happen, [here](https://stackoverflow.com/questions/29799574/disallowed-system-call-sys-socketcall-when-i-try-to-solve-the-sum-the-larges/29799798#29799798) UB triggers a forbidden system call on the host machine that would create a socket. – edmz Aug 21 '15 at 17:50
  • 2
    @R..: I would interpret the intention of UB with "<<" as "Programmers won't use `x< – supercat Aug 21 '15 at 18:39
  • ...was never intended to say that code which is written for platforms that define a non-overlapping ranking to all pointers should refrain from using expressions like `p >= base && p < base+length` to determine whether `p` identifies part of an object. On platforms which do not define a natural non-overlapping ranking for pointers, it's unclear what `p – supercat Aug 21 '15 at 18:43
  • 2
    @supercat except if we know `base` is the start of an object that is `length` long, it would be more efficient to replace `p >= base && p < base+length` with `true` than actually fiddle with bits and stuff, and it would be conforming. This is infinitely faster than actually doing the work. An infinite times slowdown sure seems impractical! – Yakk - Adam Nevraumont Aug 21 '15 at 19:14
  • @Yakk: Are you implying that K&R intended that anyone needing to determine whether `p` identifies an address within the indicated object should use the portable `for (i<0; i=base && p – supercat Aug 21 '15 at 19:34
  • ...shouldn't preclude the use of the algorithms on machines that could run them. – supercat Aug 21 '15 at 19:34
  • 4
    @supercat No, I'm saying that your use of "practical" is impractically vague. Sure, you'll know it when you see it. And compilers today are *free to state that their pointers exist in a flat memory space*. Some compilers choose not to make (many) guarantees beyond the standard, and exploit that freedom. Others compilers don't. Practical programmers either have to restrict their code to one version of one compiler using one standard, or code against the standard. Try to only dip into undefined behavior with lots of warnings and if the payoff is great, ideally asserting compiler versions. – Yakk - Adam Nevraumont Aug 21 '15 at 19:38
  • @Yakk: Until recently, compiler writers recognized that the efficiencies that would be made possible by constraining the effects of various forms of UB-invoking actions greatly outweighed the benefits that could be obtained by making the effects of such actions unpredictable. Most programs are subject to the requirement that when fed invalid input they may produce a wide range of possible outputs, but must not launch nuclear missiles. Allowing overflow to have a constrained range of effects would allow a compiler to make useful overflow-related optimizations... – supercat Aug 21 '15 at 20:07
  • ...while allowing source text to be focused on a program's main purpose. Allowing the compiler to do anything it likes with overflow will force a programmer subject to the above constraints to write code which avoids all possible overflows, doesn't allow any overflow-related optimizations, and is virtually guaranteed to be sub-optimal on at least some platforms. Consider `int mulcomp(int a, int b, int c, int d) {return a*b > c*d;}`, but with an added constraint that the function must not launch nuclear missiles. The function as written would meet constraints optimally on both... – supercat Aug 21 '15 at 20:12
  • ...the majority of 16-bit compilers for the 8086, and on the TMS 32C050 DSP (with 16-bit `int`, but a 16x16+32->32 multiply-accumulate unit). Doing the multiplies on unsigned values and then converting to `int` would be much slower than computing `(long)a*b-(long)c*d > 0`, but on the 8086 performing the comparison as `long` would require the compiler to spend extra code saving the high word from the first multiply and then comparing it to the high word of the second. I would suggest that efficiency losses from such coding are apt to dwarf any gains that couldn't be achieved by other means. – supercat Aug 21 '15 at 20:18
  • 2
    @supercat: You're neglecting the biggest benefit of making the effects of UB undesirable: breaking non-portable code and forcing people to write portable code. The value of this cannot be understated. – R.. GitHub STOP HELPING ICE Aug 21 '15 at 21:49
  • @R..: For many kinds of applications, if is possible to write readable code which when run through a variety of compilers for a variety of platforms will perform efficiently and correctly, but it is not possible to write strictly-compliant code which will be anywhere near as readable or efficient (or in some cases, meet requirements at all). In what way would portability be better served by requiring programs to use an anemic language, versus standardizing some small but important guarantees which many compilers have offered that make programming much more practical? – supercat Aug 21 '15 at 22:56
  • 2
    @supercat: I don't buy the argument that it's not possible to write portable code. It's certainly practical to write code without UB and that only depends on implementation-defined behavior in ways that are testable at compile-time, e.g. testing for `CHAR_BIT==8`, `type_MIN!=-type_MAX` (2s complement/full-range), `defined(UINTn_MAX)`, etc. This is a much less precarious situation to be in than depending on undefined behavior to be endowed with untestable properties you expect it to have on real-world systems. – R.. GitHub STOP HELPING ICE Aug 21 '15 at 23:51
  • @R..: Why not **make the necessary properties testable**? Let code specify what it needs, and not bother trying to support the platforms it's never going to run on anyway. Or, better yet, define compiler intrinsics to do things which are awkward in C but frequently easy in machine code (e.g. store a 32-bit word as four `char` values MSB-first). Even on x86, making an intrinsic for that yield optimal code (a byte-swap instruction followed by a word store) would be easier than having a compiler recognize all of the ways that programmers might try to code such a thing. – supercat Aug 22 '15 at 01:32
  • 2
    @supercat: Now you're getting off into a separate topic which is feature-creeping the language. There are as many feature requests for C as there are users, but if even 1% of them were entertained, it would be as big a mess as C++.. :-) In any case that's a separate topic from UB. – R.. GitHub STOP HELPING ICE Aug 22 '15 at 05:03
  • 1
    @supercat "a platform whose behavior for such an action _they believe_ meets their requirements. " – Random832 Aug 22 '15 at 07:42
  • @R..: If one examines the corpus of *existing* C programs, one will find certain target platform behaviors which are useful, widely supported, and sometimes relied upon. Rather than declaring programs using such behaviors illegitimate, it would be far more useful to catalog such behaviors and have standardized means by which programs could indicate which behaviors they require. This would improve robustness, portability, and efficiency. It's not necessary to catalog every behavior that any compiler has ever supported, since the biggest benefits would come from identifying those that are... – supercat Aug 22 '15 at 17:23
  • ...most widely supported. On the other hand, since the only burden on a compiler writer from adding a precisely-delineated behavior to the catalog would generally be adding extra line to a header file to identify the behavior, and having the compiler indicate whether it's supported or not, there shouldn't be much of a "burden of proof" in favor of cataloging any particular behavior. If many programmers end up indicating a behavioral requirement which a compiler can meet with optimizations off, but can't meet with optimizations on, and if slightly constraining the optimizer's behavior... – supercat Aug 22 '15 at 17:36
  • ...would allow the compiler to meet that requirement, then the authors of that compiler would be able to see that they could improve the performance of a lot of code by offering an intermediate setting, rather than trying to guess at what optimizations would and would not be useful. – supercat Aug 22 '15 at 17:37
  • @Random832: True, but that's still nowhere near the claim that programmers assert that UB will not occur. Further, the way compilers treat the word "assume" seems rather different from the normal usage, and doesn't quite fit a causal-sequential universe. In a non-sequential universe of mathematical propositions, if the only way for P too be true is for Q to be true, then an invitation to assume P is an invitation to assume Q, and Q's falsehood would imply that an invitation to assume P would be an invitation to assume anything and everything. I don't think that logic works in a... – supercat Aug 22 '15 at 17:42
  • ...causal-sequential universe, though. If someone is planning to pick up an a parcel from an office, an invitation to assume the shipping clerk will have it ready would not imply permission to go on a murderous rampage if he doesn't. Further, even if the only way the shipping clerk could have the package ready would be if some other task had been performed, an invitation to assume the package will be ready would not imply permission to take actions which required the other task to have been performed. Instead, the assumption that the package would be ready would allow... – supercat Aug 22 '15 at 17:56
  • ...the courier to be dispatched to pick up the package in advance of its readiness, and a willingness to accept the cost of an wasted visit by the courier should the package not be ready. In everyday life, the ability to grant license to make assumptions *for constrained purpose* is very useful; I would posit that the ability would be just as useful in programming. – supercat Aug 22 '15 at 17:59
  • As a programming analogy to the latter case, consider a case where a program will be performing many small fixed-size memory-copy operations and about 0.1% of them would involve the same source and destination. If objects will never lap outside of that situation, but a compiler wouldn't be able to know that, it's entirely possible that neither `if (dest != src) memcpy(dest, src, 8);`or `memmove(dest, src, 8);` could be optimized to be as fast as would a version of `memcpy()` which was required to behave like `memmove` when source and destination pointers are equal [not quite a `nop`, since... – supercat Aug 22 '15 at 18:04
  • ...it would be legal to write an object using one type of pointer, `memmove` it to itself, and then read it using a different type of pointer]. – supercat Aug 22 '15 at 18:05
  • Another frequent point of confusion/contention is that it was historically very common for certain kinds of implementations to share certain behavioral traits in places where the Standard imposes no requirements. Even *the authors of the Standard* have indicated that the majority of current implementations used two's-complement silent-wraparound semantics on integer overflow and would process signed and unsigned math identically, regardless of overflow, except in certain specific constructs. Were the authors of the Standard confused, or has "modern C" diverged from their intention? – supercat Mar 19 '19 at 17:28
24

One of the historical purposes of Undefined Behavior was to allow for the possibility that certain actions may have different potentially-useful effects on different platforms. For example, in the early days of C, given

int i=INT_MAX;
i++;
printf("%d",i);

some compilers could guarantee that the code would print some particular value (for a two's-complement machine it would typically be INT_MIN), while others would guarantee that the program would terminate without reaching the printf. Depending upon the application requirements, either behavior could be useful. Leaving the behavior undefined meant that an application for which abnormal termination was an acceptable consequence of overflow, but seemingly-valid-but-wrong output was not, could forgo overflow checking when run on a platform that reliably trapped overflow; conversely, an application for which abnormal termination was unacceptable, but arithmetically-incorrect output was tolerable, could forgo overflow checking when run on a platform where overflows weren't trapped.

Recently, however, some compiler authors seem to have gotten into a contest to see who can most efficiently eliminate any code whose existence would not be mandated by the standard. Given, for example...

#include <stdio.h>

int main(void)
{
  int ch = getchar();
  if (ch < 74)
    printf("Hey there!");
  else
    printf("%d",ch*ch*ch*ch*ch);
}

a hyper-modern compiler may conclude that if ch is 74 or greater, the computation of ch*ch*ch*ch*ch would yield Undefined Behavior, and as a consequence the program should print "Hey there!" unconditionally regardless of what character was typed.
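
A sketch of what such a compiler might effectively transform the program into (not the literal output of any particular compiler, just an illustration of the reasoning):

#include <stdio.h>

int main(void)
{
    /* On a platform with 32-bit int, ch >= 74 would make ch*ch*ch*ch*ch
     * overflow, so the compiler "reasons" that the else branch can never be
     * reached by a program with defined behavior and folds the test away. */
    (void)getchar();          /* the read itself still happens...    */
    printf("Hey there!");     /* ...but its result no longer matters */
}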

Jonathan Leffler
supercat
  • 3
    Wow. Any idea how we got from "potentially useful" to the current situation, in which much of the C++ community seems adamantly opposed to any attempt to determine the exact behavior of certain compilers upon encountering a situation allowing UB, with the explanation "it doesn't matter, your program has UB"? – Kyle Strand Aug 21 '15 at 15:00
  • 1
    It's all about the benchmarks – Bwmat Aug 21 '15 at 16:19
  • 12
    No, it's about portability. We live in an interconnected age now with software distributed faster than you can think. We're no longer writing programs for that one dusty supercomputer in the basement. At least, most of us aren't. It's effectively down to a decades-old paradigm shift in programming; there are now tangible practical benefits to coding rigorously to standards (which ideally we'd always have done), and the toolchain writers can take advantage of that to produce really fast and efficient compilers. Why not?! – Lightness Races in Orbit Aug 21 '15 at 17:05
  • 1
    @LightnessRacesinOrbit: Can you write an strictly-compliant function which behaves as `int mulcomp(int a, int b, int c, int d) { return a*b > c*d;}` when the values of `a*b` and `c*d` are representable as `int`, and is required to return 1, return 0, or terminate execution otherwise (arbitrary choice), without such a function being much harder to read than the original, and without the optimal code for the "portable" code being significantly slower than the code for the original on at least some platforms? – supercat Aug 21 '15 at 18:00
  • 5
    @LightnessRacesinOrbit: If the goal were to have a usable portable language, the Committee should recognize the existence of some distinct variations (e.g. dialects where `p >= object.base && p – supercat Aug 21 '15 at 18:07
  • 4
    ...and two distinct unsigned 32-bit integer types. On platforms where all values of `uint32_t` are representable as `int`, subtraction of two `uint32_t` values will yield a signed result. On platforms where some values of `uint32_t` are not representable as `int`, subtraction yields a `uint32_t` result. Both types are called `uint32_t`, but their semantics are extremely different. Likewise, on platforms where `int` is larger than 32 bits, incrementing an `int32_t` will always have defined behavior. On platforms where `int` is exactly 32 bits, incrementing `int32_t` can cause UB. – supercat Aug 21 '15 at 18:10
  • 2
    @LightnessRacesinOrbit: Further, a portable language should define an efficient portable means of packing and unpacking a larger integer type into/from a sequence of smaller ones. Writing `*dat++= value & 255; *dat++=(value >> 8) & 255; *dat++ = (value >> 16) & 255; *dat++ = (value >> 24) & 255;` may be 100% portable (even for machines where `CHAR_BITS > 8`, but even on platforms where a single 32-bit store would have yielded correct behavior it would be hard for a compiler to determine that. Given `__pack_i32_cle(&dat, value);` any compiler could easily produce optimal code. – supercat Aug 21 '15 at 18:17
  • 2
    I think it is difficult to justify the existence of UB; it's drawbacks are more than benefits – Giorgi Moniava Aug 21 '15 at 19:51
  • @Giorgi: There is significant benefit to having a category of actions where different platforms may be able to offer a variety of different behavioral guarantees which weren't necessarily compatible (e.g. some may guarantee that adding 1 to MAX_INT will trap, some may guarantee that it will yield MIN_INT, some may guarantee that it will yield some number, not necessarily within the range of `int`, which is congruent to (MAX_INT+1) mod (MAX_INT+MAX_INT+2), etc.) What's counter-productive is the philosophy that someone who is told "You may assume that someone will clean up after you"... – supercat Aug 21 '15 at 20:27
  • ...would be entitled to do absolutely anything whatsoever he pleases if it turns out that nobody in fact cleans up after him. – supercat Aug 21 '15 at 20:29
  • 2
    @Giorgi there are hundreds of languages that don't have UB, perhaps use one of those instead, instead of trying to make all languages the same – M.M Aug 22 '15 at 01:49
  • @MattMcNabb: There's a difference between saying that no actions should have undefined behavior, and saying that many of the actions which in C have totally-unconstrained behavior, shouldn't. Optimization opportunities are maximized when compilers offer behavioral guarantees which are as loose as possible *without increasing the amount of code programmers have to write to meet requirements*. Failure to offer such guarantees will compel programmers to write code which is more verbose, harder to read, slower, and less conducive to optimization than would otherwise have been possible. – supercat Aug 22 '15 at 18:15
  • 2
    @M.M But lots of people (such as myself) use C or C++ out of necessity; many more (in fact very likely anyone who uses a modern computer for any purpose) use programs written in C and C++, and I suspect that many of the mysterious bugs/errors/crashes/etc with which everyone is familiar are due to UB. – Kyle Strand Sep 18 '15 at 16:42
17

Nitpicking: You have not quoted a standard.

These are the sources used to generate drafts of the C++ standard. These sources should not be considered an ISO publication, nor should documents generated from them unless officially adopted by the C++ working group (ISO/IEC JTC1/SC22/WG21).

Interpretation: Notes are not normative according to the ISO/IEC Directives Part 2.

Notes and examples integrated in the text of a document shall only be used for giving additional information intended to assist the understanding or use of the document. They shall not contain requirements ("shall"; see 3.3.1 and Table H.1) or any information considered indispensable for the use of the document e.g. instructions (imperative; see Table H.1), recommendations ("should"; see 3.3.2 and Table H.2) or permission ("may"; see Table H.3). Notes may be written as a statement of fact.

Emphasis mine. This alone rules out "comprehensive list of options". Giving examples however does count as "additional information intended to assist the understanding .. of the document".

Do keep in mind that the "nasal demon" meme is not meant to be taken literally, just as using a balloon to explain how universe expansion works holds no truth in physical reality. It's to illustrate that it's foolhardy to discuss what "undefined behavior" should do when it's permissible to do anything. Yes, this means that there isn't an actual rubber band in outer space.

Jonathan Leffler
user5250294
  • 1
    Re: nitpick: I was inspired to go find that quote in the draft-standard by seeing it quoted from the 2003 standard in another answer. The wording looked very similar, so I don't think the wording has changed much for at least a decade, which is why I felt comfortable quoting from the draft (plus, it's free and online). – Kyle Strand Aug 21 '15 at 05:27
  • 4
    The final versions of those standards are not freely available, but behind quite a high paywall, thus cannot be linked. However, the final drafts are identical with the final version in all relevant technical and linguistic aspects. Without those drafts, citations from and references to the standard are actually impossible. So what do you prefer: 1) citation from the final (and in that aspect identical) draft or 2) no citation at all, thus just stating with no foundation at all? (and how do you know there is **no** rubber band in space?) – too honest for this site Feb 05 '16 at 01:12
  • Note that the C Standard uses the term "shall" in a way which differs from the usage of the term in almost any other standard. In most standards, violation of a constraint would render an implementation non-conforming, but that's not true of the C Standard. A program that violates a constraint cannot be *strictly* conforming, but the Standard recognizes as "conforming", and is expressly intended not to demean, non-portable programs upon which it imposes no requirements, but whose behavior is usefully defined by some implementations. – supercat Mar 19 '21 at 21:16
14

The definition of undefined behaviour, in every C and C++ standard, is essentially that the standard imposes no requirements on what happens.

Yes, that means any outcome is permitted. But there are no particular outcomes that are required to happen, nor any outcomes that are required to NOT happen. It does not matter if you have a compiler and library that consistently yields a particular behaviour in response to a particular instance of undefined behaviour - such a behaviour is not required, and may change even in a future bugfix release of your compiler - and the compiler will still be perfectly correct according to each version of the C and C++ standards.
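
For instance (a sketch; the "consistent" result is just an accident of how the compiler happens to lay things out, not something any standard promises):

#include <stdio.h>

int main(void)
{
    int a[4] = {1, 2, 3, 4};
    /* Out-of-bounds read: a given compiler and library might yield the same
     * value on every run today, but nothing requires that, and the next
     * bugfix release may legitimately produce something else entirely. */
    printf("%d\n", a[4]);
    return 0;
}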

If your host system has hardware support in the form of connection to probes that are inserted in your nostrils, it is within the realms of possibility that an occurrence of undefined behaviour will cause undesired nasal effects.

Peter
  • 6
    Historically, the fact that the Standard didn't define a behavior in no way implied that implementations shouldn't do so. Indeed, a number of things which trigger Undefined Behavior do so because prior to the ratification of the C Standard, different implementations made two (or more) contradictory guarantees, both of which were relied upon by programs written for those implementations. – supercat Aug 21 '15 at 06:17
  • @supercat: Thanks for this! As usual I greatly appreciate your historical insights. – Matthieu M. Aug 21 '15 at 06:48
  • Very true, supercat. There are several reasons behind something being undefined. One of those is that a number of compiler/library vendors - and their customers - did not want to lose particular features that predated the standard. The only way to get consensus was to make such features undefined (or implementation defined, unspecified, etc) and permit implementation freedom. – Peter Aug 21 '15 at 07:05
  • 1
    @Peter: The issue isn't just one of getting people to agree to a Standard. One of the reasons C has thrived is that compilers for various platforms could offer different trade-offs between performance, usability, and robustness, which were tailored to the needs of those platforms' users. – supercat Aug 21 '15 at 07:18
  • None of that would be possible in standard C, supercat, without specific provisions to permit implementation freedom. – Peter Aug 21 '15 at 07:33
  • 2
    A good example was dereferencing the null pointer. On SPARC reading that gave you the value 0, and writing silently discarded the result. On MS-DOS, that location held the interrupt table. Try reconciling _that_. – MSalters Aug 21 '15 at 08:14
  • 3
    @supercat But I believe the standard separately defines "implementation defined" behaviour, which DOES match with what you said. For example, what >> does on signed values is implementation defined (which means something consistent and defined in compiler documentation must happen), whereas what << does on signed values is undefined (which means anything can happen and nobody has to define it). Don't blame compiler writers; it's clear that modern writers of the standard are perfectly fine with what is going on, else they'd just make all the currently undefined behaviour implementation defined! – Muzer Aug 21 '15 at 13:47
  • @Muzer: In order for the Standard to state that an action invokes Implementation-Defined behavior, it must be practical for *every* implementation to make that action behave consistently. If overflow invoked Implementation-Defined behavior, for example, a platform where ADD instructions would trap on overflow but INC instructions would not would not be allowed to use INC instructions on signed types unless it added its own overflow checking (thus likely negating the purpose of using those instructions in the first place) or else documented the precise circumstances where it would use each... – supercat Aug 21 '15 at 18:54
  • 1
    ...instruction (which would likely be impractical, given that such issues may be affected by register allocation, which may be in turn affected by many other factors). I would suggest that there are places where the Standard expressly forbids programs from doing certain things (generally at the syntactic or structural level), and that if the Standard intended to forbid certain things it could have done so. – supercat Aug 21 '15 at 18:58
8

I thought I'd answer just one of your points, since the other answers answer the general question quite well, but have left this unaddressed.

"Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons)."

A situation in which nasal demons could very reasonably be expected to occur with a sensible compiler, without the compiler inserting ANY code, would be the following:

if(!spawn_of_satan)
{
    printf("Random debug value: %i\n", *x); // oops, null pointer dereference
    nasal_angels();
}
else
{
    nasal_demons();
}

A compiler, if it can prove that *x is a null pointer dereference, is perfectly entitled, as part of some optimisation, to say "OK, so I see that they've dereferenced a null pointer in this branch of the if. Therefore, as part of that branch I'm allowed to do anything. So I can therefore optimise to this:"

if(!spawn_of_satan)
    nasal_demons();
else
    nasal_demons();

"And from there, I can optimise to this:"

nasal_demons();

You can see how this sort of thing can in the right circumstances prove very useful for an optimising compiler, and yet cause disaster. I did see some examples a while back of cases where actually it IS important for optimisation to be able to optimise this sort of case. I might try to dig them out later when I have more time.

EDIT: One example that just came from the depths of my memory of such a case where it's useful for optimisation is where you very frequently check a pointer for being NULL (perhaps in inlined helper functions), even after having already dereferenced it and without having changed it. The optimising compiler can see that you've dereferenced it and so optimise out all the "is NULL" checks, since if you've dereferenced it and it IS null, anything is allowed to happen, including just not running the "is NULL" checks. I believe that similar arguments apply to other undefined behaviour.
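
A sketch of that situation (the helper name and layout are invented for illustration):

#include <stdio.h>

/* Hypothetical inlined helper that defensively checks its argument. */
static void print_twice(const int *p)
{
    if (p != NULL)            /* redundant once p has been dereferenced */
        printf("%d %d\n", *p, *p);
}

static void demo(int *p)
{
    int first = *p;           /* if p were null, this is already UB...       */
    printf("%d\n", first);
    print_twice(p);           /* ...so after inlining, the compiler may drop
                               * the p != NULL check entirely                 */
}

int main(void)
{
    int x = 42;
    demo(&x);
    return 0;
}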

Muzer
  • Err, sorry @supercat, I somehow missed the second half of your answer, which also explains this! – Muzer Aug 21 '15 at 09:58
  • 1
    ...yes, I realize that if the user *asks* for nasal demons in certain cases, then they might get summoned in unexpected cases if the program has UB. When I say that certain UB behaviors would require inserting code, I'm talking about completely unexpected behaviors that are not already explicitly written into your code. – Kyle Strand Aug 21 '15 at 14:48
  • There must be some corner case where it's weirdly more efficient to generate completely new code that takes advantage of UB. I'll dig out some of the articles I read later. – Muzer Aug 21 '15 at 15:05
  • I'd be interested to see that, but keep in mind the original question could be rephrased as "does the standard really allow arbitrary code insertion for UB", which has already been answered. – Kyle Strand Aug 21 '15 at 15:11
  • I.e., Mehrdad's answer shows that yes, insertion of code is permissible. – Kyle Strand Aug 21 '15 at 15:12
  • Indeed, and I'm not disputing that. I was just attempting to answer one of the questions you implied as a tangent to your main question! – Muzer Aug 21 '15 at 15:17
  • @Muzer: I suspect that someone developed a static analyzer which could determine how much of the code in a program would be necessary if it never had to handle any input which invoked Undefined Behavior, ran it on a bunch of popular software, and when it revealed that a large portion of the code of such programs could be eliminated, concluded that would be a useful form of optimization, completely ignoring the fact that in many cases the code was relying upon certain behavioral constraints which, while not required by the Standard, were nonetheless satisfied by pretty much every compiler. – supercat Aug 21 '15 at 19:17
  • 1
    @Muzer: The simple fact of the matter is that the set of behaviors defined by the C Standard is insufficient to perform many actions efficiently, but the vast majority of compilers have historically offered some extensions which allowed programs to meet their requirements much more efficiently than would otherwise be possible. For example, on some platforms, given `int a,b,c,d;` the implementation of `a*b>c*d` which would be most efficient when values are within range would compute `(int)((unsigned)a*b)>(int)((unsigned)c*d)`, while on other platforms the most efficient function would... – supercat Aug 21 '15 at 19:22
  • ...compute `(long)a*b > (long)c*d`. There are some platforms where casting to unsigned would be much faster than casting to long (even if they can optimize the int*int->long multiply), and there are others where casting to `long` would be faster (some DSPs, for example, have multiply-accumulate units that are longer than `int`). If a function that arbitrarily returns 0 or 1 in case of overflow would meet requirements, allowing `a*b > c*d` to represent such a function would allow optimal code on both kinds of platforms. Requiring a programmer to write one of the latter forms would prevent... – supercat Aug 21 '15 at 19:27
  • ...compilers for the dis-favored platform from generating optimal code, since they'd be compelled to precisely match corner-case behavior the programmer didn't care about. – supercat Aug 21 '15 at 19:28
  • You just saw why sane compilers have a way to suppress the NULL clobber though. When compiling in kernel mode code, for all the compiler knows, *NULL is a reasonable request. – Joshua Aug 21 '15 at 19:57
  • @Joshua: It would be very helpful if compiler writers could recognize a distinction between optimizing based on the assumptions that code won't do weird things when there's *no evidence* that it will (e.g. given `extern char *foo`, treating `*foo++ = char1; *foo++ = char2;`, as `foo[0]=char1; foo[1]=char2; foo+=2;`), versus trying to assume that code won't do things that evidence would suggest that it does (e.g. assuming that `void inc_float(float *f) { *(uint32_t*)f += 1; }` won't affect anything of type `float`). Some compiler writers claim such things would require compilers... – supercat Oct 28 '17 at 19:38
  • ...to be magically omniscient, but the question "does any evidence of weirdness exist" is pretty straightforward. In cases where blocking optimizations in the presence of weirdness would create a real performance problem, it may make sense to work out ways of allowing them when the evidence of weirdness is illusory, but I don't think that would be necessary very often. – supercat Oct 28 '17 at 19:41
8

First, it is important to note that it is not only the behaviour of the user program that is undefined, it is the behaviour of the compiler that is undefined. Similarly, UB is not encountered at runtime, it is a property of the source code.

To a compiler writer, "the behaviour is undefined" means, "you do not have to take this situation into account", or even "you can assume no source code will ever produce this situation". A compiler can do anything, intentionally or unintentionally, when presented with UB, and still be standard compliant, so yes, if you granted access to your nose...

Then, it is not always possible to know if a program has UB or not. Example:

int * ptr = calculateAddress();
int i = *ptr;

Knowing if this can ever be UB or not would require knowing all possible values returned by calculateAddress(), which is impossible in the general case (See "Halting Problem"). A compiler has two choices:

  • assume ptr will always have a valid address
  • insert runtime checks to guarantee a certain behaviour

The first option produces fast programs, and puts the burden of avoiding undesired effects on the programmer, while the second option produces safer but slower code.

The C and C++ standards leave this choice open, and most compilers choose the first, while Java for example mandates the second.
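
As a sketch, here is what the second option would look like if the checks were written out by hand (calculateAddress() is just a stand-in for the hypothetical function above; note that a null check only catches one kind of invalid address, which is part of why full checking is costly):

#include <stdio.h>
#include <stdlib.h>

static int value = 42;

/* Stand-in definition for the calculateAddress() of the example above. */
static int *calculateAddress(void)
{
    return &value;
}

int main(void)
{
    int *ptr = calculateAddress();
    if (ptr == NULL) {                    /* explicit runtime check */
        fprintf(stderr, "invalid address\n");
        exit(EXIT_FAILURE);
    }
    int i = *ptr;
    printf("%d\n", i);
    return 0;
}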


Why is the behaviour not implementation-defined, but undefined?

Implementation-defined means (N4296, 1.9§2):

Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int) ). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects. Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the “corresponding instance” below).

Emphasis mine. In other words: A compiler-writer has to document exactly how the machine-code behaves, when the source code uses implementation-defined features.
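
For example, here is behaviour that is implementation-defined rather than undefined, so the implementation must pick a result and document it (a sketch; the concrete values assume a typical two's-complement platform):

#include <stdio.h>

int main(void)
{
    int x = -8;
    /* Right-shifting a negative value is implementation-defined, not
     * undefined: the implementation must document whether the shift is
     * arithmetic or logical.  With an arithmetic shift this prints -2. */
    printf("%d\n", x >> 2);

    /* sizeof(int) -- the example used in the quote above -- is likewise
     * implementation-defined and must be documented. */
    printf("%zu\n", sizeof(int));
    return 0;
}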

Writing to a random non-null invalid pointer is one of the most unpredictable things you can do in a program, so making its behaviour implementation-defined would likewise require performance-reducing runtime checks.
Before we had MMUs, you could destroy hardware by writing to the wrong address, which comes very close to nasal demons ;-)

alain
  • Skipping the checks is the same as "ignoring the situation." This could still be a valid optimization with "implementation-defined" behavior, not UB. Also, I understand the halting problem, but see Rust for an example of a low-level language that solved the problem by disallowing null pointers. – Kyle Strand Aug 21 '15 at 14:53
  • It's not only null pointers; signed overflow or division by zero are other examples of things that are generally impossible to foresee at compile-time. Sorry, I didn't understand what you mean by the first two sentences? – alain Aug 21 '15 at 15:04
  • Yes, I realize that Rust has not sidestepped the halting problem, but null pointer dereferencing is one of the most common types of errors, and it's the one you used as an example. My first two sentences are basically saying that your answer doesn't really address UB; yes, in C/C++ dereferencing a null is UB, but it could just as easily have been implementation-defined, which is different (and less permissive). – Kyle Strand Aug 21 '15 at 15:08
  • 1
    Yes, IIRC Stroustrup regrets having introduced null pointers. This is a great article that explains the advantages of UB: http://blog.regehr.org/archives/213 – alain Aug 21 '15 at 15:14
  • I'm not sure, but I think "implementation-defined" would not leave the freedom of completely ignoring the situation, which is what enables better performance. There are other invalid pointer values than null, which make checks quite complicated. – alain Aug 21 '15 at 15:16
  • Stroustrup didn't invent them, but yes, the inventor called them a million-dollar mistake. – Kyle Strand Aug 21 '15 at 15:22
  • :-) Yes "million dollar mistake" was the term, now I remember too. – alain Aug 21 '15 at 15:25
  • @KyleStrand: The primary "mistake" associated with null pointers was the failure of some compilers to trap on arithmetic operations which would turn a null pointer into a seemingly-valid pointer. If a compiler traps on any attempt to dereference a null-pointer, and on any pointer arithmetic which would attempt to turn a null pointer into something else, the only potential harmful consequence of null pointers would be to delay the point where a program terminates (on a system without a null-pointer concept, the act of reading an uninitialized pointer would yield an immediate trap, rather... – supercat Aug 21 '15 at 19:11
  • ...than allow for the possibility that the code might notice that the pointer is null before trying to do anything with it. – supercat Aug 21 '15 at 19:11
  • @supercat When you say "if a compiler traps on...", are you talking about detecting possibly-null pointers at compile-time, or about inserting dynamic checks, or something else? – Kyle Strand Aug 21 '15 at 19:14
  • @KyleStrand: Inserting code to check whether a pointer is null before performing arithmetic on it, hopefully keeping track of which pointers have been checked so as to minimize the added overhead. Note that if a `p` is non-null, a compiler can and should assume that a pointer formed by adding or subtracting an integer will likewise be non-null, so null checks can generally be hoisted out of loops. – supercat Aug 21 '15 at 19:42
  • @KyleStrand I tried to address *implementation-defined* vs. *undefined* in an edit. – alain Aug 21 '15 at 21:25
  • 2
    The behaviour of the compiler isn't undefined. The compiler is not supposed to format your hard drive, or launch missiles, or crash. What's undefined is the behaviour of an executable (if any) which the compiler produces. – M.M Aug 22 '15 at 01:45
  • 2
    "UB is not encountered at runtime, it is a property of the source code." - it comes in both varieties. UB may be encountered at run-time, for example dividing by an integer input by the user without checking that they didn't input `0` – M.M Aug 22 '15 at 01:46
  • @MattMcNabb I just asked this question here: http://stackoverflow.com/questions/32154832/is-the-behaviour-of-the-compiler-undefined-with-undefined-behaviour – alain Aug 22 '15 at 10:05
  • @MattMcNabb What I meant was: The compiler may treat `if(i != 0)` as `if(true)` after the division statement, so this UB has an effect at compile time. – alain Aug 22 '15 at 11:38
4

Undefined behavior is simply the result of a situation coming up that the writers of the specification did not foresee.

Take the idea of a traffic light. Red means stop, yellow means prepare for red, and green means go. In this example people driving cars are the implementation of the spec.

What happens if both green and red are on? Do you stop, then go? Do you wait until red turns off and it's just green? This is a case that the spec did not describe, and as a result, anything the drivers do is undefined behavior. Some people will do one thing, some another. Since there is no guarantee about what will happen you want to avoid this situation. The same applies to code.

Waters
  • 4
    That's not necessarily the case in C/C++. In many cases, undefined behaviour was deliberately foreseen, and deliberately left undefined. In C/C++, undefined behaviour is something defined in the spec and explicitly given for a few examples. I have no reason to believe that everyone working on the first standard just didn't think about what should happen when a NULL pointer is dereferenced. Instead, they probably deliberately left it undefined so that the compiler didn't have to special-case it, slowing down code. – Muzer Aug 21 '15 at 13:42
  • True, those fall under the "why would you do that?" category. Like if there was a wreck at the intersection but the light was green. You don't just drive. – Waters Aug 21 '15 at 13:51
  • See supercat's answer. Also, the range of possible things compilers are allowed to do when encountering UB is, frankly, insane; if the reason for UB were simply "we can't predict everything in advance," the only necessary directive would be "ignore the situation." – Kyle Strand Aug 21 '15 at 14:56
  • 2
    If a traffic light appears malfunctioning, treat like a stop sign. If code is malfunctioning, treat it cautiously, but continue on as able. – chux - Reinstate Monica Aug 21 '15 at 17:46
  • 1
    @Muzer: I think a bigger reason for UB is to allow for the possibility of code taking advantage of platform features which would be useful in some situations but bothersome in others. On some machines, overflow-trapped integer arithmetic is the normal behavior and non-trapped arithmetic is expensive. On other machines, integer arithmetic that overflows generally wraps, and overflow trapping would be very expensive. For the Standard to mandate either trapping or non-trapping behavior would not only increase the cost of all arithmetic on one platform or the other, but to add insult... – supercat Aug 21 '15 at 19:49
  • 1
    ...to injury, code which wanted to compute `x+y` using the the disfavored behavior and was written for hardware implementing that behavior would have to add additional logic to achieve the required behavior, and all of the added logic would run extra-slowly because of the logic included in the compiler. Thus, something that should have translated as `add r1,r2,r3` would instead end up as some huge monstrosity which could quite plausibly be less than 10% fast as the optimal code that could have met requirements if overflow had been UB. – supercat Aug 21 '15 at 19:54
  • 1
    @supercat but the point of C has always been portability. If you have code that does different things on different platforms, therefore, except where that's really necessary and what you want (eg things like inline assembly), your code is broken. You should therefore be coding to AVOID these situations. So compilers being able to turn this behaviour into anything at all, and mercilessly taking advantage of such a situation, is, in my mind, perfectly valid. People should NEVER have EVER relied on ANY behaviour that's potentially different between compilers/architectures. – Muzer Aug 23 '15 at 19:00
  • @Muzer: There are two ways a language can be portable: (1) The language requires platforms to implement consistent behaviors independent of the platform upon which it is running; Java, for example, requires that `int` be 32 bits, `long` 64 bits, and that integer shifts be performed mod 32 and long shifts mod 64; (2) Features of the language (e.g. the size of `int`, overflow semantics, etc.) vary by platform, such that compilers for the language can avoid having to generate inefficient code to emulate behaviors which the hardware doesn't support well, and which programmers... – supercat Aug 23 '15 at 20:54
  • ...may not want anyway. C was designed for the second kind of portability. It was never intended for the first. Nowadays, the market is totally dominated by hardware which supports two's-complement 8, 16, 32, and 64-bit types without padding, where all pointers are ranked, where integer arithmetic naturally supports partially-indeterminate-value semantics on overflow, etc. and most programs will never need to run on platforms without those characteristics. Further, although C was designed in an era in which most programs were never expected to receive maliciously-crafted inputs,... – supercat Aug 23 '15 at 20:59
  • ...the world in which today's programs run is a very different environment. If the authors of C standards wish the language to remain useful, they should acknowledge these realities and define some normative standards for behavior so that programs whose two requirements are: *(1) Generate correct output given valid input; (2) Generate arbitrary output, within broad constraints, when given invalid input*, don't have to include extra code to constrain the compiler more tightly than they want or need to. If a programmer wants to compute `x+y` when it's representable as `int`, and... – supercat Aug 23 '15 at 21:13
  • ...would be equally happy with it yielding any value congruent to the mathematical integer value of x+y mod 4294967296, I would posit that the most natural and readable way to express that when compiling for a hardware platform that can implement such semantics (are there any modern ones that can't?) would be to write it as `x+y`. Having overflow return partially-indeterminate values will facilitate optimizations which will be unavailable if the programmer has to write code that avoids overflows at all costs. – supercat Aug 23 '15 at 21:28
  • @supercat But C provides sizeof which does allow the first kind of portability. I'm not necessarily defending the existence of undefined behaviour, I just don't agree with your claim that it was originally designed to allow programmers to be able to actually use that behaviour in real code, just that it would mean something different on different platforms. – Muzer Aug 24 '15 at 09:17
  • @Muzer: Things like INT_MAX make it possible for programs to accommodate a range of platforms, but it's not generally possible without extreme awkwardness to write code which will run correctly on every possible standard-conforming compiler where it compiles, since most programs will rely upon behaviors for which no standard testing macros exist. For example, a lot of code assumes that given `int16_t x;` the expression `(int16_t)(uint16_t)x` will always equal `x`, and while that may be true of all production C compilers, I don't think anything in the Standard would forbid a compiler... – supercat Aug 24 '15 at 15:10
  • @supercat but that behaviour (unsigned to signed int) is implementation-defined, not undefined. And I would generally hesitate about writing code that makes such assumptions, even though they might be completely safe today, except maybe in an embedded environment where porting the code to anything else would make zero sense. – Muzer Aug 24 '15 at 15:19
  • ...from specifying that `(int16_t)32768` will normally yield -32768, but will yield 24601 if the program is launched with a command-line argument of "EASTEREGG" and the value was not used in a place where the Standard required a constant expression. – supercat Aug 24 '15 at 15:19
  • @Muzer: embedded code is routinely ported between different microprocessors, and even crosscompiled onto x86. – Waters Aug 24 '15 at 15:21
  • @Waters By that I meant the sort of embedded code that inherently depends on the precise hardware configuration, as opposed to more general embedded code, ie I meant "in (an embedded environment where porting the code to anything else would make zero sense)" rather than "in an embedded environment (where porting the code to anything else would make zero sense)". I'm fully aware that plenty of embedded code is written to be portable. – Muzer Aug 24 '15 at 15:23
  • @Muzer: The decision of whether to leave something undefined or implementation-defined seems to have been predicated upon whether on some platforms or for some kinds of applications it might be useful to have the compiler generate code that would trap or raise a signal in a fashion *outside the Standard's jurisdiction*. From a requirements perspective, there's no semantic difference between "Division by zero may yield an indeterminate value or cause a trap whose behavior an implementation should, but need not, document" versus simply saying it yields "Undefined Behavior". – supercat Aug 24 '15 at 15:25
  • @Muzer: Further, the way the Standard is defined makes it excessively difficult for embedded code to avoid UB when being ported from one platform to another. For example, given `int16_t checksum; void updateChecksumMulti(uint16_t dat, uint16_t n) { checksum += dat*n; }`, the addition will never yield Undefined Behavior if run on an 8-bit or 16-bit machine, but may yield Undefined Behavior on a 32-bit machine. While writing the expression as `1u*dat*n` would work, and hopefully an embedded compiler would be smart enough not to actually do the multiply, I'd suggest that makes the code less clear. – supercat Aug 24 '15 at 16:38
  • @supercat Err, I don't believe that should produce undefined behaviour only for a machine that uses 32-bit ints. On the contrary, I believe it produces undefined behaviour for a machine that uses 16 bits or 8 bits. The C standard defines that for values not directly representable in the smaller type, the result is implementation defined, which is what will happen on 32-bit; on all platforms though, dat*n might cause a signed overflow of checksum which is undefined. The fix is obvious, make checksum unsigned. – Muzer Aug 25 '15 at 08:45
  • @Muzer: I meant `checksum` to be `uint16_t`, but I typoed, making behavior Implementation-defined in machines with 16 bits rather than fully defined, but it's still better than the situation with 32-bit `ints`, where it would invoke Undefined Behavior if the product of `dat` and `n` exceeds 2147483647. – supercat Aug 25 '15 at 13:20
  • @supercat OK, very good point, that is pretty horrible. I'd missed that as I was distracted by the bigger issue! I think the main issue here is that C's integer promotion rules are all kinds of messed up. Ugh. – Muzer Aug 25 '15 at 14:31
  • Although C99 integer types as defined had the advantage of being back-portable to C89 compilers, they suffer from extremely murky semantics. IMHO, the Standard should define `unumN_t` and `uwrapN_t` types, such that `unumN_t` would, if defined, behave essentially as `uintN_t` behaves on machines where `int` is larger than N bits, and `uwrapN_t` would behave essentially as `uintN_t` behaves on machines where `int` is smaller; machines with any `int` size would be allowed to define any `uwrapN_t` and `uintN_t` for any N as compiler intrinsics if they can achieve the proper behavior. – supercat Aug 25 '15 at 16:54
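
To make the promotion pitfall from the comment thread above concrete, here is a minimal C sketch adapted from supercat's hypothetical `updateChecksumMulti` (the `_fixed` variant and its name are added for illustration, and the `1u *` idiom is the fix mentioned in the comments, not the only possible one):

#include <stdint.h>

uint16_t checksum;

/* On a machine where int is 16 bits, both uint16_t operands promote to
   unsigned int, so dat * n wraps and the behavior is fully defined.
   On a machine where int is 32 bits, both operands promote to signed int,
   so a product larger than INT_MAX is signed overflow: undefined behavior. */
void updateChecksumMulti(uint16_t dat, uint16_t n) {
    checksum += dat * n;        /* UB on 32-bit int when dat * n > 2147483647 */
}

/* Forcing the arithmetic to be unsigned sidesteps the signed overflow entirely. */
void updateChecksumMulti_fixed(uint16_t dat, uint16_t n) {
    checksum += 1u * dat * n;   /* unsigned arithmetic wraps instead of overflowing */
}
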
4

One of the reasons for leaving behavior undefined is to allow the compiler to make whatever assumptions it wants when optimizing.

If there exists some condition that must hold if an optimization is to be applied, and that condition is dependent on undefined behavior in the code, then the compiler may assume that it's met, since a conforming program can't depend on undefined behavior in any way. Importantly, the compiler does not need to be consistent in these assumptions (which is not the case for implementation-defined behavior).

So suppose your code contains an admittedly contrived example like the one below, where `foo` is read uninitialized (one possible source of undefined behavior) and `f` and `g` are arbitrary functions declared elsewhere:

int bar = 0;
int foo;          /* never initialized: reading it below is undefined behavior */
if (foo) {        /* UB: foo holds an indeterminate value */
    f();
    bar = 1;
}
if (!foo) {
    g();
    bar = 1;
}
assert(1 == bar);

The compiler is free to assume that the first condition (foo) is false and that the second condition (!foo) is also false, and thus optimize both blocks away entirely. Now, logically either foo or !foo must be true, so looking at the code you would reasonably assume that bar must equal 1 once the code has run. But because the compiler optimized in that manner, bar never gets set to 1. The assertion then fails and the program terminates, which is behavior that would not have happened if foo hadn't depended on undefined behavior.

Now, is it possible for the compiler to actually insert completely new code if it sees undefined behavior? If doing so will allow it to optimize more, absolutely. Is it likely to happen often? Probably not, but you can never guarantee it, so operating on the assumption that nasal demons are possible is the only safe approach.
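
To make that concrete, here is roughly what such an optimizer might leave behind for the snippet above. This is purely illustrative; no particular compiler's output is being quoted:

int bar = 0;
/* Both if-blocks deleted: because foo's value comes from undefined behavior,
   the compiler is free to treat each condition as false and drop the branches,
   including the calls to f() and g() and both assignments to bar. */
assert(1 == bar);   /* fires, even though the source appears to guarantee bar == 1 */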

Ray
  • 1,706
  • 22
  • 30
  • Sigh. Did you read my edit, asking people not to post answers about optimization unless these answers clearly distinguish what makes UB better for optimization than "implementation-defined" behavior? Also, I was asking *what* the standard permits, not *why* it permits it, so this technically doesn't answer the question--although I do appreciate the defense of UB, since I am increasingly opposed to the idea of UB in general. – Kyle Strand Aug 21 '15 at 20:43
  • 3
    The ability to be inconsistent is one of the big differences. sizeof(int) is implementation-defined, but it's not going to change from 4 to 8 halfway through the program. If it was undefined, it *could*. Implementation-defined things also tend to have additional restrictions: e.g. sizeof(int) * CHAR_BIT must be at least 16, whereas if it was undefined, it could be or do anything at all. – Ray Aug 21 '15 at 21:09
  • That sounds like a useful distinction to include in your answer. – Kyle Strand Aug 21 '15 at 21:12
  • ...ah, I see that you've done so. – Kyle Strand Aug 21 '15 at 21:12
  • You might also want to look at http://stackoverflow.com/a/2397995/5196093. That answer includes the standard's definitions of undefined/implementation defined/unspecified. It doesn't say whether it's quoting the C or C++ standard, but I don't believe they differ on this. – Ray Aug 21 '15 at 21:19
  • Given `int i=INT_MAX; long l1,l2; i+=function_returning_one(); l1=i; second_function(); l2=i;` I would not consider it "surprising" for `l1` to yield `INT_MAX+1u` and `l2` to yield `-INT_MAX-1`; indeed, on many DSPs such behavior would be a likely result (the compiler would add 16-bit value `i` to a 32-bit accumulator, store the result in both `i` and `l1`, call `second_function();`, load `i` (16 bits), and store it to `l2`). Writing the code as `i=(int)((unsigned)i+function_returning_one());` would cause `l1` and `l2` to yield -INT_MAX-1, but would make the code less readable and less... – supercat Aug 22 '15 at 18:25
  • ...efficient. If the programmer would be perfectly happy with `l1` and `l2` holding any values congruent to 32768 mod 65536, and doesn't care if they match, I see little purpose to requiring the programmer to clutter the source code with language which makes the function harder to read (incidentally, even if `i` were type `int16_t`, I'm not sure if the Standard would guarantee that the latter statement would work correctly if invoked with negative values, since it would from what I can tell be legal [though odd] for a compiler to use two's-complement representations for negative numbers... – supercat Aug 22 '15 at 18:27
  • ...but define unsigned-to-signed casts as yielding 32767 for all unsigned values 32768 and up. – supercat Aug 22 '15 at 18:31
3

Undefined behaviors allow compilers to generate faster code in some cases. Consider two different processor architectures that ADD differently: Processor A inherently discards the carry bit upon overflow, while processor B generates an error. (Of course, Processor C inherently generates Nasal Demons - it's just the easiest way to discharge that extra bit of energy in a snot-powered nanobot...)

If the standard required that an error be generated, then all code compiled for processor A would basically be forced to include additional instructions to check for overflow and, if it occurred, generate an error. This would result in slower code, even if the developer knew that they were only ever going to add small numbers.
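
As a rough illustration, the snippet below sketches the kind of check every signed addition on Processor A would have to carry if the standard mandated an error on overflow. The function name `checked_add` is hypothetical and the code is not taken from any real compiler; it just spells out the extra work:

#include <limits.h>
#include <stdlib.h>

/* What "x + y must signal an error on overflow" would cost on Processor A,
   whose ADD instruction silently discards the carry. */
int checked_add(int x, int y) {
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
        abort();        /* stand-in for the error Processor B raises in hardware */
    return x + y;       /* the single ADD Processor A would otherwise emit */
}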

Undefined behavior sacrifices portability for speed. By allowing 'anything' to happen, the compiler can avoid writing safety-checks for situations that will never occur. (Or, you know... they might.)

Additionally, when a programmer knows exactly what an undefined behavior will actually cause in their given environment, they are free to exploit that knowledge to gain additional performance.

If you want to ensure that your code behaves exactly the same on all platforms, you need to ensure that no 'undefined behavior' ever occurs - however, this may not be your goal.

Edit: (In response to the OP's edit) Implementation-defined behavior would require the consistent generation of nasal demons. Undefined behavior allows the sporadic generation of nasal demons.

That's where the advantage of undefined behavior over implementation-defined behavior appears. Extra code may be needed to guarantee consistent behavior on a particular system; in those cases, undefined behavior allows greater speed.
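
As one hypothetical illustration of that last point (an added example, not something the standard spells out): if reading an uninitialized local variable had to produce a consistent, documented value, an implementation would in practice have to initialize the storage, whereas leaving the read undefined lets the compiler emit nothing at all:

int leftover(void) {
    int x;        /* never written: no initialization code needs to be emitted */
    return x;     /* undefined: successive calls need not even agree with each other */
}

/* A consistent, documented result would in practice amount to paying for this: */
int leftover_documented(void) {
    int x = 0;    /* the extra store that "consistent behavior" implies */
    return x;
}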

Allen
  • 927
  • 8
  • 19
  • But (as mentioned in comments on various other answers already), the standard *also* defines "implementation-defined behavior"--which provides just as much flexibility for optimization. – Kyle Strand Aug 21 '15 at 19:58
  • A good point. Perhaps I should have said that Processor C occasionally generates nasal demons. Let's pretend that the energy discharged from a nasal nanobot only has a certain probability of creating a nasal demon, say 50%. Now, in this hypothetical universe, the "undefined behavior" allows a more speed-optimized processor to be manufactured while still remaining standards compliant. If implementation-defined behavior were required, this would not be the case, as "50% chance of demon" probably doesn't make the cut as "something consistent" – Allen Aug 21 '15 at 20:19
  • And yet it's still a far cry from "insert malicious code," which is *also* allowed by the existing definition of UB. – Kyle Strand Aug 21 '15 at 20:21
  • 1
    It was probably just easier to say "you can do whatever you want" as opposed to trying to list off all of the things that you can and can't do. Sure, on the PC platform you typically generate nasal demons from an external USB device... that probably won't happen by accident with an electronic computer... but it might accidentally happen on a Turing complete Ouija board. Not all computers will necessarily be electronic, so not all nasal demons must be from intentionally malicious code. Some could just be from unsafe code. – Allen Aug 21 '15 at 20:32
  • If you are actually interested in some of the historical reasons for and uses of UB, see supercat's answer and comments on other answers. Also note that my question is not actually asking for a rationale behind UB, although I *do* happen to think that it ought to be removed from the standard. – Kyle Strand Aug 21 '15 at 20:34
  • I was merely trying to answer why it would allow for something as absurd as nasal demons... – Allen Aug 21 '15 at 20:36
  • 1
    @KyleStrand: Write correct C code and nothing will go wrong. The standard shouldn't change. If you do want particular behavior, compilers have been growing options and intrinsics to do what you want explicitly. C is about fast code. I recommend Java, C#, Go, etc. for hand holding. – Zan Lynx Aug 21 '15 at 22:05
  • @ZanLynx "Just don't mess up" is unacceptable as advice. Developers are human beings, and human beings are imperfect. I don't think it's "hand-holding" for a developer's tools to help them avoid things like, say, accidentally launching nuclear missiles (an extremely contrived example of potential UB, but one given as an example in one of the links posted in the comments beneath my question) due to an honest mistake--and yes, the only times I have invoked UB have been honest mistakes. – Kyle Strand Aug 21 '15 at 22:17
  • @KyleStrand: As I said, if you want hand holding go to another language. C is barely higher than assembly with all the power and peril that implies. – Zan Lynx Aug 21 '15 at 22:18
  • @ZanLynx Sure, **C** is barely higher than assembly (though that does *not* imply that UB is *necessary* to get the same benefits in terms of optimization and flexibility!). But **C++** is a supposedly "high-level" language. This *could* mean that it offers some protection from bad behavior via compile-time checks, and indeed this is *sometimes* what happens. But in practice, it also means that the pitfalls are much more subtle than they are in C, which I think is a bad thing. – Kyle Strand Aug 21 '15 at 22:20
  • @ZanLynx And sure, maybe you think it's reasonable to use a language that's so full of cat-killing and black holes (more examples stolen from comments), because *you* have a lot of experience, you're very careful, you have a deep understanding of the language, etc, etc. But can you guarantee that all of your co-workers are equally good at avoiding UB? Can you guarantee every library you use has no UB? Can you guarantee that if you ever have a rough night of sleep and wake up tired but you have some code that needs to get finished, that you'll be as good at avoiding UB as you usually are? – Kyle Strand Aug 21 '15 at 22:22
  • @ZanLynx Java, C#, Go, etc have runtime overhead that may be considered unacceptable. But what about a language like Rust, which offers the same flexibility and high-level compile-time abstractions as C++, but has much better safety guarantees? Is *that* too much "hand-holding"? What's wrong with hand-holding, anyway? – Kyle Strand Aug 21 '15 at 22:24
  • 1
    @ZanLynx: Assembly language is less error-prone than modern C. In assembly language, if a memory location which held a no-longer-valid pointer should hold null, one can safely test for that with something like `ldr r1,[r0] / cmp r1,#0 / bne oops` and know the assembler won't do anything weird. In a sensible C compiler for most platforms, `assert(*q==NULL);` should be safe. If `q` isn't null, either the assertion will fail, terminating the program, or the system will detect that `q` is an invalid pointer and terminate the program. Hyper-modern C, however, believes that if the compiler... – supercat Aug 22 '15 at 14:50
  • 1
    ...determines that `q` can't be non-null without the comparison invoking UB, it should not only remove the comparison, but it should also remove other code which it recognizes as having no usefulness outside such cases, possibly causing behaviors even worse than those the assertion was designed to protect against. – supercat Aug 22 '15 at 14:53
  • 1
    @supercat I'm glad I asked this question if for no other reason than to indirectly inspire all your comments. – Kyle Strand Aug 22 '15 at 22:36