17

Certain common programming languages, most notably C and C++, have a strong notion of undefined behaviour: when you attempt to perform certain operations outside of the way they are intended to be used, the result is undefined behaviour.

If undefined behaviour occurs, a compiler is allowed to do anything it wants (including nothing at all, 'time traveling', etc.).

My question is: Why does this notion of undefined behaviour exist? As far as I can see, a huge number of bugs, programs that work on one version of a compiler but stop working on the next, etc. would be prevented if instead of causing undefined behaviour, using the operations outside of their intended use would cause a compilation error.

Why is this not the way things are?

Qqwy
  • 5,214
  • 5
  • 42
  • 83
  • 7
    This reference is pretty much the go-to for UB: [What every C programmer should know about UB](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html) – Mike Vine Jul 27 '18 at 12:25
  • 2
    Because of the ideology of C: very flexible and powerful, leaving everything in the programmer's hands. – 0___________ Jul 27 '18 at 12:25
  • 2
    Interesting talk on the topic by Chandler Carruth: [Garbage In, Garbage Out: Arguing about Undefined Behavior...](https://www.youtube.com/watch?v=yG1OZ69H_-o) – Borgleader Jul 27 '18 at 12:31
  • 2
    "using the operations outside of their intended use would cause a *compilation error*" Most of the undefined behaviours in C aren't statically detectable, so they can't be compilation errors. They'd have to be run time errors, which would come with a run time cost. – sepp2k Jul 27 '18 at 12:32
  • Even though an interesting and important topic, it is too broad. There have been countless discussions and research going into this exact question, and still there are languages ranging from having no UB at all, to having UB all over the place (ahem, C/C++). – Passer By Jul 27 '18 at 13:19
  • Undefined behavior may stem from differences in platforms. Often, embedded systems need to access features (via pointers) that desktops would prevent access to. Also, there is no standard memory layout for all the platforms. Small embedded systems would not support the same addressing range of desktops or more powerful platforms. – Thomas Matthews Jul 27 '18 at 14:23
  • UB exists to allow for systems to only pay for what they use; no need to waste resources on prevention (e.g. Java & C#). For example, on an embedded system that doesn't use dynamic memory allocation, running a garbage collection service is unnecessary. Also, in timing critical platforms, random garbage collection is a bad thing. – Thomas Matthews Jul 27 '18 at 14:27
  • @ThomasMatthews: The reasons for many forms of UB are historical. Unfortunately, compiler writers who don't understand the difference between "non-portable" and "erroneous", and who think "clever" and "silly" are antonyms, have latched onto it for far more destructive purposes. – supercat Jul 28 '18 at 20:25
  • There is a recent article called [The Value of Undefined Behaviour](https://nullprogram.com/blog/2018/07/20/) that gives some good examples. Well worth checking out! – Qqwy Aug 15 '18 at 10:41
  • I know I'm a little late to the party, but _most_ if not all undefined behaviour in C/C++ can't be caught at compile time in the general case. Things like array out-of-bounds errors, for example, or use-after-free. – CoffeeTableEspresso Jun 28 '19 at 23:07
  • @CoffeeTableEspresso The interesting thing is that e.g. Rust tries to catch those kinds of errors. Of course, the semantics of Rust are not 1:1 comparable to C/C++. It is very true that it is not possible for 'the general case', which means that (safe) Rust is more conservative/restrictive in how you can assign/modify memory. – Qqwy Jun 29 '19 at 23:09
  • @Qqwy this is why I prefer C/C++/D to Rust. I'd much rather my compiler not be as restrictive, and just use a static analysis tool to catch any errors. Instead of having my compiler stop me from doing lots of valid things that _might_ be errors. – CoffeeTableEspresso Jun 30 '19 at 01:53

3 Answers

18

Why does this notion of undefined behaviour exist?

To allow the language/library to be implemented on a variety of different computer architectures as efficiently as possible (and, perhaps in the case of C, while allowing the implementation to remain simple).

if instead of causing undefined behaviour, using the operations outside of their intended use would cause a compilation error

In most cases of undefined behaviour, it is impossible - or prohibitively expensive in resources - to prove that undefined behaviour exists at compile time for all programs in general.
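For illustration (a sketch not taken from the original answer; the function names are made up), consider an access whose index is only known at run time. No compile-time check can reject it in general:

#include <cstdio>

// Whether this access is in bounds depends entirely on the value of i,
// which is only known when the program runs.
int element_at(const int (&arr)[10], int i) {
    return arr[i];  // undefined behaviour if i is outside 0..9
}

int main() {
    int arr[10] = {};
    int i = 0;
    if (std::scanf("%d", &i) == 1) {            // i comes from user input
        std::printf("%d\n", element_at(arr, i));
    }
}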

Some cases can be proven for some programs, but it's not possible to specify exhaustively which cases those are, so the standard doesn't attempt to do so. Nevertheless, some compilers are smart enough to recognize simple cases of UB, and will warn the programmer about them. Example:

int f() {
    int arr[10];
    return arr[10];  // index 10 is one past the end: undefined behaviour
}

This program has undefined behaviour. A particular version of GCC that I tested shows:

warning: array subscript 10 is above array bounds of 'int [10]' [-Warray-bounds]

It's hardly a good idea to ignore a warning like this.


A more typical alternative to undefined behaviour would be defined error handling in such cases, such as throwing an exception (compare, for example, Java, where accessing a null reference causes an exception of type java.lang.NullPointerException to be thrown). But checking the pre-conditions of well-defined behaviour is slower than not checking them.

By not checking pre-conditions, the language gives the programmer the option of proving correctness themselves, and thereby avoiding the runtime overhead of the check in a program that has been proven not to need it. Indeed, this power comes with great responsibility.
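As a rough C++ sketch of this trade-off (not part of the original answer; the function names are made up): the standard library offers both a checked and an unchecked way to index a std::vector, and the programmer chooses whether to pay for the check.

#include <cstddef>
#include <vector>

// Pre-condition is checked at run time: throws std::out_of_range if i is invalid.
int checked(const std::vector<int>& v, std::size_t i) {
    return v.at(i);
}

// No check: undefined behaviour if i is invalid, but also no runtime overhead.
int unchecked(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

A caller that has already proven i < v.size() can use the unchecked form and skip the cost of the check entirely.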

These days, the burden of proving the program's well-definedness can be somewhat alleviated by using tools (example) which add some of those runtime checks and neatly terminate the program upon a failed check.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • Well said! Platform efficiency and code portability have strongly influenced and shaped C and C++. Some C++11 enhancements, such as move semantics, were added to address a shortcoming in potential efficiency. But both efficiency and portability are hard to achieve without giving compilers a lot of leeway... undefined behavior. Other languages that have less undefined behavior can be less performant (sometimes much less performant). It's a tradeoff, and different languages have different goals. Languages are tools, suitable in their domain. – Eljay Jul 27 '18 at 12:44
  • I'd give +2 if I could. I'd add that newer (versions of) languages try to minimize scope of UB by adding more explicit rules to the language (take Rust and move semantics in C++11 for example) – bartop Jul 27 '18 at 13:05
  • Another common alternative for undefined behavior is to specify a result consistent with the natural behavior of many target platforms. For example, Java specifies that 65535*65537 will wrap around in such fashion as to yield -1, and a shift expression like 1 << 35 will reduce the shift amount mod 32 (yielding 3) before doing the shift, which will thus yield 8. – supercat Jul 27 '18 at 20:30
  • A classic example for platform inconsistencies is out-of-bounds shift: `auto undef_shift(std::uint32_t v) { return v << 32; }`. On some architectures, the corresponding left shift instruction traps, on others, it always returns zero (as it is treated as shifting out all bits), and on yet others, it would behave the same as `return v;` because the higher bits of the second operand are silently ignored (this applies to x86). Had they mandated any particular behavior, all other platforms would be heavily penalized by additional sanitization/checking code the compiler would need to emit. – Arne Vogel Jul 28 '18 at 11:10
  • @supercat Regarding mod-reducing the shift operand, if by common you mean x86, you are correct. ARM on the other hand will saturate with zeroes, and I don't need to tell you how many billions of ARM devices are in circulation nowadays, 23 years after Java was created. The same goes for IA-64 (for register shift width operand). Java places security and portability over maximum performance, which is a valid choice, by all means – I'm just saying performance is the exact reason C/C++ didn't go down that route. – Arne Vogel Jul 28 '18 at 11:30
  • 1
    @ArneVogel: Java specifies mod-32 reduction, regardless of the architecture. Java implementations on an ARM are required to add an AND operation if they cannot verify that the operand is within 0..31. On the other hand, having a language specify that the result would be an unspecified choice between `x<<(y-1)<<1` and `x<<(y & 31)` would have allowed efficient operation on many platforms while still allowing `(x<<y)|(x>>(32-y))` to be an efficient way to do a rotate. – supercat Jul 28 '18 at 16:37
  • Yes, this is what I wrote (I'm glad we agree on that): "Had they mandated any particular behavior, all other platforms would be heavily penalized by additional sanitization/checking code the compiler would need to emit." – Java on ARM doesn't violate the spec., but in general, code will be slower than possible (unless the shift width is known at JIT time). The rotate trick is neat, but I'd rather _finally_ have a rotate library function in C++. (This could be implemented easily as a compiler intrinsic.) – Arne Vogel Jul 28 '18 at 18:01
  • @ArneVogel: A number of compilers look for the pattern I showed for rotate and replace it with a rotate instruction, along with variations swapping the left and right operands, or the roles of left and right shifts. Those are pretty much the four simplest ways of doing a rotate on a platform where `x>>32` will either return 0 or x (it doesn't matter which). Trying to avoid UB in that case not only requires using a more complicated expression, but it becomes far less obvious what expressions a compiler should look for and replace with a rotate (see the sketch after these comments). – supercat Jul 28 '18 at 20:24
  • @supercat Good points – I replied [in the chat](https://chat.stackoverflow.com/rooms/177007/c-out-of-bounds-shift). – Arne Vogel Jul 30 '18 at 10:47
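A short sketch of the rotate idiom discussed in the comments above (the function name is made up; this is not code from any of the commenters): the naive pattern (x<<y)|(x>>(32-y)) is undefined for y == 0, because x>>32 is an out-of-range shift. Masking both shift counts keeps every shift in range, and mainstream compilers commonly recognize this form and emit a single rotate instruction where one exists.

#include <cstdint>

// UB-free 32-bit left rotate: both shift counts stay within 0..31 for every y,
// including y == 0 (where the function simply returns x unchanged).
std::uint32_t rotl32(std::uint32_t x, unsigned y) {
    return (x << (y & 31u)) | (x >> (-y & 31u));
}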
10

Undefined behavior exists mainly to give the compiler freedom to optimize. One thing it allows the compiler to do, for example, is to operate under the assumption that certain things can't happen (without having to first prove that they can't happen, which would often be very difficult or impossible). By allowing it to assume that certain things can't happen, the compiler can eliminate, or avoid generating in the first place, code that would otherwise be needed to account for those possibilities.
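A minimal sketch of that kind of assumption (not taken from the answer or the linked talk; the function name is made up): because signed integer overflow is undefined, the compiler may assume it never happens and fold the comparison below to a constant.

// Since i + 1 cannot overflow in a well-defined program, the compiler is free
// to assume i + 1 > i always holds and compile this to `return true;`,
// eliminating both the addition and the comparison.
bool always_greater(int i) {
    return i + 1 > i;   // undefined behaviour only when i == INT_MAX
}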

Good talk on the topic

Michael Kenzel
  • 15,508
  • 2
  • 30
  • 39
  • 1
    Is there anything in the C89 Rationale or other documentation from the 1980s to support that view, or is it a more modern invention? – supercat Jul 27 '18 at 17:11
    This is a good answer. Undefined behavior allows the compiler to simply assume certain things never happen and optimize accordingly. Optimizing away a check or branch that would only be taken if there were undefined behavior would not be possible if the compiler were forced to check and raise an error instead. So instead the results are 'unpredictable', just as the consequences of assuming something never happens, when it then does happen, are unpredictable. – user16217248 Jan 16 '23 at 16:03
-4

Undefined behavior is mostly based on the target it is intended to run on. The compiler is not responsible for the dynamic behavior of the program, or the static behavior for that matter. The compiler's checks are restricted to the rules of the language, and some modern compilers do some level of static analysis too.

A typical example would be uninitialized variables. It exists because of the syntax rules of C, where a variable can be declared without an init value. Some compilers assign 0 to such variables and some just reserve a memory location for the variable and leave it at that. If the program does not initialize these variables, it leads to undefined behavior.
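A minimal sketch of the uninitialized-variable case (the function name is hypothetical):

int read_uninitialized() {
    int x;         // automatic variable declared without an initial value
    return x + 1;  // reading x here is undefined behaviour
}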

techeno
  • 17
  • I think you are answering "why do compilers implement undefined behavior", but the question is "why does the language standard implement undefined behavior". – VLL Nov 18 '22 at 12:01
  • "It exists because of the syntax rules of C". This is false: "Undefined behavior" does not mean things that are not mentioned in the standard. The standard clearly states which behaviors will result to "undefined behavior". The designers of the language have already considered each situation and chosen to make it "undefined behavior". They could have as easily defined a specific behavior for these situations and left it for the compiler programmers to find out how to implement that. – VLL Nov 18 '22 at 12:01