
What is the reason to extend UB over the compilation phase? Why not compile and link a binary upon encountering UB code and let that binary be subject to UB? (And if it is impossible to produce the binary, then just print an error message about it.)

After all, we expect the most exact compilation report from the compiler even when the source code contains UB (and almost every piece of source code may contain some UB).

Could you please give a concrete example of UB code for which it really makes more sense to allow the compiler to exhibit UB than to allow the generated binary to exhibit UB?

This question stems from this one: Does “undefined behaviour” extend to compile-time?

Ian Kemp
Alexey
  • You say "instead of compiling and linking a binary that has UB" as if you can always produce an executable from an erroneous program... I think your question is very valid (why grant compilers the license to crash if you mess up), but that's not really a reasonable suggestion. – Max Langhof Jan 17 '19 at 10:40
  • @MaxLanghof UB-code is not equal to erroneous code. Right? – Alexey Jan 17 '19 at 10:42
  • UB code always is erroneous. Not all erroneous code exhibits UB. I'm not sure what you are getting at though. – Max Langhof Jan 17 '19 at 10:43
  • [Related](https://stackoverflow.com/questions/7421170/constexpr-undefined-behaviour) - possibly even duplicate? – Aconcagua Jan 17 '19 at 10:45
  • Potential UB is good, that's how optimization works (think signed overflow and loops). – Matthieu Brucher Jan 17 '19 at 10:46
  • Would this not require Compilers to diagnose all undefined behavior? – P.W Jan 17 '19 at 10:48
  • @MaxLanghof OK. If it is impossible to produce an executable then just print a message about it. – Alexey Jan 17 '19 at 10:49
  • Also, compilers can define their own rules of what they will do with a certain instance of UB; sometimes you'll see that certain behavior is a "compiler extension". – Blaze Jan 17 '19 at 10:49
  • @P.W. not all UB are bad. Some are captured by UBSAN. But you want to have some in your application for optimization purposes. – Matthieu Brucher Jan 17 '19 at 10:49
  • Required reading: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html. This shows that UB is really hard to even detect in a lot of cases. But it also shows inroads into trying to fail like you'd want. – Mike Vine Jan 17 '19 at 10:51
  • @MatthieuBrucher: Yes, agree with that. Here's a related post. [Does undefined behavior really help modern compilers to optimize generated code?](https://stackoverflow.com/questions/49001802/does-undefined-behavior-really-help-modern-compilers-to-optimize-generated-code) – P.W Jan 17 '19 at 10:52

5 Answers


You make it sound like dealing with "undefined behaviour" is some sort of specific action that the compiler takes. That it scans your program for lines of "undefined behaviour" and then does a thing. That accordingly it can choose at which stage of the build (or execution process) to do that thing and manifest "undefined behaviour".

It's not, and it doesn't, and it can't.

Your program has undefined behaviour if it violates a contract that the toolchain legally and usefully assumes has been upheld. The whole point of certain categories of bug causing the program to have undefined behaviour (as opposed to being ill-formed) is that the compiler doesn't need to analyse the program to look out for them (which, in many cases, would be impractical at best). It can and will just assume they're not there and go about its complex business accordingly. This business involves analysis, translation and production of code that gets executed later; i.e. the contract violation is relevant to the whole lifecycle of the program.

Therefore, symptoms can manifest at any part of the lifecycle of the program, from initial parsing of the source code to execution of the translated binary. And therefore, nobody's "extended" UB and nobody's made any decision about when symptoms manifest. So there are no reasons, and there are not no reasons.

Lightness Races in Orbit
  • Could you please give a concrete example of such a UB-code that it really makes much more sense to allow the compiler to exhibit UB than to allow the generated binary to exhibit UB? – Alexey Jan 17 '19 at 10:59
  • I'm pretty sure the standard mentions neither a contract nor a toolchain. This post seems to be too specific to be quite true. – Alexey Jan 17 '19 at 11:24
  • @Alexey No, the standard does not mention any contracts; the standard IS the contract. It defines what the compiler can assume about the code without analysing it. Toolchains are mentioned in the standard, but the term used is "conforming implementation". – Johan Jan 17 '19 at 13:06
  • @Alexey "Contract" is a commonly-used way to describe the agreement that you make with the compiler/toolchain when you promise to give it valid, well-defined C++, in exchange for deterministic and meaningful results. – Lightness Races in Orbit Jan 17 '19 at 14:03
  • @Alexey I'm not sure I can give such an example off-hand, but that doesn't mean there is a benefit in disallowing such an example to exist. – Lightness Races in Orbit Jan 17 '19 at 14:04
  • I like your punchline. – molbdnilo Jan 17 '19 at 14:12
  • @molbdnilo You mean, you like it, and you don't not like it? :D – Lightness Races in Orbit Jan 17 '19 at 14:17
  • @Alexey: How about `#include \`someProg someArgs\``? An implementation might process such a construct by running someProg with the given arguments, and behaving as though the output from that program were inserted into the C source text, but such behavior would be entirely outside the jurisdiction of the C Standard. – supercat May 10 '22 at 22:02

From C++ standard [defns.undefined]:

undefined behavior: behavior for which this document imposes no requirements.

[ Note: Undefined behavior may be expected when this document omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression never exhibits behavior explicitly specified as undefined in [intro] through [cpp] of this document ([expr.const]). — end note ]

I.e. undefined behaviour is not necessarily erroneous behaviour.

Some C++ undefined behaviours are well defined by other standards a platform must satisfy.

For example, C++ doesn't define the behaviour of casting a function pointer to void*, whereas POSIX requires this cast to be well-defined.

Another example: the C++ standard says that loading an invalid pointer value is undefined behaviour. On platforms with segmented addressing, loading an invalid pointer may cause a hardware trap, whereas on platforms with virtual address space loading any pointer value is safe.


Most importantly, the compiler always generates code under the assumption that no undefined behaviour happens (unless it can prove otherwise at compile time). E.g. when you dereference a pointer it assumes the pointer is valid; when a signed integer is incremented it assumes that it doesn't overflow. When you break the compiler's assumptions is when the undefined behaviour starts to manifest itself.

Maxim Egorushkin
  • But those examples aren't really guaranteed to happen. If the C++ standard says something is UB then the compiler is free to assume it won't happen. Or eliminate that code if it can prove that it would result in UB. Just that you wrote an invalid pointer dereference doesn't mean you will encounter a hardware trap, even if the platform documents it. Or am I wrong here? – Max Langhof Jan 17 '19 at 11:54
  • @MaxLanghof The C++ standard has long been under-specified. It specifies the bare minimum it needs and leaves the rest undefined. There are cases where the compiler is free to assume it won't happen (like a null reference), but in other cases the compiler cannot assume that. That is why it cannot print diagnostics and assume it won't happen in a general case. For example, a non-inline function dereferences a pointer argument - that can be UB for some pointer values, but that is not known at compile time - the compiler assumes you know what you are doing and there is no UB. – Maxim Egorushkin Jan 17 '19 at 12:00
  • @MaxLanghof _invalid pointer dereference_ - I am talking about loading an invalid pointer value, not dereferencing it. – Maxim Egorushkin Jan 17 '19 at 12:06
  • @MaxLanghof Generally, the compiler assumes the UB doesn't happen (i.e. you don't overflow signed integers, no invalid pointer dereference) and generates correct code. You are confusing cases where the UB does happen but the compiler didn't expect that. – Maxim Egorushkin Jan 17 '19 at 12:08

Some reasons why UB is extended beyond compile time:

  • Undefined behavior is hard to detect in many cases
  • Undefined behavior is sometimes caused by erroneous data at runtime
  • Undefined behavior allows for certain advantages such as optimization
  • The standard does not mandate when the effects of undefined behavior should manifest
P.W
  • Could you please explain how the first reason leads to UB during compilation? A compiler is just an application; it must accept every possible input and provide deterministic output. – Alexey Jan 17 '19 at 11:12
  • In many cases not ALL. This means that in those cases UB will not manifest during translation. – P.W Jan 17 '19 at 11:13
  • Yes. Go ahead please – Alexey Jan 17 '19 at 11:28
  • I have actually taken that example from this [post](https://stackoverflow.com/a/7239071/10190237) which argues that this UB will escape detection from a hypothetical UB detector. – P.W Jan 17 '19 at 11:44
  • @P.W interestingly Clang optimizes everything away here. Since `*p` is UB once `p` has been deleted, the whole `if` is discarded, then heap elision kicks in and removes the non-observable `new`s and `delete`. The result is a perfectly conforming `return 0;` :) – Quentin Jan 17 '19 at 13:47
  • @Quentin: I think it is to give compilers this flexibility the standard does not mandate that the effects of UB show up at compile time itself. – P.W Jan 18 '19 at 04:35

Consider this: C++ compilers can (e.g. constexpr ALL the things!) and, to some degree, always have been doing optimizations/calculations at compile-time. It should not be surprising that something that leads to UB while running some compiled C++ code can also lead to UB when done at compile-time.

To summarize the comments below: C++ has basically always had (compile-time) optimizations. UB helps optimizations in that the compiler is free to assume no UB is present when proving the validity of its optimizations. Compiler writers are not required to ensure their optimizations (for code generation, for constexpr evaluation, or anything in between) detect or are robust in the face of UB. Hence, any compile-time optimizations/calculations on code with UB may result in compile-time UB.

Max Langhof
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186854/discussion-between-daniel-langr-and-alexey). – Daniel Langr Jan 17 '19 at 12:41

The example you are looking for is any kind of compile-time calculation that causes UB, e.g., an overflow of a signed integer. It can happen, for instance, when constexpr, template metaprogramming, or optimizations are involved. There is no reason why such UB should be propagated into runtime. An example:

template <signed char N>
struct inc {
  static const signed char value = 1 + inc<N + 1>::value;
};

template <>
struct inc<-100> {
  static const signed char value = 1;
};

static const signed char I1 = inc<-110>::value;
static const signed char I2 = inc<110>::value;  // UB

The signed integer overflow, and therefore UB, obviously happens at compile time, when the templates are recursively instantiated.

Anyway, IMO, the main reason is simplicity. There is only one kind of UB defined by the Standard, which is simpler than defining many kinds of UB and then specifying which one applies in which situation.

By pure logic, if a cause of UB doesn't exist until runtime (such as dereferencing an invalid pointer), then UB cannot apply to compile time. For example, when compiling the following source file:

#include <iostream>
void f(int* p) { std::cout << *p; }

UPDATE

My understanding of UB is as follows: if a condition for UB is met, then there is no requirement on the behavior. See [defns.undefined]. Some such conditions (signed integer overflow) can happen at compile time as well as at runtime. Other conditions (dereferencing an invalid pointer) cannot happen until runtime.

Daniel Langr
  • Your example is too special to answer the question in general. C++ hasn't always had the possibility to carry out compile-time computations. – Alexey Jan 17 '19 at 12:32
  • Justify it then:) – Alexey Jan 17 '19 at 12:33
  • @Alexey Prove the opposite, please. C++ is and always has been a language targeted to performance and efficiency. You cannot have them without optimizations. – Daniel Langr Jan 17 '19 at 12:34
  • It's just your opinion - C++ Standard doesn't require any optimizations. – Alexey Jan 17 '19 at 12:41
  • @Alexey The standard is written with optimizations in mind, all throughout. No, it doesn't "require" optimizations (otherwise every unoptimized build would not be standard-compliant), but let's not pretend it is "just an opinion" that the standard considers optimizations. If you want a concrete example, see the discussion around http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r0.html regarding signed/unsigned overflow. – Max Langhof Jan 17 '19 at 12:44
  • @DanielLangr What does it mean that "a cause of UB doesn't exist until runtime"? Why do you think that the optimizer can't take your function void(int* p) { std::cout << *p; } and execute it during compilation and get UB during compilation (and dereference a deleted pointer during compilation)? – Alexey Jan 17 '19 at 13:32
  • @Alexey There is no deleted pointer. A compiler cannot execute this function during compilation, because in the source file (and the corresponding translation unit) there is no function call. It's only a function definition with external linkage, so a compiler must generate machine code for it. – Daniel Langr Jan 17 '19 at 13:36
  • @DanielLangr Why do you think that "dereferencing an invalid pointer" cannot happen until runtime? There may be such an implementation of C++ and such an optimizing compiler. The standard doesn't impose such a restriction. – Alexey Jan 17 '19 at 13:49
  • @Alexey Because in this particular case, there is no call of `f`. Do you understand what that means? There is no argument and therefore no pointer to dereference when you compile this single source file. – Daniel Langr Jan 17 '19 at 14:06