
Let's say I go to compile some poorly-written C++ source code that invokes undefined behavior, and therefore (as they say) "anything can happen".

From the perspective of what the C++ language specification deems acceptable in a "conformant" compiler, does "anything" in this scenario include the compiler crashing (or stealing my passwords, or otherwise misbehaving or erroring out at compile time), or is the scope of the undefined behavior limited specifically to what can happen when the resulting executable runs?

Benjamin Hodgson
Jeremy Friesner
  • "UB is UB. Live with it"... No wait. "Please post a MCVE." ... No wait. I love the question for all the reflexes it triggers inappropriately. :-) – Yunnosch Aug 26 '19 at 06:37
  • There's really no limitation, which is the reason it's said that UB can summon [nasal demons](http://www.catb.org/jargon/html/N/nasal-demons.html). – Some programmer dude Aug 26 '19 at 06:37
  • UB can make the author post a [question](https://stackoverflow.com/questions/57652799/is-it-legal-for-source-code-containing-undefined-behavior-to-crash-the-compiler) on SO. :P – Tanveer Badar Aug 26 '19 at 06:42
  • @Someprogrammerdude I might argue, though, that UB is a *behavior*, so the code isn't malformed, as only a successfully compiled program would have a behavior. Thus aborting compilation is not a legal outcome :P – Swift - Friday Pie Aug 26 '19 at 06:51
  • Irrespective of what the C++ standard says, if I were a compiler writer I would certainly regard it as a bug in my compiler. So if you are seeing this, file a defect report. – john Aug 26 '19 at 06:53
  • @Swift-FridayPie Note that the C++ standard technically only prescribes the behaviour of C++ *implementations*. Because that's often easier to express as the behaviour of C++ programs, it's often stated like that, but it's supposed to be read as "when processing such a program, the implementation shall do such and such." – Angew is no longer proud of SO Aug 26 '19 at 06:53
  • Very close question [Dividing by zero in a constant expression](https://stackoverflow.com/a/33916186/1708801) – Shafik Yaghmour Aug 26 '19 at 12:53
  • On my first C project I came up with a particularly odd bit of C code. When I attempted to compile it, the compiler didn't simply crash, it actually triggered a reboot of the workstation I was using. This behavior was consistent and repeatable. I filed a problem report and replaced the code with a more reasonable approach that I should have used in the first place. – Avi Berger Aug 26 '19 at 15:46
  • @AviBerger what code (and what compiler?) – Leif Willerts Aug 26 '19 at 16:00
  • @LeifWillerts This was back in the 80s. I don't remember the exact construct, but I think it hinged on using a convoluted variable type. After I put in a replacement I had a "what was I thinking - things don't work that way" moment. I didn't blame the compiler for rejecting the construct, just for rebooting the machine. I doubt anyone would encounter that compiler today. It was the HP C cross compiler for the HP 64000 targeting the 68000 microprocessor. – Avi Berger Aug 26 '19 at 17:42
  • Given that C++ templates are Turing-complete... – chrylis -cautiouslyoptimistic- Aug 26 '19 at 23:31
  • If by "legal" you mean "permitted according to the standard", then it is legal for code with undefined behaviour to crash the internet and then crash the compiler. The definition of "undefined" means that the standard does not constrain (i.e. prevent) anything from happening, no matter how unlikely, as a result of that code. If something is not prevented by the standard, then it is "legal". – Peter Aug 27 '19 at 10:22

4 Answers


The normative definition of undefined behavior is as follows:

[defns.undefined]

behavior for which this International Standard imposes no requirements

[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. Evaluation of a constant expression never exhibits behavior explicitly specified as undefined.  — end note ]

While the note itself is not normative, it does describe a range of behaviors implementations are known to exhibit. So crashing the compiler (which is translation terminating abruptly) is legitimate according to that note. But really, as the normative text says, the standard doesn't place any bounds on either execution or translation. If an implementation steals your passwords, it's not a violation of any contract set forth in the standard.
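One corner where the standard does constrain translation is the note's last sentence: evaluating a constant expression never exhibits explicitly undefined behavior, so a `constexpr` variable whose initializer would divide by zero is ill-formed and must be diagnosed. A minimal sketch, assuming C++11 or later; `divide`, `ok` and `bad` are illustrative names, not anything from the standard:

```cpp
// divide is an ordinary constexpr function; the compiler may evaluate it
// during translation when a constant expression is required.
constexpr int divide(int a, int b) {
    return a / b;  // UB at run time if b == 0
}

// Evaluating divide(10, 2) involves no UB, so this is a constant
// expression and the compiler computes it during translation.
constexpr int ok = divide(10, 2);
static_assert(ok == 5, "10 / 2 should be 5");

// Evaluating divide(1, 0) would divide by zero, so this initializer is
// not a constant expression: a conforming compiler must reject it with a
// diagnostic rather than "do anything". Uncomment to see the error.
// constexpr int bad = divide(1, 0);
```

The same `divide(1, 0)` call in a context that doesn't require a constant expression (an ordinary run-time call, say) needs no diagnostic; its UB only occurs if that call actually executes.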

StoryTeller - Unslander Monica
  • That said, if you can *actually* get a compiler to execute arbitrary code at compile-time, without any sandboxing, then various security people would be *very* interested to know about it. The same goes for segfaulting the compiler. – Kevin Aug 26 '19 at 15:58
  • Ditto for what Kevin said. As a C/C++/etc compiler engineer in a previous career, our position was that undefined behavior could crash _your program_, screw up your output data, set your house on fire, whatever. But the compiler should never crash no matter what the input. (It might not give helpful error messages, but it should produce some kind of diagnostic and exit rather than just screaming CTHULHU TAKE THE WHEEL and segfaulting.) – Ti Strga Aug 26 '19 at 18:21
  • @TiStrga I bet Cthulhu would make an awesome F1 driver. – zeta-band Aug 26 '19 at 19:21
  • "If an implementation steals your passwords, it's not a violation of any contract laid forth in the standard." That's true regardless of whether the code has UB though, isn't it? The standard only dictates what the compiled program should do - a compiler that correctly compiles the code but steals your passwords in the process wouldn't be disobeying the standard. – Carmeister Aug 26 '19 at 21:36
  • @Carmeister, oooh, that's a good point, I'll make sure to remind people of that whenever those "UB gives the compiler permission to start a nuclear war" arguments pop up. Again. – ilkkachu Aug 27 '19 at 05:46
  • UB doesn't truly *happen* until execution. Is the phrase "behaving during translation" referring to UB encountered during templates or constexpr expansion? That's not a thing for C, so the answer is clear for C, IMO: the compiler isn't allowed to crash when compiling a well-formed program (which I argue in my answer to this question). – Peter Cordes Aug 27 '19 at 10:39
  • @PeterCordes - Undefined behavior is a property of the *implementation*, since that's the thing whose behavior the standard describes. This includes translation. – StoryTeller - Unslander Monica Aug 27 '19 at 10:41
  • Does crashing the compiler really count as termination "with the issuance of a diagnostic message"? Of course that note isn't normative. Anyway, thanks for the correction, updated my answer to clarify that what I said is only true for runtime UB such as division by zero. (There's a valuable point about unreachable UB and the limits of insanity that I think many people don't grok.) – Peter Cordes Aug 27 '19 at 11:06
  • Can you clarify what kinds of UB actually are translation-time UB? Like exceeding implementation limits on line length or object size? Or is the compiler really allowed to choke on `if(0) 1 / 0;`? – Peter Cordes Aug 27 '19 at 11:08
  • @PeterCordes - It's in parentheses. So I took it as optional. – StoryTeller - Unslander Monica Aug 27 '19 at 11:08
  • That's not how I read it. The previous parens say "with or without". There's still the point that it's not normative; I assume they don't want to *encourage* bad-quality implementations and that clause is only talking about translation-time behaviour. – Peter Cordes Aug 27 '19 at 11:10
  • @PeterCordes - From a practical standpoint, no I don't believe compilers choking is acceptable. That's a bug. But on the other hand, UB and bugs usually go hand in hand. And finally, since this was tagged LL, so from a pure LL perspective, yes it's acceptable. Not good, but the standard doesn't object. – StoryTeller - Unslander Monica Aug 27 '19 at 11:14
  • Woah there, compiler bugs are very different and totally separate from the source being compiled. ISO C++ clearly prohibits most compiler bugs; valid programs are required to work for all inputs up to implementation limits. Anyway, yes I agree that ISO C++ allows compilers to choke on translation-time UB, but what exactly is that? Does the standard shed any light on which kinds of UB are which? It also seems fairly clear (to me) that division by zero that's not part of an expression like `x / 0` *must* not stop a program from compiling. Especially if it's not provably reached from `main`. – Peter Cordes Aug 27 '19 at 11:30
  • @PeterCordes - Considering there's a whole slew of behavior that's undefined by mere omission of a definition, I don't believe the standard has the answer you seek. As for compilers... Nobody is really bothered by UB causing compiler bugs, because we don't like our programs (or tools) misbehaving in general, and quite reasonably so. I call them bugs, because they get fixed. – StoryTeller - Unslander Monica Aug 27 '19 at 11:34
  • Compilers sometimes crash when optimizing *valid* code as well. I don't think UB causing a crash is much different than other compiler-crash bugs. Everyone agrees that both are highly undesirable. But from a language-lawyer POV the important question is whether it's "just" a QoI bug or whether it's a standards-violation bug. – Peter Cordes Aug 27 '19 at 11:54
  • @PeterCordes - Given the highly loose contract in the normative text, I think it's safe to say this is a QoI bug. The standard cares about valid programs being translated successfully (within an implementation's limits, of course). But for invalid programs whose issues are not diagnosable, not so much from what I gather. – StoryTeller - Unslander Monica Aug 27 '19 at 12:03
  • @Carmeister : Is it sufficient that the compiler behaves as if it steals your passwords in the process? – Eric Towers Aug 27 '19 at 14:54
  • @PeterCordes: From a language-lawyer perspective, nothing an otherwise-conforming implementation does when fed a program which doesn't precisely exercise the translation limits given in the Standard, could ever render it non-conforming. If some source text T would precisely exercise the translation limits, and an implementation was incapable of handling any source text other than T, the implementation would be conforming if and only if it processed T correctly. That's the only scenario where the Standard would impose any requirements on how an implementation processes any particular program. – supercat Dec 08 '20 at 20:55

Most kinds of UB that we usually worry about, like NULL-deref or divide by zero, are runtime UB. Compiling a function that would cause runtime UB if executed must not cause the compiler to crash, unless perhaps the compiler can prove that the function (and that path through the function) definitely will be executed by the program.

(Second thoughts: maybe I haven't considered templates and constexpr, where evaluation at compile time is required. Possibly UB during that evaluation is allowed to cause arbitrary weirdness during translation, even if the resulting function is never called.)
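For the template case, a minimal sketch (assuming C++11 or later; `Reciprocal` is a hypothetical name). The division runs inside the compiler whenever `Reciprocal<N>::value` is needed, and with N == 0 the constant-expression rules require a diagnostic rather than leaving the outcome undefined. This only covers core constant evaluation, not every template-related situation:

```cpp
// The initializer of 'value' must be a constant expression, so the
// division is evaluated by the compiler, not by the running program.
template <int N>
struct Reciprocal {
    static constexpr int value = 100 / N;
};

static_assert(Reciprocal<4>::value == 25, "100 / 4 should be 25");

// Using Reciprocal<0>::value forces the compiler to evaluate 100 / 0.
// That is not a constant expression, so the program is ill-formed and
// the compiler must issue a diagnostic. Uncomment to see it.
// static_assert(Reciprocal<0>::value == 0, "never compiles");
```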

The "behaving during translation" part of the ISO C++ quote in @StoryTeller's answer is similar to language used in the ISO C standard. C doesn't have templates or mandatory constexpr evaluation at compile time.

But fun fact: ISO C says in a note that if translation is terminated, it must be with a diagnostic message; the other option listed for translation is "behaving during translation ... in a documented manner". I don't think "ignoring the situation completely" could be read as including stopping translation.


Old answer, written before I learned about translation-time UB. It's true for runtime-UB, though, and thus potentially still useful.


There's no such thing as UB that happens at compile time. It can be visible to the compiler along a certain path of execution, but in C++ terms it hasn't happened until execution reaches that path of execution through a function.

Defects in a program that make it impossible to even compile aren't UB; they're compile errors, such as syntax errors. Such a program is "ill-formed" (not well-formed) in C++ terminology, if I have my standardese correct. A program can be well-formed but contain UB. See also: Difference between Undefined Behavior and Ill-formed, no diagnostic message required.
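A minimal sketch of the distinction (the names are illustrative): the commented-out line is ill-formed and must be rejected at compile time, while `add_one` is well-formed and only has UB if a call with `INT_MAX` actually executes:

```cpp
#include <cassert>

// Ill-formed: there is no implicit conversion from a string literal to
// int, so a conforming compiler must issue a diagnostic. Uncomment to
// see the error.
// int bad = "not an int";

// Well-formed: the compiler must accept this definition, and calls such
// as add_one(41) are fine. Only a call with x == INT_MAX has UB, because
// x + 1 then overflows a signed int.
int add_one(int x) {
    return x + 1;
}
```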

Unless I'm misunderstanding something, ISO C++ requires this program to compile and execute correctly, because execution never reaches the divide by zero. In practice (checked on Godbolt), good compilers just make working executables; gcc and clang warn about x / 0 but not about this, even when optimizing. But we're trying to determine how low ISO C++ allows quality of implementation to be, so checking gcc/clang is hardly a useful test, other than to confirm that I wrote the program correctly.

int cause_UB() {
    int x = 0;
    return 1 / x;   // UB if ever reached.
    // Note I'm avoiding x/0 in case that counts as translation-time UB.
    // The UB is still obvious when optimizing across statements, though.
}

int main(){
    if (0)
        cause_UB();
}

A use-case for this might involve the C preprocessor, or constexpr variables and branching on those variables, which leads to nonsense in some paths that are never reached for those choices of constants.
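A sketch of that use-case, assuming a hypothetical configuration constant `kBucketCount` that would normally come from a build flag or the preprocessor. With the value 0, the modulo on the guarded path would be UB if executed, but that path never runs, so by the argument above the program must compile and behave normally (a compiler may still warn about the constant zero divisor):

```cpp
#include <cassert>

// Hypothetical build-time setting; imagine it is generated from a
// configuration macro. With 0, the '%' below would be UB if executed.
constexpr int kBucketCount = 0;

// Returns the bucket a hash falls into, or 0 when bucketing is disabled.
int bucket_for(int hash) {
    if (kBucketCount != 0)
        return hash % kBucketCount;  // never executed with this setting
    return 0;                        // the path that actually runs
}
```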

Paths of execution that cause compile-time-visible UB can be assumed never to be taken, e.g. a compiler for x86 could emit a ud2 (which raises an illegal-instruction exception) as the definition for cause_UB(). Or within a function, if one side of an if() leads to provable UB, that branch can be removed.

But the compiler still has to compile everything else in a sane and correct way. All paths that don't encounter (or can't be proved to encounter) UB must still be compiled to asm that executes as-if the C++ abstract machine was running it.


You could argue that unconditional compile-time-visible UB in main is an exception to this rule, or more generally any case where it's provable at compile time that execution starting at main does in fact reach guaranteed UB.

I'd still argue that legal compiler behaviours include producing a grenade that explodes if run. Or more plausibly, a definition of main that consists of a single illegal instruction. I'd argue that if you never run the program, there hasn't been any UB yet. The compiler itself isn't allowed to explode, IMO.


Functions containing possible or provable UB inside branches

UB along any given path of execution reaches backwards in time to "contaminate" all previous code. But in practice, compilers can only take advantage of that rule when they can actually prove that a path of execution leads to compile-time-visible UB. For example:

int minefield(int x) {
    if (x == 3) {
        *(char*)nullptr = x/0;
    }

    return x * 5;
}

The compiler has to make asm that works for all x other than 3, except for values where x * 5 causes signed-overflow UB (x > INT_MAX / 5 or x < INT_MIN / 5). If this function is never called with x == 3 (or with one of those overflowing values), the program of course contains no UB and must work as written.
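To pin down exactly which arguments make `x * 5` itself undefined, here is a small sketch (the helper name `times5_is_defined` is hypothetical; it works for any `int` width because it uses `INT_MIN` and `INT_MAX` from `<climits>`):

```cpp
#include <climits>

// x * 5 is defined exactly when the mathematical product fits in an int,
// i.e. when INT_MIN / 5 <= x <= INT_MAX / 5. Integer division truncates
// toward zero, which rounds both bounds inward, as needed here.
constexpr bool times5_is_defined(int x) {
    return x >= INT_MIN / 5 && x <= INT_MAX / 5;
}
```

GCC and Clang also provide `__builtin_mul_overflow(a, b, &result)`, which reports overflow as a `bool` instead of invoking UB, if you'd rather detect the problem than avoid it.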

We might as well have written if(x == 3) __builtin_unreachable(); in GNU C to tell the compiler that x is definitely not 3.
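The `__builtin_unreachable()` version as a compilable sketch for GCC or Clang (it's a GNU extension; `minefield_hinted` is a hypothetical name). C++23 adds the standard `std::unreachable()` in `<utility>` with the same meaning:

```cpp
#include <cassert>

// __builtin_unreachable() is a GCC/Clang builtin: reaching it is UB, so
// the optimizer may assume x != 3, just as the nullptr-store branch in
// minefield() allowed it to assume.
int minefield_hinted(int x) {
    if (x == 3)
        __builtin_unreachable();
    return x * 5;  // still signed-overflow UB for sufficiently large |x|
}
```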

In practice there's "minefield" code all over the place in normal programs: any integer division promises the compiler that the divisor is non-zero, and any pointer dereference promises the compiler that the pointer is non-NULL.
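A sketch of the pointer case (the function names are hypothetical): once `*p` has executed, a null `p` would already have been UB, so the compiler may treat a later null check as dead code. Checking before dereferencing is the well-defined ordering:

```cpp
#include <cassert>

// After *p executes, the compiler may assume p != nullptr (otherwise the
// dereference was UB), so an optimizer is allowed to drop the later check.
int deref_then_check(const int* p) {
    int v = *p;
    if (p == nullptr)  // may be optimized away
        return -1;
    return v;
}

// Testing before dereferencing keeps the null case well-defined.
int check_then_deref(const int* p) {
    if (p == nullptr)
        return -1;
    return *p;
}
```

GCC, for example, does this under `-fdelete-null-pointer-checks`, which its documentation describes as enabled by default on most targets.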

Peter Cordes

What does "legal" mean here? Anything that doesn't contradict the C standard or C++ standard is legal, according to these standards. If you execute a statement i = i++; (undefined in C, and in C++ before C++17) and as a result dinosaurs take over the world, that doesn't contradict the standards. It does, however, contradict the laws of physics, so it's not going to happen :-)

If undefined behaviour crashes your compiler, that doesn't violate the C or C++ standard. It does however mean that the quality of the compiler could (and probably should) be improved.

In previous versions of the C standard, there were statements whose validity depended on undefined behaviour:

char* p = 1 / 0;

Assigning a constant 0 to a char* is allowed (it's a null pointer constant). Assigning a non-zero constant is not. Since the value of 1 / 0 is undefined behaviour, whether the compiler should accept this statement is itself undefined behaviour. (Nowadays, 1 / 0 no longer meets the definition of "integer constant expression".)

amalloy
gnasher729
  • To be precise: dinosaurs taking over the world does not contradict any laws of physics (e.g. the Jurassic Park variation). It's just highly unlikely. :) – freakish Aug 27 '19 at 08:36

The Standard would impose no requirements upon an implementation's behavior if it encounters #include "'foo'". If a compiler writer judges that it would be useful to process include directives of that form (containing the apostrophes within the file name) by running the indicated program with its output directed to a temporary file and then behaving as a #include of that file, then an attempt to process a program containing the above line could run program foo, with whatever consequences result.

Thus, there is in general no limit as to what might happen as a consequence of trying to translate a C program, even if one makes no effort to run it.

supercat
  • One could say the same thing about any translator or compiler in any programming language. Or, for that matter, any program whatsoever. – Robert Harvey Dec 02 '20 at 17:13
  • @RobertHarvey: Many programming language specifications are much more specific about such things. If a language spec says that a certain directive will read input from a stream whose OS path is as specified, and the OS does something weird when reading a certain path, that would be outside the control of the language spec, but I don't think most language specs would give implementations carte blanche to process such directives in arbitrary fashion at their leisure, without having to document it, even on platforms which would otherwise define the behavior. – supercat Dec 02 '20 at 19:44