
The C++ standard comes with a stunning number of definitions for unclear¹ behavior, which mean more or less the same thing with subtle differences. Reading this answer, I noticed the wording "the program is ill-formed; no diagnostic required".

Implementation-defined differs from unspecified behavior in that the implementation in the former case must clearly document what it's doing (in the latter case, it needn't); both are well-formed. Undefined behavior differs from unspecified in that the program is erroneous (1.3.13).
They otherwise all have in common that the standard makes no assumptions or requirements about what the implementation will do. The exception is 1.4/8, which states that implementations may have extensions that do not alter the behavior of well-formed programs. Programs using such extensions are ill-formed according to the standard, and the implementation must diagnose their use, but may afterwards continue compiling and executing the ill-formed program.
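
A minimal sketch of the three categories, assuming a hosted C++11-or-later compiler. The function names are made up for illustration, and the undefined case is only tested for, never executed:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Implementation-defined: each implementation picks a value and must document it.
constexpr std::size_t int_size() { return sizeof(int); }

// Hypothetical helper: records the order in which it is called.
inline int record(std::vector<int>& log, int id) {
    log.push_back(id);
    return id;
}

inline int add(int a, int b) { return a + b; }

// Unspecified: the two arguments of add() may be evaluated in either order,
// and the implementation does not have to document which one it picks.
inline std::vector<int> argument_evaluation_order() {
    std::vector<int> log;
    int total = add(record(log, 1), record(log, 2));
    assert(total == 3);   // the sum is the same either way
    return log;           // {1, 2} or {2, 1}
}

// Undefined: evaluating INT_MAX + 1 overflows a signed int. This helper only
// checks for the condition, so the overflow itself is never executed.
inline bool increment_would_overflow(int x) { return x == INT_MAX; }
```

All three functions belong to a well-formed program; only an actual overflowing addition would make an execution undefined.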

An ill-formed program is otherwise only defined as being not well-formed (great!). A well-formed program, on the other hand, is defined as one that adheres to the syntax and diagnosable semantic rules. Consequently, an ill-formed program is one that breaks the syntax rules, the semantic rules, or both. In other words, an ill-formed program actually shouldn't compile at all (how would one translate e.g. a program with a wrong syntax in any meaningful way?).

I would be inclined to think that the word erroneous also implies that the compiler should abort the build with an error message (after all, erroneous suggests there's an error), but the "Note" section in 1.3.13 explicitly allows for something different, including silently ignoring the problem (and compilers demonstrably do not break the build because of UB; most do not even warn by default).

One might further believe that erroneous and ill-formed are the same, but the standard doesn't say whether that is the case or what the word is supposed to mean.

Further, 1.4 states that

a conforming implementation shall [...] accept and correctly execute a well-formed program

and

If a program contains a violation of a rule for which no diagnostic is required, [...] no requirement on implementations with respect to that program.

In other words, a conforming implementation must accept a well-formed program, but it might just as well accept an ill-formed one, even without a warning, unless the program is ill-formed because it uses an extension.

The second paragraph suggests that anything in conjunction with "no diagnostic required" means there are no requirements from the specification, which means it is mostly equivalent to "undefined behavior", except there is no mention of erroneous.

What would therefore be the intention behind using a wording such as "ill-formed; no diagnostic required"?

The presence of "no diagnostic required" would suggest that it is identical (or mostly identical?) to undefined behavior. On the other hand, since implementation-defined and unspecified behavior are defined as well-formed, it must be something different from those.

On the other hand, since an ill-formed program breaks the syntax/semantic rules, it actually should not compile. In conjunction with "no diagnostic required", however, that would mean that a compiler would be permitted to silently exit without so much as a warning, and you would be unable to find an executable afterwards.

Is there a difference between "ill-formed; no diagnostic required" and "undefined behavior", or is this simply a complicated synonym for the same thing?


¹ For lack of a better word for this collection of behaviors
Damon
  • See also this [question](http://stackoverflow.com/questions/15805394/what-is-the-c-compiler-required-to-do-with-ill-formed-programs-according-to-th?rq=1) helpfully listed in the sidebar. – vonbrand Mar 04 '14 at 19:01
  • @vonbrand: The highest ranked answer seems to conclude that "ill-formed" means "diagnostic message plus undefined behavior" following the same reasoning as in my question that 1.4.2 has a similar wording as 1.3.13 (but does that really mean it's the same?). If we assume that this reasoning is correct, then "ill-formed; no diagnostic" would be "diagnostic plus undefined behavior minus diagnostic", so simply "undefined behavior". – Damon Mar 04 '14 at 19:11
  • Ironically, I just realized that the very example in the question I linked to (the one containing the quote that made me wonder) must **necessarily** break the build: _"if no function argument values exist such that the function invocation substitution would produce a constant expression"_. If you need a compile-time constant (say, for a template parameter) and you cannot produce one from your inputs, what can you do but immediately abort with an error? Surely, "no diagnostic required" cannot really apply to that case. – Damon Mar 04 '14 at 19:20
  • No. *Doing* `char *p = malloc(100); free(p); *p = 10;` is undefined behaviour. The program containing those lines can be well-formed, especially if some other constraints never allow that exact sequence of events. It seems that the only case of clear-cut ill-formedness comes from the "one definition" rule. – vonbrand Mar 04 '14 at 19:41
  • "_a conforming implementation must accept a well-formed program_" up to its implementation "limits". So it's a very weak "must". An implementation _should_ have limits at least as big as specified in the std, but it isn't a strict requirement. – curiousguy Nov 16 '18 at 23:15

3 Answers


The standard is not always as coherent as we would like, since it is a very large document, written (in practice) by a number of different people, and despite all of the proof-reading that does occur, inconsistencies slip through. In the case of undefined behavior (and errors in general), I think there is an additional problem in that for many of the most basic things (pointers, etc.), the C++ standard draws on C. But the C standard takes the point of view that all errors are undefined behavior, unless stated otherwise, whereas the C++ standard tries to take the point of view that all errors require a diagnostic, unless stated otherwise. (Although they still have to allow for the case where the standard omits to specify a behavior.) I think this accounts for a lot of the inconsistency in the wording.

Globally, the inconsistency is regrettable, but on the whole, if the standard says that something is erroneous, or ill-formed, then it requires a diagnostic, unless the standard says that it doesn't, or that it is undefined behavior. In something like "ill-formed; no diagnostic required", the "no diagnostic required" is important, because otherwise, it would require a diagnostic. As for the difference between "ill-formed; no diagnostic required" and "undefined behavior", there isn't any. The first is probably more frequent in cases where the code is incorrect, the second where it is a run-time issue, but it's not systematic. (The specification of the one definition rule—clearly a compile time issue—ends with "then the behavior is undefined".)

James Kanze
  • There is a difference. "Ill formed" means that the *program* is badly written, "undefined behaviour" means that at *runtime* something strange might happen. – vonbrand Mar 04 '14 at 19:45
  • @vonbrand Not in the C++ standard. There's nothing that restricts "undefined behavior" to runtime. I know: the word "behavior" suggests runtime. But the standard uses its own definitions, and within the context of its definitions, the only difference is that "ill-formed" may require a diagnostic -- in fact, it does require a diagnostic unless otherwise stated -- and undefined behavior doesn't require one. – James Kanze Mar 05 '14 at 09:03
  • Surely 'ill-formed' means the compiler should or could detect it, and the behaviour should always be 'defined' (by the implementation, if not by the standard). Whereas 'undefined behaviour' means the compiler cannot (or is not expected to) detect it, and the behaviour becomes just 'whatever happens'. They seem a long way apart to me. – david.pfx Mar 07 '14 at 14:34
  • @david.pfx "Surely x means y" is not something you can say when dealing with the standard (or any standard). The standard very specifically defines the terms it uses. In particular, there are explicitly cases of "ill-formed, no diagnostic required", which are undefined behavior. And violations of the one definition rule, including a missing definition, are undefined behavior, even though most implementations detect at least some instances of it at translation (in this case, link) time. – James Kanze Mar 07 '14 at 15:19
  • @JamesKanze: This is my __reading__ of the standard, albeit in somewhat informal language. N3337 1.3.25 says UB is only for a well-formed program, and 1.3.26 says that violation of ODR is not well-formed. 1.4 distinguishes NDR from UB. I can find no examples of "ill-formed, no diagnostic required" which are explicitly UB. Despite your confidence in the definitions I see the standard maintaining a clear distinction between NDR and UB, and yet failing to define that difference. – david.pfx Mar 08 '14 at 07:48
  • The only significant difference I can see between a construct that produces Undefined Behavior versus one that is ill-formed is that a standards-conforming compiler may as default behavior assign and document any useful meaning it sees fit for a construct that represents undefined behavior, but a standards-conforming compiler may not cleanly compile a construct which the standard states is ill-formed (it may allow the program to run, but when using default settings it must issue a diagnostic). Perhaps "ill-formed + NDR" means compilers aren't supposed to "document" the constructs... – supercat Mar 13 '14 at 16:26
  • ...as having defined usable behavior, but are free to actually interpret them however they wish so long as they make no promises regarding their interpretation? – supercat Mar 13 '14 at 16:27
  • @supercat A compiler is always allowed to document any undefined behavior, making it defined for that implementation. Some cases of undefined behavior are in fact intended to be defined for any given implementation; it's just that the variation in what one would expect to be defined is too large to be documented in the standard. For the rest, "ill-formed, no diagnostic required" is defined by the standard as one of the cases of undefined behavior. – James Kanze Mar 15 '14 at 14:14
  • About UB referring to "runtime", one should in the meantime add that mainstream compilers have begun to aggressively optimize around UB, dead-stripping code on the assumption "cannot be reached because it would be UB". Which is very much "**not** runtime". [Article](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html) (C, not C++, but still... the null-pointer dereference examples are rather silly because the way they're written the code _really_ can't be reached, but e.g. the signed overflow example is outright stunning!) – Damon Jun 24 '14 at 18:11
  • @Damon This doesn't seem very relevant, except that it may explain some of the stranger manifestations of undefined behavior. It's also not at all new. Ever since the aliasing rules existed, in a function like `f( double* a, int* b )`, compilers have assumed that `a` and `b` cannot alias. (I seem to remember encountering this behavior in the mid-1980's.) – James Kanze Jun 25 '14 at 08:47
  • @JamesKanze: Indeed compilers are allowed to document undefined behavior. I would interpret a statement that a construct is ill-formed as an indication that compilers *should not extend the language to accept it*, with or without a diagnostic. The specification says nothing about what compilers should do when they "reject" a program, beyond the fact that some diagnostic is required; some compilation systems will leave any pre-existing object code as-was, some will replace the object file with one that will deliberately fail to link,... – supercat Dec 12 '14 at 18:25
  • ...some could perhaps output an object file that contains the parts of the code that had compiled up to the point of the error and which may or may not link, etc. I would interpret "ill-formed; NDR" as basically saying that no such programs should be considered "legal", but acknowledging that some systems, for reasons outside the compiler's control, may be incapable of detecting and flagging all of them. – supercat Dec 12 '14 at 18:31
  • @supercat As you say, if the code is ill-formed, the compiler is required to emit a diagnostic; what happens after that is not defined, and the implementation can define it to do what it wants. (In at least one case of undefined behavior which doesn't require a diagnostic, g++ emits a warning, and intentionally generates code which will cause the program to crash if the statement is actually executed.) – James Kanze Dec 15 '14 at 11:25
  • @JamesKanze: In some contexts, such as inconsistent "no-return" directives in different compilation units, I would think the logical semantics should be that a compiler may reject a program entirely (with a diagnostic) but if there are no diagnostics and no implementations of the functions actually return, the directives should have no effect on execution (if the directives are used consistently, they have no behavioral consequence if the function never returns; I see no reason that shouldn't remain true whether they're used consistently or not, though I could see a basis for... – supercat Jul 27 '16 at 23:55
  • ...a platform rejecting a program on the basis of inconsistent definitions on the basis that such behavior would be safer than running a program which would likely try to return from a no-return function. – supercat Jul 27 '16 at 23:56
  • @Damon no, the null-pointer dereference example in that article you linked is *not* silly, because the code *can* be reached - first there is a dereference of the pointer, and then there is a check for whether the pointer is null. There is no dead code there. Are you assuming that the dereference would segfault if the pointer is null? Because that assumption is wrong when writing a kernel or otherwise not wrapped in the cushy safety of an MMU configured to segfault accesses to address 0. – mtraceur Feb 22 '21 at 05:20

The way it should be is: things that are undefined don't cause problems as long as a particular run of a program doesn't trigger the undefined behavior. E.g. a null pointer dereference only ruins your day when your particular program run (characterized by its input: I/O, non-deterministic functions like clock queries, etc.) would actually execute it - but it reaches backwards, so it could exhibit undefined behavior even before technically reaching the dereference. (This is mainly there to allow code rearrangements I think.)
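
A small sketch of that per-execution view, using hypothetical function names: the program below is well-formed, and the dereference is undefined only in executions that actually pass a null pointer.

```cpp
#include <cassert>

// Well-formed on its own; the behavior is undefined only for a call
// in which p is a null pointer.
inline int deref(const int* p) { return *p; }

// Hypothetical driver: whether the undefined path is taken depends
// entirely on the input of this particular run.
inline int run(bool use_null) {
    int value = 42;
    const int* p = use_null ? nullptr : &value;
    return deref(p);   // undefined only in executions where use_null is true
}
```

Calling `run(false)` is fine. A call to `run(true)` would make that whole execution undefined, and not only the statements that come after the dereference.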

Whereas ill-formed NDR is something that the implementation should diagnose during translation, but may not be able to due to various technical or theoretical limitations. E.g. the ODR would require the implementation to collect all definitions of an entity and compare them; but that's a massive resource drain. Some NDR things are even computationally infeasible. Undefined behavior arises when the implementation doesn't immediately diagnose this stuff.
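
To make the ODR point concrete, here is a hedged sketch. The file names and the function `buffer_limit` are hypothetical, and the violating variant is kept in comments because it needs two translation units to reproduce:

```cpp
#include <cassert>

// --- shared.h (hypothetical header included by every translation unit) ---
// One definition that all translation units see: no ODR problem.
inline int buffer_limit() { return 10; }

// --- What a violation would look like (kept in comments on purpose) ---
// a.cpp:  inline int buffer_limit() { return 10; }
// b.cpp:  inline int buffer_limit() { return 20; }
//
// Each file is fine when translated on its own. Spotting the mismatch would
// require comparing the definitions across all translation units, which the
// standard does not require the implementation to do.
```

In the violating variant, a typical toolchain may simply keep one of the two definitions without saying anything, so which value a caller sees is not something the program can rely on.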

In practice, undefined behavior applies to some weird cases that aren't runtime conditions. Some weird preprocessor issues trigger undefined behavior. These are weird because they don't have a meaningful representation in the compiled program, so it's unclear what would cause them to execute.

Nevertheless, this view still gives you a reasonable idea for why there are two terms.

Sebastian Redl
  • With regards to your first sentence: why? And where in the standard is there any indication of this? The definition of __undefined behavior__ in §1.3.24 definitely speaks of "translation or execution" (albeit in a non-normative comment). And some of the most frequent cases of undefined behavior are due to violations of the one definition rule, which has nothing to do with run-time. – James Kanze Mar 04 '14 at 19:01
  • Sounds reasonable, but... the particular example of a `constexpr` being unable to produce a constant expression from its parameters is a definite non-runtime case, and labeled "ill-formed NDR", which frankly, is impossible (or I'm too stupid to understand it?)... there is no way of continuing the build if you are lacking a constant expression where you need one (see where my confusion comes from :-)). What would you instantiate a template with or initialize an enum to when the parameter is not constant? The compiler simply can't _not_ diagnose that. – Damon Mar 04 '14 at 19:38
  • Would it be legal for a compiler to, as an extension, allow the difference in the addresses of two external symbols to be used as though it were a compile-time constant expression subject to some restrictions [such usage is legal and common in many assemblers]? If so, then it would be possible to have an expression which would not be detectable as causing UB until link time. – supercat Jun 08 '14 at 15:21

cppreference now seems to have a very useful summary:

"No diagnostic required" indicates that some phraseology is ill-formed according to the language rules, but a compiler need not issue any diagnostic or error message. Usually, the reason is that trying to detect these situations would result in prohibitively long compile times.

If such a program is executed, the behavior is undefined.

By contrast, anything ill-formed that isn't marked "no diagnostic required" must produce a diagnostic (typically a compile error).
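
As a hedged illustration, take the `constexpr` rule quoted in the comments under the question. The function names below are made up, and the NDR variant is left in a comment because compilers differ in whether they reject it:

```cpp
#include <cassert>

// Well-formed: some arguments (any d != 0) produce a constant expression.
constexpr int divide(int n, int d) { return n / d; }

// Used where a constant is required. If this evaluation failed, the
// compiler would have to diagnose it.
static_assert(divide(10, 2) == 5, "evaluated during translation");

// Kept in a comment on purpose: no argument value makes this a constant
// expression (it always divides by zero), so under the quoted rule the
// definition alone is ill-formed, no diagnostic required, even if nothing
// ever calls it.
// constexpr int never_constant(int n) { return n / 0; }
```

My reading is that the "no diagnostic required" part covers only the definition, since proving that no argument values work can be arbitrarily hard in general. Using such a function where a constant expression is actually needed is a separate error that does require a diagnostic.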

Elliott