4

When I answered this question, I wrote:

First, it is important to note that it is not only the behaviour of the user program that is undefined, it is the behaviour of the compiler that is undefined.

But there was disagreement in a comment, so I want to ask the question here:

If the source code contains Undefined Behaviour, is it only the behaviour of the translated machine code that is undefined, or is the behaviour of the compiler undefined, too?

The standard defines the behaviour of an abstract machine (1.9):

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

Perhaps the real question is whether the compiler is part of that machine and, if so, whether that part is allowed to behave in an undefined way.


A more practical version of this question would be:
Suppose a compiler crashes or produces no output when it finds UB on every control path, as in this program:

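void complex_things_without_UB();  // assumed to be defined elsewhere

int main() {
    complex_things_without_UB();
    int x = 42;
    x = x++;  // UB here (in C, and in C++ before C++17): x is modified twice without sequencing
    return x;
}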

but otherwise it would always produce correct binaries. Would this still be a standard-compliant compiler?

alain
  • 3
    The standard doesn't really have a concept of "compiler", AFAIK. – Oliver Charlesworth Aug 22 '15 at 10:11
  • 1
    Does the latter not imply the first? If the compiler has undefined behavior, the resulting machine code (and therefore the user program) is undefined as well, right? – JorenHeit Aug 22 '15 at 10:13
  • @OliverCharlesworth I agree, but what does that mean? – alain Aug 22 '15 at 10:13
  • @OliverCharlesworth "compiler" comes up 6x in the C11 spec, but only as a referenced concept, not as a definition. – chux - Reinstate Monica Aug 22 '15 at 13:40
  • @JorenHeit I tend to agree. Any requirement on behavior of the program is transitively a requirement of behavior on the compiler. Hence "undefined behavior" at execution time means that there are no requirements on how the implementation must handle the particular situation - in this respect, the behavior of the compiler is also undefined. – davmac Aug 23 '15 at 10:14
  • 2
    Is this a C or a C++ question? Please pick one. – fuz Aug 25 '15 at 09:47
  • @FUZxxl I know they are different languages, but since the sections about UB are very similar, I was thinking the answers apply to both. (The linked question, and the one it is a dupe of, have both tags too, btw.) Else please keep C++ and remove C, if I could not convince you. – alain Aug 25 '15 at 14:05
  • @alain No! Undefined behaviour is very specific to C and there are subtle differences between C and C++. An answer that tries to apply to both is most likely wrong which is a large part of the reason why you should not ask questions for both C and C++. – fuz Aug 25 '15 at 14:12
  • While **internal compiler errors** do exist, compiler writers usually try to make their compilers robust enough to handle almost any content. Nobody would like a compiler that crashes without any information simply because they wrote an expression that modifies the same variable twice. – Phil1970 Jan 04 '20 at 15:50

7 Answers

6

The C++ standard defines behavior for code; it doesn't define behavior for the compiler. As such, it doesn't really make sense to refer to undefined behavior of the compiler: its behavior was never defined to begin with. The only requirement is that it produce a program that behaves as the standard specifies for the code. How it does this is an implementation detail.

Barry
  • 1
    but it does impose requirements on the implementation, and the behavior for code is (part) of those requirements. – davmac Aug 22 '15 at 15:34
3

That's a pretty blurry line as a whole. The point is that the source code does not have a defined behaviour, which means the behaviour of the generated code is not well defined.

The compiler should, by all accounts, behave in some defined way, but of course that could be rather "random" (e.g. the compiler may choose to insert a random number into your calculation, or even a call to rand, and that is still perfectly within its rights). There are certainly cases where the compiler (ab)uses the fact that it knows something is undefined to make optimisations.
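As a hedged illustration of the kind of optimisation meant here: signed integer overflow is undefined, so a compiler is allowed to assume that `x + 1` never wraps and may fold the comparison below to `true`. The names `next_is_greater` and `check_defined_inputs` are made up for this sketch, and whether a given compiler actually performs the fold depends on the compiler and its flags.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, not part of the original answer.
// For every x < INT_MAX the result is true. For x == INT_MAX the addition
// overflows, which is undefined behaviour, so an optimiser may legally
// translate this function as if it were simply `return true;`.
bool next_is_greater(int x) {
    return x + 1 > x;
}

// Only exercises inputs for which the behaviour is defined.
void check_defined_inputs() {
    assert(next_is_greater(0));
    assert(next_is_greater(-1));
    assert(next_is_greater(INT_MAX - 1));
}
```

Nothing in the sketch ever passes `INT_MAX`, so the program itself stays well defined; the point is only that the compiler never has to consider that input.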

I would consider it a very poor implementation of the compiler if, for example, the compiler crashes or causes the hard-disk to be formatted, but I believe the compiler may be still "right" if it says "This is undefined, I refuse to compile it" [in some manner].

Of course, there are (quite a lot of) situations where something is undefined, not because the construct itself is undefined, but because it's "hard to define a single behaviour that is possible to implement in many places". For example, using an invalid pointer (int* p = (int*) rand(); or use-after-free) is undefined, but the compiler may not be able to tell whether a given use is correct or not. Instead, it's up to the processor architecture what happens if you use a pointer to a random address, or a pointer to memory that has been freed. Either case may result in a crash on one machine, no crash but an erroneous result on another, and in some cases "you won't notice that anything is wrong". Here it is clearly not the compiler's behaviour that is undefined, but that of the resulting program.
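A minimal sketch of why the compiler often cannot know: the hypothetical function `read_through` below (a name invented for this example) is translated once, but whether its dereference is defined depends entirely on the pointer each caller passes at run time.

```cpp
#include <cassert>

// Hypothetical helper, not part of the original answer.
// The dereference is defined only if p points to a live int. That is a
// property of the caller, which is not visible when this function is translated.
int read_through(const int* p) {
    return *p;
}

// A well-defined use: the pointer refers to an object that is still alive.
void check_valid_pointer() {
    int value = 7;
    assert(read_through(&value) == 7);
}
```

Calling the same function with a dangling or random pointer would compile to exactly the same machine code; only the run-time outcome would differ from platform to platform.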

Jonathan Leffler
Mats Petersson
2

is it only the behaviour of the translated machine code that is undefined, or is the behaviour of the compiler undefined, too?

The ISO C and C++ standards describe what C and C++ programs look like. They do not describe the environment the programs run in. We generally use the term compiler to refer to the tool that translates C and C++ into machine code; formally, however, the term used is implementation, which is a broader notion.

Therefore, the only behavior that is undefined is that of the program. This also follows from the definition of UB:

undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

edmz
  • Your quote does not support your argument. _Use_ of a "nonportable or erroneous program" is not the same as _execution_ of said program. – davmac Aug 22 '15 at 15:26
  • @davmac If I understand what you mean, "program" is not a noun, but an adjective. – edmz Aug 22 '15 at 15:43
  • I suppose I have muddied the issue by restricting my quote to "program" rather than the full "program construct" as it appears in the text. My point is that "use of a program construct" could just as easily refer to the presence of that construct within source code when it is compiled as it could to execution of such a program. Yes, emphasising the word "program" alone as you have done is not correct, I think. – davmac Aug 22 '15 at 15:47
  • @davmac I emphasized that word to make my point: if the implementation's behavior were undefined too, the definition would read "[...] program or implementation construct", which is self-contradictory for the reasons mentioned at the start. Further, the erroneous use appears in the source code and may or may not appear in the generated code. – edmz Aug 22 '15 at 16:07
2

If code has undefined behaviour, it means that the standard does not say how to handle it. Thus it can give any output. I don't think this has to do with the compiler, as that wouldn't make sense. What makes sense is that the implementation has to work according to the standard.

So, if the standard doesn't say how to handle such code, how can compilers give a defined output?

edmz
ameyCU
1

Assuming that "undefined behaviour for a compiler" means "there are no requirements on the behaviour of the executable program produced" then the behaviour of the compiler is undefined when presented with source code containing undefined behaviour constructs.

Compare this with the behaviour of the compiler when given correct source code. All compilers adhering to the standard must produce executable code with equivalent behaviour, namely the behaviour the standard defines for that source code.

Anonymous Coward
  • If a program contains any `#pragma` directives, it would be legitimate, from the point of view of the Standard, for the mere act of compiling it to invoke nasal demons. – supercat Aug 24 '15 at 16:33
1

My own take is that the behavior in "undefined behavior" is that of the implementation. The spec refers to a process of "translation" that we might equate with compilation, but the fact that you can compile a program to executable code is not relevant here; the result is still considered to be part of the implementation, at least insofar as behavior is concerned. Note that while the spec does define how a C program will behave, the requirements it places are on the implementation, and the behavior of a program can also be considered a requirement (or set of requirements) on the implementation.

In any case, undefined behaviour can certainly refer to behavior of the compiler. See the note in C11 3.4.3:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

"Terminating translation" clearly refers to a compilation failure whereas "terminating a ... execution" clearly refers to behavior of a running program.
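As a hedged side note from the C++ side (this goes beyond the C11 text quoted here): constant evaluation is one place where core-language undefined behaviour surfaces during translation. An expression whose evaluation would have undefined behaviour is not a constant expression, so using it to initialise a `constexpr` variable makes the program ill-formed and the compiler has to diagnose it. The names `quotient` and `check_runtime_quotient` are invented for this sketch.

```cpp
#include <cassert>

// Hypothetical helper, not part of the original answer.
constexpr int quotient(int a, int b) {
    return a / b;  // undefined when b == 0
}

// Defined case: evaluated entirely during translation.
static_assert(quotient(84, 2) == 42, "evaluated at compile time");

// Uncommenting the next line is expected to stop translation with a diagnostic,
// because the division by zero is not allowed in a constant expression:
// constexpr int bad = quotient(1, 0);

// The same function can still be called at run time with defined inputs.
void check_runtime_quotient() {
    assert(quotient(9, 3) == 3);
}
```

Outside constant expressions no such guarantee exists; a compiler may diagnose `1 / 0` there, but is not required to.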

See also Appendix J.2 which lists examples of undefined behavior. Amongst the examples are:

A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment (5.1.1.2)

It seems ridiculous that this should cause undefined behavior at execution time rather than at translation time. There are various other similar examples. The entire set clearly shows cases where undefined behaviour can occur at both compile time and run time.

davmac
  • The first paragraph you cited is exactly the one that makes me think UB refers to the compiler too. – alain Aug 22 '15 at 16:02
  • Some compilers, if given a `#include` file whose last line is not properly terminated, will concatenate the line following the `#include` directive and process the result as a single line. It's possible some code takes advantage of that, and the authors of the Standard didn't want to break any such code. Rather than try to list all the ways compilers might interpret such things (e.g. does end-of-file count as whitespace, what should be the expansion of a `__FILE__` or `__LINE__` macro at the very end of a file, etc.) the authors of the Standard simply let compiler writers supply their own rules. – supercat Aug 22 '15 at 19:59
0

There is no compiler mentioned in the standard and implementation details are up to the vendors.

The standard defines how code should behave (syntactically and semantically) and, for some standard library algorithms, constrains their complexity. The source code doesn't have to have a precise behavior (nor is this defined anywhere). Every compiler just has to produce code that, under the as-if rule, is correct.
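A small sketch of what the as-if rule permits, under the assumption that only the returned value is observable: a compiler may translate the loop in `sum_loop` into something like `sum_closed_form`, because both give the same result for every input. Both function names, and `check_same_results`, are invented for this example; no particular compiler is claimed to perform this exact transformation.

```cpp
#include <cassert>

// Literal version: adds 1 + 2 + ... + n one term at a time.
// Unsigned arithmetic wraps, so this is well defined for every n.
unsigned sum_loop(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; ++i) {
        total += i + 1;
    }
    return total;
}

// Closed form n * (n + 1) / 2, with the division applied to whichever
// factor is even so that the wrapped result still matches sum_loop.
unsigned sum_closed_form(unsigned n) {
    return (n % 2 == 0) ? (n / 2) * (n + 1) : n * (n / 2 + 1);
}

// Both versions agree, so replacing one with the other changes nothing observable.
void check_same_results() {
    for (unsigned n = 0; n < 1000; ++n) {
        assert(sum_loop(n) == sum_closed_form(n));
    }
}
```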

It doesn't make sense to refer to undefined behavior of the compiler.

GaryO
  • Note concerning "no compiler mentioned in the standard ", "compiler" comes up 6x in the C11 spec. – chux - Reinstate Monica Aug 22 '15 at 13:36
  • @chux I believe the term "compiler" is used only in non-normative parts of the spec (footnotes and such). However, the term "translation" which is used more formally probably could be equated with compilation. – davmac Aug 22 '15 at 15:43