36

If an evaluation of an expression causes undefined behavior in C, and the expression is always evaluated when the program is executed (for example if it appears at the start of main), is it conforming if an implementation rejects it at compile time? Is there a difference in C between compiling/translating a program and executing it?

I know that there are interpreters for C. How are they handled by the C standard regarding this difference?

Example (reading uninitialized local)

int main() {
  int i;
  return i;
}

When running it, at any stage of the execution (even before main is called), the program can do something funny. But can something funny also happen when we haven't even tried to run it? Can it cause a buffer overflow in the compiler itself?

huysentruitw
Johannes Schaub - litb
  • As far as I'm concerned, the compiler is allowed to statically reject undefined behaviour at compile time; after all, syntax errors are undefined behaviour. – fuz Jan 01 '16 at 15:27
  • That's why C has compile time and runtime errors. – haccks Jan 01 '16 at 15:27
  • This question seems somewhat broad and nebulous without an example of such an expression. – Clifford Jan 01 '16 at 15:29
  • @Clifford: Suppose your program includes a constant signed-integer expression that triggers arithmetic overflow. Many compilers pre-evaluate such expressions. Is the compiler allowed to reject the program on this basis? – ruakh Jan 01 '16 at 15:30
  • @Clifford `int main() { scanf("%d", 42); }` – Johannes Schaub - litb Jan 01 '16 at 15:31
  • @ruakh - If it invokes UB, why couldn't the compiler reject it? – Andrew Henle Jan 01 '16 at 15:31
  • You have responded to a point about the question in a comment. It would be better to use the example to *improve the question*. – Clifford Jan 01 '16 at 15:33
  • @ruakh - I was not looking for an explanation of the question, I was suggesting it be *improved* with an example or two. – Clifford Jan 01 '16 at 15:34
  • @JohannesSchaub-litb Not entirely sure but on embedded systems that might actually be valid code. – Iharob Al Asimi Jan 01 '16 at 15:34
  • Undefined is, well, undefined. Doesn't that mean that, properly speaking, the standard doesn't even address what happens in those cases? Maybe a rejecting compiler is one form that nasal demons can take on. – John Coleman Jan 01 '16 at 15:34
  • @iharob the assumption is that you have included `stdio.h` before and that `scanf` is the lib function. You've got a point, so I made a simpler example and added it to the question. – Johannes Schaub - litb Jan 01 '16 at 15:35
  • I'm not sure if the given example is undefined *behavior* so much as an undefined *value*. Isn't the behavior perfectly defined -- return the bit-pattern of size `sizeof(int)` beginning at address `&i`? – John Coleman Jan 01 '16 at 15:41
  • Imagine a compiler constant propagation pass that assumes that a local variable (besides a parameter) always has a known value when it is read. Customers complain about the crashing compiler and the compiler vendor says "undefined behavior". Do they have a point? – Johannes Schaub - litb Jan 01 '16 at 15:43
  • @FUZxxl: No, syntax errors are explicitly **not** Undefined Behavior. Syntax errors require a diagnostic. Undefined Behavior does **not** require a diagnostic. – MSalters Jan 01 '16 at 21:30
  • @MSalters in C, a program may require a diagnostic *and* contain undefined behavior. So as far as I see, it makes no difference if something requires a diagnostic *and* causes undefined behavior or if it *just* requires a diagnostic without causing undefined behavior (does that even make sense at all!?). Because after emission of a diagnostic, behavior isn't defined anymore. Or am I missing a detail? – Johannes Schaub - litb Jan 01 '16 at 21:46
  • There were even Easter Eggs on UB. GCC 1.17, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi. – Eldar Abusalimov Jan 01 '16 at 22:29
  • @JohannesSchaub-litb: It's true that a compiler may produce an executable after issuing a mandatory diagnostic, and there are no requirements on the outcome then, but that's not formal 3.4.3 Undefined Behavior. – MSalters Jan 01 '16 at 22:31
  • FWIW, it's fairly easy to break a C++ compiler with template metaprogramming. I guess the question is whether you need something that powerful before you can break the compiler with something that isn't a compiler bug. Can C language compilation compute general recursive functions? – Gary Jackson Jan 02 '16 at 06:35
  • @GaryJackson i guess circular #inclusion is similar :) – Johannes Schaub - litb Jan 02 '16 at 11:38

3 Answers

40

From a C11 draft:

3.4.3 undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Terminating the translation is mentioned as a possible consequence of undefined behavior in the (non-normative) note, so compile-time effects are clearly not intended to be excluded. The normative part certainly allows it: it allows anything. So a conforming compiler can terminate the translation if it detects undefined behavior during compilation.

Additionally, in §4 Conformance:

If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.

There is no distinction made either in the normative definition or in the conformance description between "translation time" and "execution time". No difference is made between different "varieties" of undefined behavior.

Additionally, Defect Report #109, pointed out by ouah in "Can code that will never be executed invoke undefined behavior?", has this in its response:

[...] If an expression whose evaluation would result in undefined behavior appears in a context where a constant expression is required, the containing program is not strictly conforming. Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming.

A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. [...]

This would indicate that a compiler cannot fail a translation if it cannot statically determine that all paths lead to undefined behavior.
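To make that distinction concrete, here is a minimal sketch. The function names `quotient` and `safe_quotient` are hypothetical and come from neither the standard nor the defect report. Only some calls of `quotient` are undefined (a zero divisor, or `INT_MIN / -1`), so, as I read the DR response, the mere presence of such a function should not be grounds for refusing translation.

```c
#include <limits.h>

/* Hypothetical helper: the division is undefined only for some arguments
 * (b == 0, or a == INT_MIN with b == -1), so only some executions of a
 * program calling it would hit undefined behavior. */
int quotient(int a, int b)
{
    return a / b;
}

/* Guarded variant: returns 1 and stores the result in *out when the
 * division is well defined, and returns 0 without touching *out otherwise. */
int safe_quotient(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return 0;
    *out = a / b;
    return 1;
}
```

By contrast, the program in the question reads the uninitialized `i` on its only path through `main`, which is the "every possible execution" case the response singles out.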

Mat
  • But there's an unclear point here that makes me worry: There's undefined behavior that affects translation. For example it says about header names: **"If the characters ', \, ", //, or /\* occur in the sequence between the < and > delimiters, the behavior is undefined."**. Does the consequence mentioned in your quoted sentence only refer to this kind of undefined behavior? That's essentially what my doubt is about. – Johannes Schaub - litb Jan 01 '16 at 15:53
  • And the very wording of that note indicates that there might be some difference between translation and execution (at the very least that there are these two different concepts around) because "... to behaving during translation or program execution .." – Johannes Schaub - litb Jan 01 '16 at 15:54
  • It also says, for example, "Translation units may be separately translated and then later linked to produce an executable program.". So, there appears to be a non-deterministic actor assumed that receives an "executable program" and can either leave it alone or execute the program. If it leaves it alone, evaluation doesn't happen, so UB won't happen as well. Where does the spec rule out such a model? – Johannes Schaub - litb Jan 01 '16 at 15:57
  • As another example, the following program must be accepted and (if executed) execute correctly: `int main(int argc, char **argv) { if(argc < 0 && argc >= 0) { int i; return i; } return 0; }` because the UB-expression is never evaluated. – Johannes Schaub - litb Jan 01 '16 at 16:00
  • Re your first comment: UB is UB. There are no two (or more) classes described. 2nd comment: I disagree, the notes make it explicit that both runtime & compile-time consequences are possible. – Mat Jan 01 '16 at 16:00
  • For your third: I don't understand. If the compiler produced something (whether it contained UB or not - it's allowed to produce code either way), but no-one runs it, nothing happens. Why would the C standard rule something about nothing happening? – Mat Jan 01 '16 at 16:03
  • For your last, that's more complicated. I'd refer you to Keith Thompson's answer here, which is most likely more accurate than I could ever be: http://stackoverflow.com/questions/18385020/can-code-that-will-never-be-executed-invoke-undefined-behavior – Mat Jan 01 '16 at 16:06
  • well in the question he says "The code that invokes undefined behavior (in this example, division by zero) will never get executed, is the program still undefined behavior?". The same logic is applied by my reasoning. You said "If the compiler produced something, but no-one runs it, nothing happens". The conclusion, up to this point, coincides with the starting assumption of that linked question. So does the conclusion in that question apply as well, and the code in *my* question therefore would not cause undefined behavior if not executed as well? – Johannes Schaub - litb Jan 01 '16 at 16:09
  • As I read Keith's answer, he says "_But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior._" The code in your question _would_ trigger UB if run, and the compiler can infer that (trivially by modern compiler standards), so it is "rejectable". The code in your comment above would not, so it wouldn't be rejectable. – Mat Jan 01 '16 at 16:14
  • well, "*would* trigger UB if run". But only *if* run. If it is not run, that means there is no UB. Now the question is, does translating C code imply all formal effects (including UB) that the execution implies? In the comment you said *"the notes make it explicit that both runtime & compile-time consequences are possible."*, but that is only natural because there can be UB at compile time (certain characters within a header name) and runtime (division by zero). The note doesn't really make it clear whether a runtime-UB can cause retroactive effects to translation time. – Johannes Schaub - litb Jan 01 '16 at 16:15
  • That's not how I read it no. In the code in your question, all paths have UB. The compiler can reject it. – Mat Jan 01 '16 at 16:16
  • "_does translating C code imply all formal effects (including UB) that the execution implies?_" I'm not following you there. – Mat Jan 01 '16 at 16:20
  • In the post you linked Mat, nobody seems to mention DR#109 where it is written that: *Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming. A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior.* http://www.open-std.org/jtc1/sc22/wg14/docs/rr/dr_109.html – ouah Jan 01 '16 at 16:20
  • @Mat If UB can have retroactive effects, I would think that writing to volatile variables can also have retroactive effects. Meaning for `int main() { magicRegister = 42; }`, the implementation can, by just translating the program, behave as though the value 42 has just been written to magicRegister? Because all paths will do the write. – Johannes Schaub - litb Jan 01 '16 at 16:22
  • @ouah: thanks, I'll incorporate that, seems quite clear & authoritative. – Mat Jan 01 '16 at 16:24
  • @ouah ah, thanks. that's good information and directly answers my question! – Johannes Schaub - litb Jan 01 '16 at 16:26
  • One further question though. Why doesn't the described model match those of interpreters? – Johannes Schaub - litb Jan 01 '16 at 16:29
  • @JohannesSchaub-litb: I was thinking about a REPL where you often redefine (or even re-declare) things "on the fly" and that model doesn't seem to match the "workflow" described in the standard. But why not, if the interpreter manages to follow the "shalls" one way or another, good for it. – Mat Jan 01 '16 at 16:42
14

In the C11 standard, §3.4.3, the definition of the term undefined behavior states:

undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

So I guess you are allowed to statically reject a program which contains undefined behavior, even if it's valid.

Jack
  • But there's an unclear point here that makes me worry: There's undefined behavior that affects translation. For example it says about header names: **"If the characters ', \, ", //, or /\* occur in the sequence between the < and > delimiters, the behavior is undefined."**. Does the consequence mentioned in your quoted sentence only refer to this kind of undefined behavior? That's essentially what my doubt is about. – Johannes Schaub - litb Jan 01 '16 at 15:47
  • And the very wording of that note indicates that there might be some difference between translation and execution (at the very least that there are these two different concepts around) because "... to behaving during translation or program execution .." – Johannes Schaub - litb Jan 01 '16 at 15:52
  • A difference between translation and execution (which is present and clearly stated in the execution model) doesn't imply a difference in the meaning of undefined behavior. In any case if you are allowed to reject a program during translation which contains undefined behavior at runtime I don't see any reason not to be able to do it for undefined behavior which occurs directly during translation, it would be a counter-restriction. – Jack Jan 01 '16 at 16:00
5

is it conforming if an implementation rejects it at compile time?

An implementation may or may not reject it. The C standard addresses this in §3.4.3:

C11: 3.4.3 undefined behavior

  1. behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

  2. NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

So the answer to your question "Can it cause a buffer overflow in the compiler itself?" is:

Yes it can.

haccks
  • The question isn't if a conforming compiler *must* reject such programs but if a conforming compiler is *allowed* to reject such programs. – John Coleman Jan 01 '16 at 15:33