
Under modern interpretations of "Undefined Behavior", a compiler is entitled to assume that no chain of events which would make Undefined Behavior inevitable will occur, and may eliminate code which would only be relevant in cases where the program is going to perform Undefined Behavior; this can cause the effects of Undefined Behavior to work backwards in time and nullify behaviors that would otherwise have been observable. On the other hand, in cases where Undefined Behavior would be inevitable unless the program terminates, but where the program could and does terminate before invoking Undefined Behavior, the behavior of the program remains fully defined.
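
As a rough illustration of the kind of transformation I have in mind (a hypothetical sketch of my own, not one of the examples below):

#include <stdio.h>

int read_value(int *p)
{
  if (p == NULL)
    printf("p is null!\n");  /* may be elided: the unconditional dereference
                                below lets a compiler assume p is non-null */
  return *p;                 /* Undefined Behavior when p == NULL */
}

Here the diagnostic would be observable before the Undefined Behavior occurred, yet a compiler reasoning backwards from the dereference may remove the check, and the message along with it.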

In making this determination, what causes of termination is a compiler required to consider? As a couple of examples:

On many platforms, a call to a function like "getc" will normally return (at least eventually), but in some cases outside the compiler's control it will not. If one had a program like:

#include <stdio.h>
#include <limits.h>

int main(int argc, char *argv[])
{
  if (argc != 3)
  {
    printf("Foo\n");
    return 0;
  }
  else
  {
    int ch;
    printf("You'd better type control-C!\n");
    ch = getc(stdin);
    if (ch < 65)
      return (ch-33) / (argc-3);   /* division by zero: argc is 3 here */
    else
      return INT_MAX + ch;         /* signed overflow: ch is at least 65 here */
  }
}

would the behavior be defined if the program were called with argc equal to three, but a SIGINT prevented the getc() call from ever returning? Certainly, if there were any value that getc() could return which would result in defined behavior, no Undefined Behavior could occur until the compiler could be certain that such input would not be received. In the event that there is no value getc() could return which would avoid Undefined Behavior, however, would the overall program remain defined if getc() were prevented from ever returning a value? Would the existence of a causal relationship between the return value of getc() and the actions invoking Undefined Behavior affect things? (In the example above, a compiler could not know which particular form of Undefined Behavior would occur without knowing what character was input, but any possible input would trigger some form.)
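
For contrast, consider a hypothetical variant (the function name is mine, not part of the program above) in which some return values of getc() do yield defined behavior; here a compiler clearly could not treat Undefined Behavior as inevitable before the input is known:

#include <limits.h>
#include <stdio.h>

int variant(void)
{
  int ch = getc(stdin);
  if (ch < 65)
    return 0;              /* fully defined for these inputs (including EOF) */
  else
    return INT_MAX + ch;   /* signed overflow: UB only when ch >= 65 */
}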

Likewise, suppose that on some platform there existed addresses which, when read, were specified to cause the program to terminate immediately, that the compiler's documentation specified that volatile reads trigger hardware read requests, and that some external library on that platform documented that it would return a pointer to such an address. Would those factors imply that the behavior of bar in this separate example:

#include <stdio.h>

/* hypothetical function from the platform library described above */
char volatile *get_instant_quit_address(void);

int foo(int x)
{
  char volatile * p = get_instant_quit_address();
  if (x)
    { printf("Hey"); fflush(stdout); }
  return *p / x; // Will cause UB if *p yields a value and x is zero
}
int bar(void)
{
  return foo(0);
}

would be defined (as terminating without having printed anything) if attempting to read *p did in fact immediately terminate program execution without yielding a value? The division cannot proceed until the read yields a value; thus, if no value is ever yielded, there is no division by zero.

By what means is a C compiler allowed to determine whether a given action might cause program execution to terminate in ways that it doesn't know about, and in what cases is it allowed to reschedule Undefined Behavior ahead of such actions?

supercat
  • Sounds like you want us to solve the [Halting Problem](http://en.wikipedia.org/wiki/Halting_problem) – Barmar Apr 24 '15 at 19:16
  • I don't understand the point of the question. If something is outside the control of the compiler, what do you expect it to do about it? – Barmar Apr 24 '15 at 19:18
  • You can never assume that a function will finish -- someone can always pull the plug on the computer. – Barmar Apr 24 '15 at 19:18
  • I don't understand the question. Regardless of the number of arguments the code does not consider them. – Weather Vane Apr 24 '15 at 19:21
  • @WeatherVane Number of arguments matters as it determines which branch the program takes (1st snippet). – P.P Apr 24 '15 at 19:31
  • @Barmar: Hyper-modern compilers will assume that any sequence of events which will inevitably produce Undefined Behavior will not occur. My question is whether the compiler would be forbidden from considering Undefined Behavior to be "inevitable" when there are contingencies which aren't known to the compiler, but are defined by the platform where code is running, which could prevent the UB from occurring. – supercat Apr 24 '15 at 19:33
  • This question ultimately reduces to *"can a C compiler determine whether a given action will terminate a program?"* which is the halting problem. But then the question goes on, *"in ways that it doesn't know about?"* to which the answer is, "Duh, no." – user3386109 Apr 24 '15 at 19:36
  • Are you asking whether the compiler can assume that the loop in the first snippet will never complete, because it assumes you'll never divide by zero? – Barmar Apr 24 '15 at 19:42
  • @Barmar: I was asking whether it could optimize out the `if` statement and unconditionally print "foo" on the basis that the only way the second branch would fail to produce Undefined Behavior would be if one of the `printf` calls caused the program to exit without returning. – supercat Apr 24 '15 at 19:44
  • @user3386109: The question is fundamentally whether a compiler can use an assumption that an action won't terminate a program as a basis for declaring the execution of the program Undefined, when the action would in fact terminate the program at a point which would be sequenced before Undefined Behavior could be invoked. – supercat Apr 24 '15 at 19:47
  • This seems like a really theoretical question, not very practical. The only reason that's even a consideration is because the division by zero is detectable at compiler time, which it normally wouldn't be. – Barmar Apr 24 '15 at 19:47
  • @BlueMoon my point was that the arguments `*argv[]` have nothing to do with it. Red herring. – Weather Vane Apr 24 '15 at 19:48
  • @WeatherVane `argv` is not used but what OP means by "arguments" is the parameters passed to the program and that determines the value of `argc` and that in turn, determines whether the program invokes UB. – P.P Apr 24 '15 at 19:51
  • @Barmar: Hyper-modern optimizers are becoming increasingly aggressive at deciding that situations will inevitably result in UB. Thus, if code like something like `if (should_launch_missiles()) launch_missiles();` is followed by "inevitable" Undefined Behavior, some compilers may make the call to `launch_missiles()` unconditional if it might terminate, since program behavior would be defined in the case where `should_launch_missiles()` returned 1 and `launch_missiles()` terminated, but would be undefined if `should_launch_missiles()` returns zero. – supercat Apr 24 '15 at 19:53
  • @WeatherVane: The method `foo` was meant to be a complete separate example, related to a situation of particular interest in embedded contexts (where it is in fact common to have addresses which can be used to trigger an immediate reboot). If I have an `if` statement which is supposed to trigger an immediate reboot if certain variables are wrong in a way that would trigger UB, I don't want the compiler to say "Hey, that `if` statement won't be relevant unless execution is undefined, so it can simply be omitted". – supercat Apr 24 '15 at 19:59
  • @WeatherVane I don't think the two snippets are connected to each other(OP can clarify). As I understand, they are two different examples and OP asks about their behaviour under specified circumstances. – P.P Apr 24 '15 at 20:00
  • this line: 'return printf("Complete")/0; ' will not compile , suggest deleting the trailing '/0' – user3629249 Apr 24 '15 at 20:00
  • I really don't understand why a divide by 0 in the source code is expected to be handled in execution. It was me who down voted. I will always cop to it ... uh-oh, you edited the question. – Weather Vane Apr 24 '15 at 20:01
  • The edit makes your intent clear. However, I submit that what you're now asking is primarily opinion based. You are asking for a debate on whether the so-called "modern" interpretation of undefined behavior is valid, and more broadly whether the excessive use of the term undefined behavior in the C specification has undermined the usefulness of the specification. – user3386109 Apr 24 '15 at 20:04
  • at execution time, not compile time, the second example would cause a seg fault event. Depending on evaluation it could crash because x is 0 resulting in a 'divide by 0' fault or it could crash because 'p' points to write-only memory. However, I have never come across a program crashing because of write only memory (like certain peripheral register) as usually a read action just returns 0. – user3629249 Apr 24 '15 at 20:05
  • @user3386109: There are many platforms where events are triggered by reads. Write-based triggering is more common, but I used a read in this case because on many platforms the processor may begin processing the instruction after a store before the write cycle has even started, and I didn't want to muddy the water with such issues. – supercat Apr 24 '15 at 20:10
  • the first snippet would fail to compile cleanly because of the compiler raising a warning about 'argv[]' not being used. However, if a user went ahead and ran the program, it would (irregardless of the number of arguments) run to completion. Note a million lines of text output to the terminal would be highly undesirable, but would not stop the program from running – user3629249 Apr 24 '15 at 20:11
  • This is a "divide by zero" question dressed up to be high-falluting. The args are a complete irrelevance. If you don't have the wit to trap div 0 yourself, and the compiler hasn't seen it, you are at the mercy of the OS. What the devil has `volatile` in the question to do with it? – Weather Vane Apr 24 '15 at 20:12
  • So you have just edited the question to find a different way of dividing by 0. – Weather Vane Apr 24 '15 at 20:19
  • @WeatherVane: I'm not trying to "let the compiler do the work for me"--to the contrary, I'm trying to understand what is necessary to ensure that a compiler would be forbidden by the C standard from doing things I *DON'T* want it to do. – supercat Apr 24 '15 at 20:19
  • Answer: never be so lazy as to let the compiler do the work for you. – Weather Vane Apr 24 '15 at 20:21
  • In the 1st snippet, if `argc<=2` then it'll never execute the else clause, so a compiler can't assume that the program will invoke undefined behaviour and therefore can't reject the program. If `argc>2` then the compiler can never know that the program will receive a SIGPIPE before eventually executing the erroneous line. But this is UB, as the compiler can assume the entire `else` clause is *erroneous*, since the C standard defines the behaviour based on an *abstract machine*. On an abstract machine, there's no way to consider that any external event (a signal in this case) would terminate the program [cont.] – P.P Apr 24 '15 at 20:26
  • [cont] while the standard does make allowances for external events (such as volatile qualified objects), a signal guaranteed by only the platform is not something the C standard considers. Under the same reasoning, I'd say the 2nd snippet also invokes UB. – P.P Apr 24 '15 at 20:26
  • @BlueMoon: The compiler can't know that the program *will* receive a SIGPIPE; the question is whether a compiler for a platform like Unix where file I/O could trigger such signals would be required to consider program execution defined if such a thing did occur? Is there a rule that specifies that all standard library methods other than `exit` and `longjmp` may be presumed to return? What about something like `fgets` from `stdin`, with result ignored, but with a someone at the console who will hit control-C rather than Enter if things look like they're getting wacky? – supercat Apr 24 '15 at 20:33
  • @BlueMoon: For the embedded reset scenario, I would guess that enclosing the code to perform the reset in a `while(1)` loop is probably the safest bet (endless loops with side-effects are permissible under the standard) but I was uncertain if it was necessary. I suppose regardless of what the standard says it probably wouldn't be a bad idea in any case. – supercat Apr 24 '15 at 20:36
  • @user3629249: There seems to be substantial contention about whether the answer is "Duh, no" or "Yeah, whatever makes the SPEC benchmarks run faster." You can probably get someone who works on Clang to tell you that every program that contains a division by 0 in dead code is actually invalid C. – tmyklebu Aug 26 '15 at 04:44
  • @tmyklebu: Someone needs to start specifying some benchmarks as measuring the performance of performing certain tasks, with code written in the fastest way that will work on the compiler. I suspect I could offer up some tasks for which clang wouldn't come out looking so good, as code written for clang would have to spend as much time preventing UB as doing useful work. – supercat Aug 26 '15 at 13:26
  • @supercat: I'd prefer if compiler writers would stop just making shit up on the fly to justify their bugs. GCC has been relatively good about this. – tmyklebu Aug 26 '15 at 14:11
  • @tmyklebu: What I find most ironic about the recent evolution of the language is that there doesn't seem to be any focus on adding features that would improve optimization without compromising robustness. For example, I'd like to see a __CHECKED_ASSUME(x) directive which would entitle, but not require, a compiler to raise a non-returnable implementation-defined trap or signal as soon as a compiler could determine that the directive would be reached without `x` yielding a defined "true" value, and __EARLY_ASSERT(x,msg) which would act like an assert, but allow a compiler to act early on it. – supercat Aug 26 '15 at 15:36
  • @tmyklebu: A compiler would be free to ignore a __CHECKED_ASSUME(x) directive if it determined that the cost of checking the assumption at any point would exceed the optimization that could be achieved therefrom (or, for that matter, if it simply didn't know how to do anything useful with it), but in many cases being able to hoist a check earlier in the code (and being able to remove dead code paths *following the check*) could allow considerable efficiency improvements while if anything improving robustness. – supercat Aug 26 '15 at 15:38

1 Answer


This is well described in C++ under [intro.execution]:

5 - A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

It is generally accepted that C has the same characteristics, so a C compiler can similarly perform "time travel" in the presence of undefined behavior.

Importantly, note that the question is whether there exists an instance of the abstract machine exhibiting undefined behavior; it doesn't matter that you can arrange to prevent undefined behavior on your machine by terminating program execution first.

You can prevent undefined behavior (and resulting time travel) if you cause the program to terminate itself in a fully-defined way which the abstract machine cannot wriggle out of. For example, in your second example if you replace access to *p with (exit(0), 0) then undefined behavior cannot occur as there is no possible execution of the abstract machine where exit returns to its caller. But whatever the characteristics of your platform, the abstract machine does not have to terminate your program on access to an insta-kill address (indeed, the abstract machine does not have any insta-kill addresses).
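
A sketch of that substitution applied to your foo (nothing here depends on the platform, and get_instant_quit_address is no longer needed):

#include <stdio.h>
#include <stdlib.h>

int foo(int x)
{
  if (x)
    { printf("Hey"); fflush(stdout); }
  /* exit() never returns, so no execution of the abstract machine reaches
     the division; behavior is fully defined on every path */
  return (exit(0), 0) / x;
}
int bar(void)
{
  return foo(0);   /* terminates via exit(0) without printing anything */
}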

ecatmur
  • For the embedded C scenario on a platform with a write-triggered reset, would `while(1) FORCE_RESET = 1;` be wriggle-proof if `FORCE_RESET` were declared as `volatile` (a sketch of this idiom appears after this comment thread)? I think endless loops are not UB if they contain side-effects, and writing a volatile should qualify. Otherwise, if one wants to e.g. have a method which will output data and terminate program execution if it would cause a SIGPIPE, is the only way to do that to replace all calls to standard-library I/O with calls to functions the compiler can't know anything about, which then forward to the standard library functions? – supercat Apr 24 '15 at 20:46
  • @supercat yes, I think that should work; even a post-classical compiler isn't entitled to construct an execution containing an infinite number of operations followed by an undefined operation. I'd still be inclined to put in an `abort()` after the while loop. With regard to replacing calls, you might still be in danger if the compiler is able to look through your replacement calls via link-time optimization. – ecatmur Apr 28 '15 at 10:16
  • I don't like `abort()` in embedded systems, since it's unclear what it's supposed to mean; for non-hosted platforms I don't think the C standard imposes any requirements on `abort()`, meaning calling it would be Undefined Behavior. Otherwise, I don't know if it's worth editing the original question, but I wonder whether a compiler for a machine where CHAR_BIT==8 could consider `1/((getc(stdin)+1)>>9);` as invoking UB before an incoming character arrives? Any value that `getc()` could return would cause UB, but is it mandated that getc() will return at all? – supercat Apr 28 '15 at 15:51
  • @supercat you're correct, `abort` is in `<stdlib.h>`, which isn't available to a strictly conforming freestanding program (4. Conformance/6). Again, with `getc` it doesn't matter whether it will return; the fact that it *might* is enough to invalidate the entire execution. – ecatmur Apr 28 '15 at 16:02
  • How would the fact that `getc` "might" return invalidate the program if, in fact, it doesn't? If the code had been `1/(getc()-33);`, then the code would invoke UB if someone entered a `!`, but the fact that someone "might" enter one would not invoke UB unless or until someone actually *did*. BTW, I was thinking of asking another question about aliasing but want to make sure I'm not missing something: if code passes a pointer to a local variable to an external method, must it assume forevermore that any external method call might modify that variable? Would a `restrict` qualifier waive that? – supercat Apr 28 '15 at 16:13
  • @supercat now I'm not sure; C++ [intro.execution]/5 conflicts with /8 here - for example, `int main() { puts("hi"); getc(stdin); return 1/0; }`. It might be worth asking a question specifically about that; I'd certainly be interested in the answer. WRT your second question I'm not sure how `restrict` would help; that's more a guarantee that arguments will not alias each other, allowing reads and writes via arguments to be optimized. – ecatmur Apr 28 '15 at 17:07
  • I didn't know whether `restrict` is only meaningful when there are two or more such arguments and only means they can't alias each other, or if it was meant to imply they can't alias anything else either. One thing I've often thought might be helpful would be a construct which would tell the compiler "Within this block of code, feel free to copy X to a register or not as you see fit; if it's written in one place and read in another, I'll be happy whether or not the write affects the read." Another variant on that construct would say, within a loop, "I need X to be reloaded from memory... – supercat Apr 28 '15 at 17:17
  • ...at least occasionally, but nowhere near every pass through the loop". On some multi-core systems, reading a `volatile` variable on every pass through a loop to see if a user has requested that it exit could be expensive; if a programmer could say that a read is required at least once every 100 passes, a compiler that was going to unroll the loop 16 times could either use a counter to poll the variable every 6 times, or simply poll it once every time through the main loop, whichever seemed likely to be faster. – supercat Apr 28 '15 at 17:19
  • IMHO, the real key to making optimization useful is to provide ways by which programmers can specify semantics which are somewhat loose but still short of "anything goes". My objection to present UB-based philosophy is that in many cases it forces programmers to add code which formerly wasn't necessary, to specify semantics that are much tighter than what they actually need, purely to ensure that the compiler won't use UB as an excuse to throw causality out the window. BTW, a difference between your example and mine is that in mine the division would be sequenced after the return from getc... – supercat Apr 28 '15 at 17:25
  • ...whereas with yours, if `getc` were considered "guaranteed to return", the division could be moved earlier. – supercat Apr 28 '15 at 17:26
  • Does anything document whether `__builtin_unreachable` can time-travel, and if it can, is there an equivalent that can't (or should that be another question)? If writing to a `volatile` variable will, unbeknownst to the compiler, force the program to terminate, the semantically-optimal behavior would be something the flow logic would regard as an unconditional exit(0), but without the compiler actually generating any code. – supercat May 18 '15 at 17:37
  • @supercat according to https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00188.html `__builtin_unreachable` should "retain side-effects" i.e. no time travel. – ecatmur May 18 '15 at 18:44
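
For reference, a sketch of the reset idiom discussed in the comments above; FORCE_RESET and its address are hypothetical placeholders for whatever write-triggered reset register a given platform actually provides:

/* hypothetical memory-mapped reset register; the address is illustrative only */
#define FORCE_RESET (*(volatile unsigned char *)0x40000000u)

void force_reboot(void)
{
  /* The controlling expression is a constant and the body accesses a volatile
     object, so the termination assumption of C11 6.8.5p6 does not apply; no
     execution of the abstract machine proceeds past this loop to reach any
     later Undefined Behavior. */
  while (1)
    FORCE_RESET = 1;
}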
  • That's a discussion suggesting that someone thinks side-effects *should* be maintained; is there anything that would "officially" document that, or make clear the effect of `void moo(int x) { if (x!=0) foo(); else { resetTrigger=1; __builtin_unreachable(); } return 1/x;}` Here it would be necessary that not only must `__builtin_unreachable()` not prevent the execution of `resetTrigger`, but it must also make the compiler aware that `return 1/x` would not be reachable when `x==0`, and thus `x` could legitimately equal zero. – supercat May 18 '15 at 19:05