41

Can optimizing compiler delete infinite loops, which does not changes any data, like

while(1) 
  /* noop */;

From analyzing a data flow graph compiler can derive, that such loop is "dead code" without any side effects.

Is deleting of infinite loops prohibited by C90/C99 standards?

Does C90 or C99 standards permit compiler to deleting such loops?

Upd: "Microsoft C version 6.0 did essentially this optimization.", see link by caf.

label: goto label;
return 0;

will be transformed to

return 0;
Nietzche-jou
  • 14,415
  • 4
  • 34
  • 45
osgx
  • 90,338
  • 53
  • 357
  • 513
  • 2
    It is busy-waiting, and daemons shall not use busy-waiting a lot. I call such construction "dead" in means of Data Flow. If statement does not change any variables, and does not contain side-effect, it can be eliminated by optimizing compiler. – osgx Feb 01 '10 at 16:17
  • Code after loop is not "unreachable", loop can be interrupted with signal, and there can be "longjmp" in signal handler. – osgx Feb 01 '10 at 16:20
  • 7
    "Optimizing endless loops" == "how to make endless loops finish faster" – Rubens Farias Feb 01 '10 at 16:22
  • 1
    In Embedded Systems, there are often *background loops* which may or may not perform processing while an interrupt service routine performs important processing. So the `while(1)` is not dead code, but a common implementation. – Thomas Matthews Feb 01 '10 at 18:20
  • Updated my answer w.r.t. to C99. – Shafik Yaghmour Feb 24 '15 at 20:57

6 Answers6

41

C11 clarifies the answer to this question, in the draft C11 standard section 6.8.5 Iteration statements added the following paragraph:

An iteration statement whose controlling expression is not a constant expression,156) that performs no input/output operations, does not access volatile objects, and performs no synchronization or atomic operations in its body, controlling expression, or (in the case of a for statement) its expression-3, may be assumed by the implementation to terminate.157)

and footnote 157 says:

This is intended to allow compiler transformations such as removal of empty loops even when termination cannot be proven.

So your specific example:

while(1) 
  /* noop */;

is not fair game for optimization since the controlling expression is a constant expression.

Infinite loops as UB

So why are compilers allowed to optimize away infinite loops with the exception provided above, well Hans Boehm provides a rationale for making infinite loops undefined behavior in: N1528: Why undefined behavior for infinite loops?, the following quote gives a good feeling for the issue involved:

As N1509 correctly points out, the current draft essentially gives undefined behavior to infinite loops in 6.8.5p6. A major issue for doing so is that it allows code to move across a potentially non-terminating loop. For example, assume we have the following loops, where count and count2 are global variables (or have had their address taken), and p is a local variable, whose address has not been taken:

for (p = q; p != 0; p = p -> next) {
    ++count;
}
for (p = q; p != 0; p = p -> next) {
    ++count2;
}

Could these two loops be merged and replaced by the following loop?

for (p = q; p != 0; p = p -> next) {
        ++count;
        ++count2;
}

Without the special dispensation in 6.8.5p6 for infinite loops, this would be disallowed: If the first loop doesn't terminate because q points to a circular list, the original never writes to count2. Thus it could be run in parallel with another thread that accesses or updates count2. This is no longer safe with the transformed version which does access count2 in spite of the infinite loop. Thus the transformation potentially introduces a data race.

In cases like this, it is very unlikely that a compiler would be able to prove loop termination; it would have to understand that q points to an acyclic list, which I believe is beyond the ability of most mainstream compilers, and often impossible without whole program information.

C99

Since C99 does not have this carve out, we would look to the as-if rule covered in section 5.1.2.3 which basically says that the compiler only has to emulate the observable behavior of a program, the requirements are as follows:

The least requirements on a conforming implementation are:

  • At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
  • At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
  • The input and output dynamics of interactive devices shall take place as specified in 7.19.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

A strict reading of this would seem to allow an implementation to optimize an infinite loop away. We can certainly come up with scenarios where optimizing away an infinite loop would cause a change in observable behavior:

while(1) ;
printf( "hello world\n" ) ;

Many would argue that effecting the termination of a process is also observable behavior, this position is taken in C Compilers Disprove Fermat’s Last Theorem:

The compiler is given considerable freedom in how it implements the C program, but its output must have the same externally visible behavior that the program would have when interpreted by the “C abstract machine” that is described in the standard. Many knowledgeable people (including me) read this as saying that the termination behavior of a program must not be changed. Obviously some compiler writers disagree, or else don’t believe that it matters. The fact that reasonable people disagree on the interpretation would seem to indicate that the C standard is flawed.

Update

I somehow missed the the follow-up to the above article, Compilers and Termination Revisited which says the following about section 5.1.2.3:

The second requirement is the tricky one. If it is talking about termination of the program running on the abstract machine, then it is vacuously met because our program does not terminate. If it is talking about termination of the actual program generated by the compiler, then the C implementation is buggy because the data written into files (stdout is a file) differs from the data written by the abstract machine. (This reading is due to Hans Boehm; I had failed to tease this subtlety out of the standard.)

One could also make a weaker argument that the need to create a carve out in C11 to allow empty loop removal implies this was not an allowable optimization previously.

Does this apply to infinite goto loops as well?

I believe the intent is that this also applies to infinite goto loops as well. In C++ this is clearly the case since section 1.10 [intro.multithread] says:

The implementation may assume that any thread will eventually do one of the following

  • terminate,
  • make a call to a library I/O function,
  • access or modify a volatile object, or
  • perform a synchronization operation or an atomic operation.

and then intent as expressed in N1528 is that the C and C++ standard agree:

Since compiler back-ends are typically shared between C and C++ compilers, it appears most important that WG14 and WG21 agree on whatever solution is adopted. The alternatives would be special treatment of the two languages by the back-end, or disabling optimizations when processing C code. Neither appears at all desirable.

and at the end says:

WG21 is considering improved wording that makes the treatment consistent. Hopefully WG14 will track any resulting changes.

Currently the C11 standard does not contain the similar wording in section 5.1.2.4 Multi-threaded executions and data races but considering N1528 it seems wise to assume compilers will treat infinite goto loops as undefined behavior in C and C++.

Note also see US comment 38 here and N3196 which is the paper that this changed was applied from.

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740
  • 2
    It seems that the C11 wording implies that `while (1,1) /* no-op */;` *can* be optimised out, since the introduction of a comma operator means that it's no longer a constant expression. – caf Feb 25 '15 at 23:30
  • @caf I agree *Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators* – Shafik Yaghmour Feb 26 '15 at 02:14
  • How about `goto` loops? – Tor Klingberg Dec 15 '15 at 17:18
  • You seem to read 6.8.5 as having time consumption semantics. I don't find any such requirement in that section. I see nothing there requiring the abstract machine to expend infinite time on `while(1);`. Why do you believe the abstract machine is not permitted to complete that loop in finite time? – Eric Towers Jan 30 '20 at 03:09
  • @EricTowers: For any number of seconds N, a program that always waits N+1 seconds and performs some visible action would not be observably different from one that waits forever without doing anything, nor from one that sometimes performs and action immediately and sometimes performs it after N+1 seconds. A program that performs any observable action outside a particular (possibly) empty set of "next" observable actions, however, would be observably different from a program whose actions are limited to those in the set. – supercat Jan 30 '20 at 18:30
  • @EricTowers: BTW, I wonder how many useful optimizations are facilitated by making assumptions about loops that wouldn't be better facilitated by having a means of qualifying functions or prototypes as "pure"? – supercat Jan 30 '20 at 18:32
  • Does it mean that per "implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced" `while(1);` can be optimized out? – pmor Oct 28 '22 at 15:01
  • @pmor: Code which follows an unconditional branch and is not the target of another branch is *statically unreachable*. Although it is in general difficult if not intractable to determine whether there are any inputs a program could receive that would result in a particular statically reachable piece of code actually being executed, classifying parts of the program as statically unreachable is trivial: code between an unconditional branch and the next branch target is statically unreachable, as is code between a call to a function with no reachable exits and the next branch target. – supercat Apr 18 '23 at 20:29
  • @supercat A simple question: are self-modifying programs / code out of scope of the C standard? Is it possible to write a strictly conforming self-modifying C program? – pmor Apr 19 '23 at 06:16
  • @pmor: The only means of having a program modify executable behavior which would not involve Undefined Behavior would be writing an executable file in binary mode and then using the `system` function to execute it. The behavior of the `system` function is Implementation-Defined in almost all cases, because while the actual behavior is outside the Standard's jurisdiction, it would have come across as silly to have a function whose behavior is never defined. – supercat Apr 19 '23 at 14:49
  • @pmor: Actually, I suppose another alternative would be to write a C source file in text mode, and then use the `system` function to invoke a C compiler on it, and then run the executable from that, but the main point is that the Standard says nothing about the effect of doing something like `system("cc foo.c");` or `system("./a.out");`, but an execution environment might specify such things. – supercat Apr 19 '23 at 16:00
10

There's no way to detect infinite loops universally: see the Halting Problem. So the best any compiler could do is take a decent guess - for example the obvious case mentioned in the OP.

But why would this be desirable? I could see emitting a warning and still allowing the behavior, but to remove the loop is not an "optimization" - it changes the behavior of the program!

Dan
  • 10,990
  • 7
  • 51
  • 80
  • 3
    The first rule of the halting problem is that nobody mentions the halting problem. – Martin Beckett Feb 01 '10 at 16:26
  • 9
    While it's impossible for arbitrary programs, it's certainly possible for sufficiently trivial loops such as this one, though. –  Feb 01 '10 at 16:55
  • Consider a function `unsigned normalize_lsb(unsigned x) { while (!(x & 1)) x>>=1; return x;}`. In most cases where the return value of the function is ignored, the program behavior that would result from replacing the function with a no-op would be acceptable, *even if `x` might be zero*`. Since proving the return value will be ignored would often be easy, while proving that `x` couldn't be zero, allowing a compiler to omit the loop in such cases would be a useful optimization *provided that optimizations were limited to deferring or skipping execution of the loop, rather than* ... – supercat Mar 20 '22 at 21:04
  • ...*allowing completely arbitrary behavior*. If the only way for programmers to ensure anything meaningful about program behavior in cases where `x` might be zero would be to either add code to skip the loop if `x` is zero, or add a dummy side effect within the loop, the net effect would be that programmers would have to write otherwise-unnecessary code whose performance, with the "optimization", would be worse than if the optimization didn't exist but the programmer didn't have to make accommodations for it. – supercat Mar 20 '22 at 21:07
7

The loop is not dead code, it is basically preventing the program from ever reaching whatever comes after it. This is not what would happen if the loop was removed, so the compiler can not remove the loop.

It might replace it with a platform-dependent idle-instruction to signal the processor that the thread is not going to do anything any more.

What the compiler can do is remove any code that comes after the loop, because it is unreachable and will never be executed.

Timbo
  • 27,472
  • 11
  • 50
  • 75
4

This has been discussed many times before on comp.lang.c (eg. here) without, as far as I know, any consensus outcome.

caf
  • 233,326
  • 40
  • 323
  • 462
2

They are a necessity when writing daemons. Why'd you want to call them dead code?

dirkgently
  • 108,024
  • 16
  • 131
  • 187
  • 5
    That would not be a really nice daemon since while(1); technically doesn't put that thread to sleep and as very processor intensive. Better would be something like while(1) Sleep(5000); or something similar – Toad Feb 01 '10 at 16:16
  • That is a side-issue. At any rate, `while (1)` is anything but _dead code_ -- which was my point. And yes, I think I saw some code analysis tool report it as such -- an infinite loop. – dirkgently Feb 01 '10 at 16:18
  • 2
    But `while(1);` *is* essentially dead code -- it doesn't do anything useful except block the CPU. Much better to have something like: char exiting=0; while(!exiting) { exiting = process_event(); } – TMN Feb 01 '10 at 17:12
  • @TMN: I was talking about the generic case, when you do something inside the loop. I see where the comments come from now :-) – dirkgently Feb 01 '10 at 17:15
  • @TMN: Perhaps it would be helpful if the Standard were to define a macro called `__SIDE_EFFECT();` which would be equivalent to reading an arbitrary `volatile` variable and ignoring the result, except that the code generator would be allowed to omit the read if it could do so without affecting any other aspect of program behavior. In that case, `while(condition_yielding_true) __SIDE_EFFECT();` would be well-defined if `condition_yielding_true` was well-defined, without regard for whether the condition was recognizable as a constant. – supercat Aug 05 '15 at 16:29
0

I believe that newer standards explicitly provide that if a piece of code does not access any volatile variables, perform I/O, etc. any other code which does not rely upon anything computed in the first piece may be arbitrarily sequenced before it. If an infinite loop does not perform any I/O, nor compute any value which is used later in the program, a compiler may simply defer execution of the loop until everything else has completed.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • You can belive even in God, but there is can be no ISO standard of the God behaviour. Are there C-like languages stadards which allow this? – osgx Dec 08 '10 at 18:56
  • 1
    @osgx: Consider the code "int main(void) {do_something(); do_something_else(); return 0;}" Suppose the compiler can look at the code and determine that do_something() does not write any volatile variable, nor does it write any variable that do_something_else() is ever going to use. The only thing do_something() could possibly do that would alter the output of do_something_else() would be to infinite-loop. If a compiler were only allowed to drop code which couldn't loop infinitely, it would be forced to include lots of useless code. Allowing infinite loops to be dropped makes things easier. – supercat Dec 08 '10 at 20:40
  • c99 5.1.2.3 "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations" - it is the only side effects. They "change the state of the execution environment". "An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced" – osgx Dec 11 '10 at 02:07
  • 3
    @osgx: By my interpretation, all the confusing stuff about loops being assumed to terminate may be summed up as, "the time required to execute a piece of code, even if infinite, shall not in and of itself be considered a side-effect that a compiler must maintain." – supercat Dec 11 '10 at 05:38
  • 3
    @osgx: Since writing the above, I have come to realize that a more accurate statement would have been "The Standard could have been simplified, *and the semantics improved*, if the confusing stuff about loops being presumed to terminate were summed up [as shown]". When I wrote the statement, I hadn't realized that the authors of any compiler people actually use would deliberately *seek out* cases where programs invoke Undefined Behavior and use them as excuses to negate laws of time and causality. – supercat Aug 05 '15 at 16:24
  • Does it mean that per "implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced" `while(1);` can be optimized out? – pmor Oct 28 '22 at 15:00
  • @pmor: Worse thant that: if clang can deduce that a loop couldn't exit unless some condition were true, and also that the loop has no side effects beyond the (possibly infinite) time to run it, and some later code evaluates the same condition, clang will optimize out *both* the loop and the later condition check. – supercat Oct 28 '22 at 16:18