21

I was reading this paper on undefined behaviour and one of the example "optimisations" looks highly dubious:

if (arg2 == 0)
    ereport(ERROR, (errcode(ERRCODE_DIVISION_BY_ZERO),
                    errmsg("division by zero")));
/* No overflow is possible */
PG_RETURN_INT32((int32) arg1 / arg2);

Figure 2: An unexpected optimization voids the division-by-zero check, in src/backend/utils/adt/int8.c of PostgreSQL. The call to ereport(ERROR, ...) will raise an exception.

Essentially, the compiler assumes that ereport will return, and removes the arg2 == 0 check since the presence of the division implies a non-zero denominator, i.e. arg2 != 0.
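
In other words, my reading is that the compiler effectively reduces the function to something like the following sketch of the claimed transformation (not actual compiler output):

    /* Sketch only: if ereport() is assumed to return, the division "proves"
     * arg2 != 0, so the check is treated as dead code. */
    PG_RETURN_INT32((int32) arg1 / arg2);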

Is this a valid optimisation? Is the compiler free to assume that a function will always return?

EDIT: The whole thing depends on ereport, which is described thus:

/*----------
 * New-style error reporting API: to be used in this way:
 *      ereport(ERROR,
 *              (errcode(ERRCODE_UNDEFINED_CURSOR),
 *               errmsg("portal \"%s\" not found", stmt->portalname),
 *               ... other errxxx() fields as needed ...));
 *
 * The error level is required, and so is a primary error message (errmsg
 * or errmsg_internal).  All else is optional.  errcode() defaults to
 * ERRCODE_INTERNAL_ERROR if elevel is ERROR or more, ERRCODE_WARNING
 * if elevel is WARNING, or ERRCODE_SUCCESSFUL_COMPLETION if elevel is
 * NOTICE or below.
 *
 * ereport_domain() allows a message domain to be specified, for modules that
 * wish to use a different message catalog from the backend's.  To avoid having
 * one copy of the default text domain per .o file, we define it as NULL here
 * and have errstart insert the default text domain.  Modules can either use
 * ereport_domain() directly, or preferably they can override the TEXTDOMAIN
 * macro.
 *
 * If elevel >= ERROR, the call will not return; we try to inform the compiler
 * of that via pg_unreachable().  However, no useful optimization effect is
 * obtained unless the compiler sees elevel as a compile-time constant, else
 * we're just adding code bloat.  So, if __builtin_constant_p is available,
 * use that to cause the second if() to vanish completely for non-constant
 * cases.  We avoid using a local variable because it's not necessary and
 * prevents gcc from making the unreachability deduction at optlevel -O0.
 *----------
 */
Peter Alexander
  • 1
    I think the compiler is assuming since division by zero is undefined that the if will never be taken, which is valid if non-intuitive. – Shafik Yaghmour Nov 18 '13 at 22:44
  • 1
    That would seem to be an invalid optimization, since there could be (in fact, are) side-effects in the ereport call that should occur even if an exception is thrown. Not to say that the optimization isn't done -- C/C++ optimization is pretty much the Wild West as opposed to, say, Java, where this stuff is tightly defined. – Hot Licks Nov 18 '13 at 22:46
  • @ShafikYaghmour: I understand that, but division-by-zero only occurs if the code reaches the division with `arg2 == 0`. In this case, it could not. – Peter Alexander Nov 18 '13 at 22:47
  • 1
    I am pretty sure *John Regehr* has an article on this [somewhere on his site](http://blog.regehr.org/) let me see if I can dig it up. Ok I found one [Contest: Craziest Compiler Output due to Undefined Behavior](http://blog.regehr.org/archives/759). – Shafik Yaghmour Nov 18 '13 at 22:50
  • @PeterAlexander: Presumably the compiler doesn't know that `ereport` won't return (because it wasn't told). IIUC, in this case the division may be moved above the check (since it's assumed to always be reachable) – Hasturkun Nov 18 '13 at 23:02
  • @Hasturkun: is it legal to assume it won't return? What if the call was to exit instead? Could it still do the optimisation then? – Peter Alexander Nov 18 '13 at 23:40
  • 4
    I'm reading the explanation in that paper you linked differently. It doesn't say that the compiler removed the test; it says that the optimizer moved the division before the test. That makes more sense to me (even if the behavior is effectively the same). – Adrian McCarthy Nov 19 '13 at 00:01
  • 2
    That would be OK provided you're in a maths mode where division by zero is harmless (for the sake of argument it results in a NaN, or some arbitrary value like 17). Then in the case where `arg2 == 0` the compiler can do whatever side-effect-free computation it likes before calling `ereport`, as long as it does call `ereport`. – Steve Jessop Nov 19 '13 at 00:08
  • I have an embedded project that needs to shut down safely and wait for power off. Does this mean that my `while(forever)` loop is illegal since it never returns? The ARM processors don't have a HALT instruction so what is one to do? – Thomas Matthews Nov 19 '13 at 01:55
  • @ThomasMatthews: see the discussion under Michael Burr's answer. – Steve Jessop Nov 19 '13 at 03:04
  • `[[noreturn]]` is a C++11 attribute you can tack on to function declarations to state that the function doesn't return. It can allow further optimization. http://en.cppreference.com/w/cpp/language/attributes – Trevor Hickey Nov 19 '13 at 04:07
  • @TrevorHickey: but the compiler cannot assume that a function without the `[[noreturn]]` attribute will return. In fact, as @BenVoigt points out in a comment, there are functions which do not return (sometimes) that **cannot** be marked `[[noreturn]]` because sometimes they do return. – Michael Burr Nov 19 '13 at 05:47
  • -1 because the question misrepresents the claim in the paper (which is that the division may be moved before the check, not that the check is removed). – Adrian McCarthy Dec 17 '13 at 16:39

7 Answers

16

Is the compiler free to assume that a function will always return?

It is not legal in C or C++ for a compiler to optimize on that basis, unless it somehow specifically knows that ereport returns (for example by inlining it and inspecting the code).

ereport depends on at least one #define and on the values passed in, so I can't be sure, but it certainly looks to be designed to conditionally not return (and it calls an extern function errstart that, as far as the compiler knows, may or may not return). So if the compiler really is assuming that it always returns then either the compiler is wrong, or the implementation of ereport is wrong, or I've completely misunderstood it.
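
To illustrate that point about opaque calls (hypothetical names, nothing to do with the real PostgreSQL sources): if all the compiler sees is a declaration, it has to allow for the possibility that the call never comes back.

    /* Hypothetical illustration. The compiler sees only this declaration, so it
     * must assume report_error() might call exit(), abort(), longjmp(), or
     * loop forever. */
    void report_error(int code);

    int safe_div(int a, int b)
    {
        if (b == 0)
            report_error(1);   /* may or may not return */
        return a / b;          /* must not be hoisted above the check */
    }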

The paper says,

However, the programmer failed to inform the compiler that the call to ereport(ERROR, ::: ) does not return.

I don't believe that the programmer has any such obligation, unless perhaps there's some non-standard extension in effect when compiling this particular code, that enables an optimization that's documented to break valid code under certain conditions.

Unfortunately it is rather difficult to prove the code transformation is incorrect by citing the standard, since I can't quote anything to show that there isn't, tucked away somewhere in pages 700-900, a little clause that says "oh, by the way, all functions must return". I haven't actually read every line of the standard, but such a clause would be absurd: functions need to be allowed to call abort() or exit() or longjmp(). In C++ they can also throw exceptions. And they need to be allowed to do this conditionally -- the attribute noreturn means that the function never returns, not that it might not return, and its absence proves nothing about whether the function returns or not. My experience of both standards is that they aren't (that) absurd.
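
For instance (my own illustration, not something from the standard or the paper), a function may return for some arguments and terminate the program for others; it cannot honestly be marked noreturn, yet its callers may rely on the terminating path to guard later code.

    #include <stdlib.h>

    /* Returns normally for small codes, terminates the process otherwise.
     * It sometimes returns, so it must not be declared noreturn. */
    void check_or_die(int code)
    {
        if (code >= 100)
            exit(EXIT_FAILURE);
    }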

Optimizations are not allowed to break valid programs, they're constrained by the "as-if" rule that observable behaviour is preserved. If ereport doesn't return then the "optimization" changes the observable behaviour of the program (from doing whatever ereport does instead of returning, to having undefined behaviour due to the division by zero). Hence it is forbidden.

There's more information on this particular issue here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616180

It mentions a GCC bug report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29968 that was (rightly IMO) rejected, but if ereport doesn't return then the PostgreSQL issue is not the same as that rejected GCC bug report.

In the Debian bug description is the following:

The gcc guys are full of it. The issue that is relevant here is the C standard's definition of sequence points, and in particular the requirement that visible side effects of a later statement cannot happen before the execution of an earlier function call. The last time I pestered them about this, I got some lame claim that a SIGFPE wasn't a side effect within the definitions of the spec. At that point useful discussion stopped, because it's impossible to negotiate with someone who's willing to claim that.

In point of fact, if a later statement has UB then it is explicitly stated in the standard that the whole program has UB. Ben has the citation in his answer. It is not the case (as this person seems to think) that all visible side effects must occur up to the last sequence point before the UB. UB permits inventing a time machine (and more prosaically, it permits out of order execution that assumes everything executed has defined behaviour). The gcc guys are not full of it if that's all they say.

A SIGFPE would be a visible side effect if the compiler chooses to guarantee and document (as an extension to the standard) that it occurs, but if it's just the result of UB then it is not. Compare for example the -fwrapv option to GCC, which changes integer overflow from UB (what the standard says) to wrap-around (which the compiler guarantees, only if you specify the option). On MIPS, gcc has an option -mcheck-zero-division, which looks like it does define behaviour on division by zero, but I've never used it.
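
To make that distinction concrete, here is my own example (not taken from those reports): the classic overflow test below relies on behaviour the standard leaves undefined, so without -fwrapv a compiler may fold it away, while with -fwrapv gcc documents wrap-around and has to evaluate it.

    /* Signed overflow is UB per the standard: without -fwrapv the compiler may
     * assume a + 100 never wraps and fold this to "return 0".
     * With gcc's -fwrapv, overflow wraps around and the test is meaningful. */
    int will_wrap(int a)
    {
        return a + 100 < a;
    }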

It's possible that the authors of the paper noticed the wrongness of that complaint against GCC, and the thought that one of the PostgreSQL authors was wrong in this way influenced them when they put the snigger quotes into:

We found seven similar issues in PostgreSQL, which were noted as “GCC bugs” in source code comments

But a function not returning is very different from a function returning after some side effects. If it doesn't return, the statement that would have UB is not executed within the definition of the C (or C++) abstract machine in the standard. Unreached statements aren't executed: I hope this isn't contentious. So if the "gcc guys" were to claim that UB from unreached statements renders the whole program undefined, then they'd be full of it. I don't know that they have claimed that, and at the end of the Debian report there's a suggestion that the issue might have gone away by GCC 4.4. If so then perhaps PostgreSQL indeed had encountered an eventually-acknowledged bug, not (as the author of the paper you link to thinks) a valid optimization or (as the person who says the gcc guys are full of it thinks) a misinterpretation of the standard by GCC's authors.

Steve Jessop
  • 1
    You're correct that the possibility of exceptions changes things for C++, but if the function was `noexcept` or `throw()` the optimizer could make the same assumptions as a C compiler can. – Jonathan Wakely Nov 19 '13 at 00:29
  • "If the function was `noexcept`"... What function? How do you mark a macro `noexcept`? – Ben Voigt Nov 19 '13 at 00:32
  • 1
    @BenVoigt, did you read the answer I was commenting on? It talks about "calls" and "returning", why not apply your exceptional pedantry there? You don't "call" a macro and a macro doesn't "return". Sorry for commenting in context. If you like, read it as "if `noexcept(ereport(...))` is true" instead. – Jonathan Wakely Nov 19 '13 at 00:39
  • @BenVoigt: well, either the macro expands to something that calls beyond the ken of the compiler (that is, a function with no available definition), in which case Jonathan's comment applies to that function, or else it doesn't (in which case the compiler can inspect the code that it does expand to, it doesn't need hints). – Steve Jessop Nov 19 '13 at 00:39
  • @SteveJessop, in C++ I think 1.10/24 allows the implementation to assume all threads make progress (i.e. assume no loops are infinite). I don't see equivalent text in C. – Jonathan Wakely Nov 19 '13 at 00:40
  • @JonathanWakely: 1.10/24 doesn't forbid all infinite loops, only computation-bound ones. – Ben Voigt Nov 19 '13 at 00:41
  • _"but talks about raising an exception."_ It says at the instruction-set level, it's not talking about C++ exceptions. – Jonathan Wakely Nov 19 '13 at 00:49
  • @JonathanWakely: The division by zero raises a CPU exception. `ereport` doesn't. – Ben Voigt Nov 19 '13 at 00:56
  • @JonathanWakely: Infinite loops aren't necessary, the function is allowed to call the standard C function `exit()`. – caf Nov 19 '13 at 02:53
  • I do not understand the purpose of the second half of your post. It must be legal to use control flow to steer the path of execution away from UB. (`ereport`, when passed `ERROR`, never returns.) – tmyklebu Aug 05 '15 at 17:13
8

I think the answer is found, at least for C++, in section 1.9p5

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

In fact, the macro expands to a call to errstart, which here returns true (effectively the test ERROR >= ERROR, obviously true). That triggers a call to errfinish, which calls proc_exit, which runs some registered cleanup and then the Standard runtime function exit. So there is no possible execution that contains a divide-by-zero. However, the compiler logic testing this must have gotten it wrong. Or perhaps an earlier version of the code failed to properly exit.
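
Roughly, the expansion being described looks like this (a simplified sketch of the PostgreSQL macro, not the verbatim source):

    /* Simplified sketch of ereport(ERROR, (errcode(...), errmsg(...))). */
    do {
        if (errstart(ERROR, __FILE__, __LINE__, PG_FUNCNAME_MACRO, TEXTDOMAIN))
            errfinish(errcode(ERRCODE_DIVISION_BY_ZERO),
                      errmsg("division by zero"));      /* ends in proc_exit() */
        if (__builtin_constant_p(ERROR) && ERROR >= ERROR)
            pg_unreachable();                           /* unreachability hint */
    } while (0)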

Ben Voigt
  • 6
    “if any such execution contains an undefined operation” – but **it does not**. Unless you apply the quoted section *before* considering whether any operation contained in the sequence of operations therein is undefined. For `arg2==0`, the code is never executed and thus *no* “execution contains an undefined operation” (since there is no execution, full stop). – Konrad Rudolph Nov 19 '13 at 00:30
  • 1
    @KonradRudolph: I agree that it's a compiler bug, but this is the rule they were trying to implement. – Ben Voigt Nov 19 '13 at 00:33
  • Ah. In that case, I think you may have struck gold. – Konrad Rudolph Nov 19 '13 at 00:34
  • 1
    @Ben: your comment leaves me confused - is this answer why the optimization is valid or to why the optimization is invalid? – Michael Burr Nov 19 '13 at 00:38
  • @MichaelBurr: This is why reordering (which btw kills the write to `stderr`) would be valid if after `ereport` the code continued and performed division. But (at least in the current version) `ereport(ERROR, ...)` will not continue, it will exit. – Ben Voigt Nov 19 '13 at 00:40
  • Does this then mean that compilers must assume that *all* function calls (to which they cannot see the definition of) may exit/never return, to err on the side of correctness, when performing optimizations (and assuming post-function-call statements cannot invoke undefined behavior)? – Cornstalks Nov 19 '13 at 02:02
  • Yes. Is that surprising, that the only assumption that can be made about opaque functions is that they don't contain undefined behavior? – Ben Voigt Nov 19 '13 at 02:05
  • 2
    @Cornstalks: I would say so, yes (in fact I have said exactly that on a comment somewhere around here, I can't blame you for not spotting it). In the same way, compilers must assume that all functions they cannot see, may modify any non-`restrict` memory to which a pointer might exist outside the calling function. These assumptions impede optimization a lot. In fact much of the benefit of inlining code isn't avoiding the cost of pushing parameters on the stack and jumping to the callee, it's the fact that the callee and caller can be optimized jointly instead of separately. – Steve Jessop Nov 19 '13 at 02:05
  • @SteveJessop: I wonder how many of the benefits of whole-program optimization which presently require tracing into functions could be provided much more cheaply by adding directives by which functions could specify their behavior, or client could instruct a compiler e.g. "Within this block, reads or writes of `someGlobalVar` may be considered unsequenced relative to any function calls." Having a programmer copy `someGlobalVar` to a local variable and using that local variable within the block might be faster than using the global if the local variable would end up in a register, but... – supercat Aug 05 '15 at 16:00
  • ...would likely be slower if the local variable ends up having to be stored on the stack. A compiler which knew it was allowed, but not required, to read the global variable to a register once and have repeated accesses use that register could exploit that freedom when beneficial but refrain doing so when not. Do you know of any proposals for such things? – supercat Aug 05 '15 at 16:05
4

It seems to me that unless the compiler can prove that ereport() doesn't call exit() or abort() or some other mechanism for program termination then this optimization is invalid. The language standard mentions several mechanisms for termination, and even defines the 'normal' program termination via returning from main() in terms of the exit() function.

Not to mention that program termination isn't necessary to avoid the division expression. for (;;) {} is perfectly valid C.
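
As a small illustration (a hypothetical function of my own): nothing about this function's declaration tells a caller's compiler which of these paths will be taken, so it cannot assume the call comes back.

    #include <stdlib.h>

    /* May terminate the program, loop forever, or return, depending on code.
     * None of this is visible from the declaration alone. */
    void handle(int code)
    {
        if (code > 100)
            abort();        /* abnormal termination */
        if (code > 10)
            exit(code);     /* normal termination without returning */
        if (code < 0)
            for (;;) {}     /* never returns at all; valid C */
    }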

Michael Burr
  • That infinite loop is not legal in C++. – Ben Voigt Nov 19 '13 at 00:02
  • 1
    "`for (;;) {}` is perfectly valid C." Is that still the case? I thought that they were going to add language to both standards saying that every thread must eventually exit or have observable behaviour. I can't remember why, but there's some reason compilers supporting eternal do-nothing loops makes real programs worse. – Steve Jessop Nov 19 '13 at 00:03
  • @SteveJessop: C11 appears not to have the same rule. – Ben Voigt Nov 19 '13 at 00:06
  • Now that you mention it, I do seem to recall that C++11 may have forbidden the infinite loop; I'll need to research that (can someone comment with the section number?). However, the fact that there are several forms of program termination still seems to make it so the compiler can't blindly assume the division will occur. Of course, there are times when the compiler might be able to determine with certainty that a function will return (as @SteveJessop mentions in his answer). – Michael Burr Nov 19 '13 at 00:13
  • @BenVoigt Paragraph? That seems like a downright silly restriction. Having an infinite loop that can only be interrupted by the operating system may not seem as the cleanest design (indeed it isn’t) but it’s nevertheless a pretty common and useful pattern, and unless the C++ standard suddenly knows about OS interrupts I don’t see a good way around this pattern for many applications. – Konrad Rudolph Nov 19 '13 at 00:13
  • @Konrad: 1.10p24. "The implementation may assume that any thread will eventually do one of the following: — terminate, — make a call to a library I/O function, — access or modify a volatile object, or — perform a synchronization operation or an atomic operation." Anyway, waiting for interruption by the OS should use a blocking synchronization operation (or at least a sleep) -- yes C++ now has those -- and not a busy loop. – Ben Voigt Nov 19 '13 at 00:17
  • @BenVoigt Ah, okay, I interpreted that as a placeholder for a *non-empty* looping statement. – Konrad Rudolph Nov 19 '13 at 00:23
  • 4
    Ok, so in C++11 an infinite loop must have a bit more. This seems like it'll do the trick: `for (;;) { volatile int i; i = 0; }` – Michael Burr Nov 19 '13 at 00:34
  • @SteveJessop: By my reading, the C Standard requires if a loop would--if compile-time expressions were replaced with their values--be *syntactically* incapable of exiting, the compiler must generate code which does not perform any actions "after" the loop. If a loop would be *syntactically* capable of exiting, even if it would require a condition that will never actually arise, code which follows the loop, but which won't depend upon or affect its results, may be sequenced before it. – supercat Jul 06 '15 at 18:21
  • Note the C and C++ standard treat infinite loops slightly differently; see [Are compilers allowed to eliminate infinite loops?](http://stackoverflow.com/q/2178115/1708801) and [Optimizing away a “while\(1);” in C++0x](http://stackoverflow.com/q/3592557/1708801). There is a better C++ question on infinite loops but I can't find it now :-( – Shafik Yaghmour Aug 05 '15 at 12:08
2

No, in the newest C standard, C11, there is even a new keyword to specify that a function will not return, _Noreturn.
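
A minimal usage sketch of the C11 keyword (via the stdnoreturn.h header it can also be spelled noreturn):

    #include <stdlib.h>
    #include <stdnoreturn.h>

    /* _Noreturn (spelled noreturn via stdnoreturn.h) promises the compiler
     * that this function never returns, so callers can be optimized
     * accordingly. */
    noreturn void fatal(int code)
    {
        exit(code);
    }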

Jens Gustedt
  • 1
    What about prior to C11? Was it a valid optimisation before `_Noreturn` existed? – Peter Alexander Nov 18 '13 at 22:49
  • 4
    In C all optimizations are valid for which the compiler can prove that the code does perform **as if** it would be executed in the abstract state machine. Thus, a pre-C11 could only perform such an optimization if the code of `ereport` would be visible to it, e.g because this is a macro or an `inline` function, and all branches lead to `exit`, `abort` or alike. Because of such a restriction, the new keyword was introduced. – Jens Gustedt Nov 18 '13 at 22:54
  • @BenVoigt, the macro itself doesn't need to be marked. Only functions for which the compiler doesn't see the definition would need that. That is any invocation of a function inside the macro that implements the non-returning aspect would have to be marked as such. – Jens Gustedt Nov 19 '13 at 00:50
  • 3
    Just for clarification: functions that do not return do not **need** to be marked with `_Noreturn` (or the `noreturn` attribute in C++11). Those new language features allow the compiler to assume that functions marked with the keyword or attribute won't return - those features don't permit the compiler to automatically assume a non-marked function **will** return. – Michael Burr Nov 19 '13 at 01:29
  • I suspect the `_Noreturn` directive was added primarily to eliminate compiler warnings when a function with a defined return type calls such a method and (seemingly) allows execution to fall off the end; it might as a secondary benefit allow a compiler to eliminate a few useless `"jump" or "return" instructions. For some reason, the Standards Committee would rather encourage compiler writers to throw out decades of precedent which favored having compilers implement certain behaviors more consistently than required by Standard, than support directives which would facilitate optimization. – supercat Aug 05 '15 at 16:58
  • @supercat, these are not just the "useless" `jmp` etc instructions, but all the stuff otherwise goes with a function call, in particular all saving of the state of the caller on the stack. By that this also may reduce stack use of the caller. Then, I simply didn't catch what you are intending to say with your rant against the Standard Committee. Previous to the introduction of `_Noreturn` compiler implementors typically had extension for that. So this simply follows existing practice. – Jens Gustedt Aug 05 '15 at 18:28
  • @JensGustedt: My point was that the purpose of `_Noreturn` was to avoid requiring callers to include useless `return` instructions for the purpose of shutting up compiler warnings; its ability to assist optimization was a side benefit. C contains a dearth of directives whose sole purpose is to allow (but not require) compilers to make otherwise-illegal optimizations, even in cases where the only thing required to support such features in standard-compliant fashion would be to add a suitable header file and/or add some lines to existing ones, and where major performance improvements... – supercat Aug 05 '15 at 18:41
  • ...could be easily achieved. A directive to specifies that a function will always return would be something that C could easily add (specify that code using the directive must include a certain header file, and a standards-compliance could be achieved by defining it to an empty macro) but the Standards Committee seems very loath to define such things even in cases where any implementation could be brought into compliance merely by adding a header file. – supercat Aug 05 '15 at 18:58
2

The paper does not say that the if (arg2 == 0) check is removed. It says that the division is moved before the check.

Quoting the paper:

... GCC moves the division before the zero check arg2 == 0, causing division by zero.

The result is the same, but the reasoning is different.

If the compiler believes ereport will return, then it "knows" that the division will be performed in all cases. Furthermore, the if-statement doesn't affect the arguments of the division. And obviously, the division doesn't affect the if-statement. And while a call to ereport might have observable side effects, the division does not (if we ignore any divide-by-zero exception).

Therefore, the compiler believes the as-if rule gives it the freedom to reorder these statements with respect to each other--it can move the division before the test because the observable behavior should be identical (for all of the cases that yield defined behavior).
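
Concretely, the claim is that the code ends up behaving as if it had been written like this (a sketch of the reordering, not actual compiler output; quotient is a name I have made up):

    /* Sketch of the reordering: the (assumed side-effect-free) division is
     * hoisted above the check, on the reasoning that it runs in all cases. */
    int32 quotient = (int32) arg1 / arg2;   /* traps here when arg2 == 0 */
    if (arg2 == 0)
        ereport(ERROR, (errcode(ERRCODE_DIVISION_BY_ZERO),
                        errmsg("division by zero")));
    PG_RETURN_INT32(quotient);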

One way to look at it is that undefined behavior includes time travel. ;-)

I'd argue that undefined behavior (e.g., dividing by 0), should be considered observable behavior. That would prevent this reordering because the observable behavior of the division must happen after the observable behavior of the call to ereport. But I don't write standards or compilers.

Adrian McCarthy
  • It isn't just allowed to re-order, it can remove the call to `ereport` completely. The optimizer knows that division-by-zero never happens in a valid program, therefore the call to `ereport` will never happen in a valid program, so it's OK to remove the conditional and the call as dead code. – Jonathan Wakely Nov 19 '13 at 00:23
  • @JonathanWakely: That's not what the paper says. – Adrian McCarthy Nov 19 '13 at 00:25
  • 1
    @JonathanWakely Except that this necessitates a circular definition of “valid code”. I think that any reasonable non-circular definition should consider this code as valid since it will **never** perform the division-by-zero (I’ve added the caveat “reasonable” here since obviously we could define “valid” to mean “statically verifiable” but that is obviously not a useful definition for a Turing complete language). – Konrad Rudolph Nov 19 '13 at 00:25
  • @Konrad, it's not circular. If you write valid code the optimization is OK. If you write invalid code you can't complain if your program crashes. – Jonathan Wakely Nov 19 '13 at 00:27
  • 1
    @Jonathan It’s valid code (even assuming that `arg2==0`). You can only claim that it’s invalid code by invoking a circular definition (“it’s invalid because it divides by zero and that is invalid”). – Konrad Rudolph Nov 19 '13 at 00:30
  • How is that circular? A valid program never divides by zero, so if the program divides by `arg2` then either `arg2` is non-zero or the program is invalid (and any result is OK). In both cases you can assume `arg2` is non-zero. – Jonathan Wakely Nov 19 '13 at 00:35
  • 1
    @Jonathan *Or* the program ensures that no division happens when `arg2==0` (which is the case here). Your argument relies on a false dilemma. – Konrad Rudolph Nov 19 '13 at 00:37
  • @Konrad, the whole point is that in the absence of a call to a `noreturn` function (or a path that could throw an exception in C++) the compiler does _not_ assume that. – Jonathan Wakely Nov 19 '13 at 00:45
  • 4
    @JonathanWakely: so just to be clear, are you saying it's UB for a function to not return if it's not marked `noreturn`? That's the only way I can see for your original claim, "the call to ereport will never happen in a valid program" to be true: if `ereport` is known certainly to return then it cannot happen because it would lead to division by zero. But it isn't known to certainly return, because (the questioner posits) it does not in fact return. – Steve Jessop Nov 19 '13 at 00:52
  • No, I'm not saying "what the compiler does defines what the standard says". Yes, I'm saying the standard assumes functions return (unless they are reach a `noreturn` path, or throw an exception in C++). – Jonathan Wakely Nov 19 '13 at 00:53
  • 3
    @JonathanWakely: The Standard assumes no such thing. At least in case of the C++11 one, which I'm more familiar with. – Ben Voigt Nov 19 '13 at 00:54
  • 2
    @JonathanWakely: Sorry, I revised my comment so I no longer accuse you of saying that. I got a bit confused :-) But anyway, the compiler must treat unknown code as possibly reaching a `noreturn` path, and there's no obligation on the declaration to signal this. – Steve Jessop Nov 19 '13 at 00:55
  • Specifically this `ereport` function appears to call an extern function called `errstart`, which the comments say conditionally doesn't return, but is not marked `noreturn` in the declaration. But I confess that I'm not all that interested in the specific case of what that macro finally expands to, since it depends on some `#defines`. I'm more interested in "when can the optimizer assume a function returns" than in the specific "does this macro meet those conditions". – Steve Jessop Nov 19 '13 at 01:00
0

In embedded systems, functions that never return are commonplace, and they should not be "optimized" on the assumption that they eventually return.

For example, a common algorithm is to have a forever loop in main() (a.k.a. the background loop), and all functionality takes place in an ISR (Interrupt Service Routine).

Another example is RTOS tasks. In our embedded system project, we have tasks that are in an infinite loop: pend on a message queue, process the message, repeat. They will do this for the life of the project.
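
A typical shape of such a task looks like the sketch below (schematic only; queue_wait and process are hypothetical stand-ins for whatever RTOS API is in use):

    /* Hypothetical RTOS task: pends on a message queue forever. */
    void *queue_wait(void *queue);   /* blocks until a message arrives */
    void process(void *msg);

    void message_task(void *queue)
    {
        for (;;) {                   /* runs for the life of the device */
            void *msg = queue_wait(queue);
            process(msg);
        }
    }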

Some embedded systems have safe shutdown loops where they place the machine into a safe state, locking out all User Input, and wait for power shutdown or reset.

Also, some embedded systems can shut down the system. Shutting down the power prevents the system from returning.

There are reasons that not all functions need to return, or should be required to return. If every function in your cell phone had to return, the phone wouldn't be fast enough for you to use.

Thomas Matthews
-1

Most functions are assumed to eventually return. There are compiler-specific extensions in some compilers to inform the compiler that a function will never return.

__attribute__ ((noreturn)) does this for gcc.
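
A minimal sketch of the gcc spelling (C11's _Noreturn keyword and C++11's [[noreturn]] attribute serve the same purpose):

    #include <stdlib.h>

    /* gcc extension: tells the compiler that fatal_error() never returns. */
    __attribute__((noreturn)) void fatal_error(int code)
    {
        exit(code);
    }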

mah
  • 2
    Can you provide a standard reference for that? The standard explicitly mentions that `exit` will not return, so what if `ereport` called `exit`? – Peter Alexander Nov 18 '13 at 22:46
  • 1
    This can be done in standard C++ now using the [`noreturn`](http://en.cppreference.com/w/cpp/language/attributes) attribute – Praetorian Nov 18 '13 at 22:47
  • I don't believe there is a standard reference because it's not a part of the language specification; rather it's a part of how compiler writers choose to implement optimizations. If `ereport()` _always_ calls `exit()` I would expect a well written compiler to bubble the attribute up, but if I wanted the optimization I would still add the attribute manually -- err on the side of "it doesn't hurt and it could help". – mah Nov 18 '13 at 22:50
  • I do not think that this answers the question. The `noreturn` attribute informs the compiler that the function can be assumed not to return for any argument. The existence of this attribute in itself does not give the compiler license to assume that any function that does not have it terminates for all or some arguments. – Pascal Cuoq Nov 18 '13 at 23:24
  • @PascalCuoq: A well-formed program begins execution in `main` (ignoring global constructors for brevity) and ends with `main` returning. Except if `exit` or `terminate` are called, both of which are by definition `noreturn`. Insofar, yes, absence of `noreturn` is a license to assume the function returns. Without returning, there's no way one could get back to `main` to return. – Damon Nov 18 '13 at 23:40
  • 5
    @Damon: There are plenty of ways to get back to `main` without returning. `longjmp` and C++ `throw` are two of them. And no, absence of `noreturn` does not mean that the function always returns. Even with a programmer who religiously uses `noreturn`, its absence only indicates that there exists some combination of parameters and observable state that lead to a return. For example, `exec()` cannot be marked `noreturn`. "The `exec()` functions only return if an error has occurred. The return value is -1, and `errno` is set to indicate the error." – Ben Voigt Nov 18 '13 at 23:57
  • @Damon { volatile int X = 0; while (1) X; } – Pascal Cuoq Nov 19 '13 at 00:04
  • 2
    And why should a well-formed program return from `main()`? – Michael Burr Nov 19 '13 at 01:33
  • @BenVoigt: I hope that's a joke. If an exception is _the only_ way to return to `main` in your programs, then frankly you're a pitiful programmer. Same goes for `longjmp`, which has no place in a sane program other than for signal handlers. @Michael Burr: Because leaving `main` and calling `exit` are the only two ways named in 3.6.1 par 4 and 5 of terminating a program (short of crashing, which the standard doesn't account for), with the note of `exit` not destroying static and thread storage objects and thus invoking UB in their presence. Therefore exiting from `main` is the correct thing. – Damon Nov 19 '13 at 11:48
  • 1
    @Damon: I don't know why you equate "exceptions may be used somewhere" with "there is no flow control except exceptions". The rest of what you said is just attacking that straw man. – Ben Voigt Nov 19 '13 at 13:37
  • @BenVoigt: It's the way you turned my statement around. I said that if a function doesn't return, you can't get back to `main` to terminate the program (except if you call `exit`, which by definition doesn't return). You replied to that with "exceptions". If an exception is the only way of returning to `main`, it's _exception_ handling, and the program is effectively hung in an infinite loop. A function that _doesn't_ call a noreturn function (i.e. `exit`) and isn't noreturn therefore _can be assumed_ by the compiler to be able to return, and eventually, according to the program flow, return. – Damon Nov 19 '13 at 14:11
  • 1
    No, it can't. That function might return if its argument is non negative and throw an exception if its argument is negative. Can't mark that noreturn. And can't assume it will return. And you still threw in the word "only" where it doesn't belong. – Ben Voigt Nov 19 '13 at 14:27
  • 1
    @Damon: `exit()` does destroy static objects. Thread local objects are destroyed when the thread ends. Many programs are not multi-threaded so don't have thread local objects to worry about. Multi-threaded programs can arrange to wait for threads to complete before calling `exit()` to ensure that thread local objects are destroyed if necessary. Finally, and most importantly, an optimizer is not allowed to perform an optimization that breaks specifications simply because code might not be good style or follow best practice. – Michael Burr Nov 19 '13 at 16:18