
While looking through code at work, I found some (seemingly) offending code in which a function had a return type but no return statement. I knew the code worked, but assumed it was just a bug in the compiler.

I wrote the following test and ran it using my compiler (gcc (Homebrew gcc 5.2.0) 5.2.0)

#include <stdio.h>

int f(int a, int b) {
  int c = a + b;
}

int main() {
   int x = 5, y = 6;

   printf("f(%d,%d) is %d\n", x, y, f(x,y)); // f(5,6) is 11

   return 0;
}

Like the code I found at work, this appears to return the result of the last expression evaluated in the function.

I found this question, but was not satisfied with the answer. I know that with -Wall -Werror this behavior can be avoided, but why is it an option? Why is this still allowed?

Asked by erip (edited by Community)
  • What would be an answer to this question, minutes from some meeting of the standards committee regarding why they didn't deprecate this? – Chris Beck Sep 11 '15 at 00:50
  • See [here](http://stackoverflow.com/questions/10079089/implicit-int-return-value-of-c-function). It appears that, according to the C89 standard, the default return from a function with a return type but no return statement is the same as if you had placed an empty `return;` statement at the end of the function. – asdf Sep 11 '15 at 00:52
  • Did you compile it as C or as C++, or both? So far as I am aware, a C++ compiler would flag a semantic error here, but a C compiler would let it pass (possibly with a warning). – Logicrat Sep 11 '15 at 01:01
  • @Logicrat I compiled with both `gcc` and `g++`. With no flags, it passes both compilations without barking. – erip Sep 11 '15 at 01:02
  • Wow, I'm surprised. ;) I've had MSVC 10 refuse to compile a function before I put a `return` statement at the end of it, even though that `return` was unreachable because all branches of the preceding `switch` had returns. – Logicrat Sep 11 '15 at 01:05
  • IIRC, MSVC doesn't follow C++ standards. – erip Sep 11 '15 at 01:06
  • I'm thinking that it's compiler dependent. It depends on how your compiler will implement the `add` instruction. If it saves the result in `rax`, there you have your return. Otherwise, it's UB. – Enzo Ferber Sep 11 '15 at 01:10
  • One reason you should always enable warnings. For gcc at least `-Wall -Wextra`. Compiling C code without warnings is a good way to hell, especially for beginners. (I always wonder why tutors don't explain how to enable warnings in one of the very first lessons.) – too honest for this site Sep 11 '15 at 01:21
  • @EnzoFerber: What are you talking about? C does not know about an `add` instruction or `rax`. UB is not dependent on the CPU used, but on the standard. Just as it seems to work does not imply it is **not** UB. – too honest for this site Sep 11 '15 at 01:25
  • As for "why is it an option" see my [comment here](http://stackoverflow.com/questions/31739792/is-uninitialized-local-variable-the-fastest-random-number-generator/31746063#comment51562032_31746063) and the general discussion around it. – Shafik Yaghmour Sep 11 '15 at 01:27
  • @Olaf I'm talking about assembly and how the compiler will organize the instructions. UB might be the wrong term - it's more like _random_ behavior **if, and only if** your `add` result doesn't get stored in `rax`. You can write a simple asm add function with `movq %rdi, %rax addq %rsi, %rax`. And that's almost like GCC does it _without_ optimizing flags. – Enzo Ferber Sep 11 '15 at 01:30
  • @EnzoFerber: In a deterministic system, which a computer is, there is no _random_ behaviour (one of the problems in getting true random values). 1) ARM has no `rax`, so that is for a specific implementation. 2) This is absolutely irrelevant. UB is a matter of the standard, not the implementation. Once you invoke UB, you are on the dark side and cannot rely on anything. There is absolutely no use in further research in that direction ("yesterday we were standing on the edge of a cliff. Today we are one step further ..."). – too honest for this site Sep 11 '15 at 01:58
  • @Olaf I already admitted I was wrong using the term UB. Anyway, I said random since it will be "unexpected". If you don't know what's in `rax`, then it's unexpected when you print it. And one of the definitions of _random_ is _unexpected_. About dependencies: I said this was compiler dependent on the first post, but forgot to mention I was testing in a `x86_64`. My bad. – Enzo Ferber Sep 11 '15 at 03:02
  • @EnzoFerber: Again: the CPUs I program do not even have `rax`, like most CPUs in the world (sic!). It is simply **undefined**, not **unexpected**. Look out for nasal demons. The definition of _random_ is **not** unexpected! If you have a 16-bit random-value register, you **do expect** any one of `65536` values, each with a specific probability, but the overall probability being `1.0`. Yet there is simply no sense in ever thinking about the outcome. – too honest for this site Sep 11 '15 at 11:20
  • I had this "bug" in my code and compiled with both GCC (Linux) and the Visual Studio 2005 compiler. The GCC build returned `0`, the MS build returned `1`, but that was **NOT** the last return code. So it seems compiler dependent for sure, but still mysterious. Not recommended. – Wilbur Whateley Aug 31 '16 at 23:59



"Why is this still allowed?" It is not, but in general, the compiler cannot prove you are doing it. Consider this (of course extremely simplified) example:

#include <stdbool.h>

// Callers always pass true, for some program-logic reason
int fun (bool b) {
    if (b) {
        return 7;
    }
}

Or even this one:

#include <stdbool.h>

void foo(void); // Defined in a different translation unit, will always call exit()

int fun (bool b) {
    if (b) {
        return 7;
    }
    foo();
    // Now we can never get here, but the compiler cannot know
}

Now the first example could flow off the end, but it never will as long as the function is used "properly"; and the second one could not, but the compiler cannot know this. So the compiler would break "working" and legal, although probably stupid, code by making this an error.
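A minimal sketch, assuming C11, of how the second example can be made visible to the compiler: if `foo()` is declared with the `_Noreturn` function specifier, the compiler can see that control never reaches the closing brace of `fun()`. The body of `foo()` below is a hypothetical stand-in, since the answer only says it lives in another translation unit and always calls `exit()`; in a real program only the `_Noreturn` declaration would need to be visible, typically via a header.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the answer's foo(). The C11 _Noreturn specifier
   promises the compiler that a call to this function never returns. */
_Noreturn void foo(void) {
    fputs("fun: called with b == false\n", stderr);
    exit(EXIT_FAILURE);
}

/* The answer's second example. Because foo() is declared _Noreturn, the
   closing brace is provably unreachable, so compilers such as GCC and Clang
   are not expected to warn about a missing return here. */
int fun(bool b) {
    if (b) {
        return 7;
    }
    foo();
}
```

With this declaration, `fun(true)` returns 7, and `fun(false)` prints a message and exits instead of handing the caller an undefined value.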

Now the example you posted is a little different: here, all paths flow off the end, so the compiler could reject or simply ignore this function. However, that would break real-world code that relies on compiler-specific behavior, like your production code, and people do not like that, even when they are wrong.

But in the end, flowing off the end of a non-void function is still undefined behavior (strictly, in C only if the caller uses the value; in C++ unconditionally, except in `main`). It might work on certain compilers, but it is not, and never was, guaranteed to.
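The first example can be kept well-defined in a similar way. The sketch below is an illustration, not part of the original answer: it assumes, like the example's comment, that callers always pass `true`, and calls the standard `abort()` if that assumption is ever broken. Because `abort()` never returns, no path reaches the closing brace without a return value.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Callers are assumed to always pass true (the example's "logic reason").
   If that assumption is violated, stop the program immediately instead of
   falling off the end and returning an undefined value. */
int fun(bool b) {
    if (b) {
        return 7;
    }
    abort(); /* never returns, so the closing brace is unreachable */
}
```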

Answered by Baum mit Augen (edited by Marcin Orlowski)
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/89408/discussion-on-answer-by-baum-mit-augen-c-and-c-functions-without-a-return-stat). – Martijn Pieters Sep 11 '15 at 16:52
  • In C (have not checked C++), it is not illegal for control to flow off the end of a non-void function, and the compiler cannot just ignore a non-void function with no `return` statement (although it could ignore the one in the question since it causes no change in observable behavior). Per C 2011 [N1570] 6.9.1 12: “If the **}** that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.” For example, you could write `int GetSet(int Option, int Value) { if (Option) /* Set */ Something = Value; else /* Get */ return Something; }`. – Eric Postpischil May 07 '18 at 01:38
  • @EricPostpischil Interesting, I did not know that. However, the exception for an unused result is not part of C++ (9.6.3/2 in N4659); flowing off the end of a non-void function other than `main` is UB unconditionally. – Baum mit Augen May 08 '18 at 21:26
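Eric Postpischil's `GetSet` example is only valid in C, and only as long as callers ignore the result of a "set" call. A sketch of a variant that is well-defined in both C and C++ makes every path end in a `return`. Returning the previous value from a "set" call is an illustrative choice, not part of the original comment.

```c
/* Shared state, standing in for the comment's Something. */
static int Something;

/* Every path ends in a return, so the result is defined in C and C++
   no matter how the caller uses it. */
int GetSet(int Option, int Value) {
    int previous = Something;
    if (Option) {
        Something = Value;  /* Set */
    }
    return previous;        /* Get, or the old value after a Set */
}
```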

It's not a bug in the compiler, but the code is clearly wrong. On x86, (R|E)AX is used as the accumulator register and also to hold integer return values.

So let's take a look at a completely unoptimised disassembly: https://goo.gl/TihXpa

  • You'll see that the code for f() is indeed using eax to store the addition result.

  • However... go ahead and use -O1 (or any optimisation level that's more aggressive) as a compiler option (top-right box) and see what happens now.

...you'll find that the compiler has realised that the function never explicitly returns its result, so the addition is discarded and the function effectively becomes a no-op.

Hence, this nasty piece of code is a perfect example of something that can work as-intended as a debug build but fail horribly when any optimisation is applied.
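The fix for the question's code is simply to return the value explicitly. With the `return` in place, the result is guaranteed by the language at every optimisation level, rather than depending on what happens to be left in EAX/RAX.

```c
/* The question's f() with the missing return statement added. */
int f(int a, int b) {
    int c = a + b;
    return c;  /* the caller now receives a defined value */
}
```

Calling `f(5, 6)` now yields 11 whether the code is built with -O0 or with -O1 and above.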

Answered by Olipro (edited by Community)

The result of a function with a return type but no return statement is undefined (except for `main` in C++, and in C99 and later, where reaching the closing brace returns 0). It's also not a syntax error, since the grammar uses a general form for a function that works for both value-returning and void functions.

Odds are what's going on here is that the result of the last expression is getting stored in the same register (on an x86, EAX or RAX) that the compiler is using for the return value of the function. When the function returns, the return instruction leaves this register alone, and the calling code simply assumes the value there is the return value of the function.

  • Why didn't C++ choose to default-construct whatever the return type is at the return point, even if there is no return statement? It would be the same as `return {};`. The return point is the closing brace; it's impossible to fail to identify that point. There can be no other flow that returns without passing there, except `throw`, an explicit `return`, a crash, or an infinite loop. – v.oddou Mar 29 '20 at 17:01

In practice, the value being returned is probably the last value in some machine register (e.g. the one used to compute c). However, according to the standards, both the return value and the behavior when the caller accesses or uses it are undefined.

As to why this sort of abusive/ugly code construct is allowed: firstly, in early versions of C there was no void keyword, and if no return type was specified, the return type of a function defaulted to int. For functions that were actually intended to return nothing, the technique was to (implicitly or explicitly) declare them as returning int, not return any value, and have the caller make no attempt to access or use the return value.

Since compiler vendors had some freedom in how they handled this sort of thing, they didn't have to take any particular care in ensuring there was a valid return value, or ensuring that the return value was unused. In practice, it was largely fortuitous that, if the return value was accessed, it often happened to contain the value of the last operation in the function. Some programmers stumbled upon this behaviour of their code and, since terse and cryptic code was often viewed as a virtue at the time, made use of it.

Later on, when the compiler vendors in question tried to change the behaviour (e.g. to emit an error message and reject the code when a function "fell off the end"), they received bug reports from developers (some of whom were quite vocal) about their programs no longer working. Sad as it is, the compiler vendors caved to the pressure. Other compiler vendors also caved to similar bug reports (of the form "gcc and compiler X do it this way, yours should too"), because a number of those reports came from developers at large companies or government bodies who were paying customers. This is why most modern compilers diagnose such code only with an optional warning, often disabled by default (in gcc, -Wreturn-type, which -Wall enables), and otherwise give the behaviour that developers expected.

The history of C, and therefore C++, is littered with a number of obscure features like this, which result from programmers exploiting some obscure behaviour of their early compilers and lobbying to prevent the behaviour being disabled.

The best modern practice is to turn on compiler warnings and not to exploit such features. However, there is still enough legacy code, and enough developers who for various reasons (e.g. having to provide a slew of documentation to convince regulators that the code still works) don't want to update their codebases to stop using such features, that compilers continue to support them.
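A small illustration of that practice (a hypothetical example, not from the answer): even when every case of a `switch` returns, ending the function with an explicit fallback `return` means no path can fall off the end, so `-Wreturn-type` has nothing to report and an out-of-range value still gets a defined result. The enum `color` and the function `color_name()` are invented for this sketch.

```c
/* Hypothetical enum used only for this illustration. */
enum color { RED, GREEN, BLUE };

const char *color_name(enum color c) {
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    /* Reached only for values outside the enumerators; without this line,
       GCC's -Wreturn-type would warn that control can reach the end. */
    return "unknown";
}
```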

Answered by Peter

At the computer-architecture level: if there is a default location (such as a register) where results are stored after simple calculations, its contents could be returned by accident and happen to hold the correct answer.

Here it is:

For x86 at least, the return value of this function should be in the eax register. Whatever is there will be treated as the return value by the caller.
Because eax is the return register, the callee often uses it as a "scratch" register, since it does not need to be preserved. This means it is quite possible that it will be used for any of the local variables. Because both of them are equal at the end, it's more likely that the correct value will be left in eax.

See here for a similar topic.

Answered by Ziezi (edited by Community)
  • What do you mean "by accident"? I mean, there must be a reason. – nbro Sep 11 '15 at 00:55
  • @Ax if the result is stored at a specific address on the stack and then loaded just because the stack pointer runs through it by default, and it accidentally collects the correct answer (just before hitting the return address, exiting the function and returning control to main). – Ziezi Sep 11 '15 at 00:59