Allocation of Return Values of User Defined Functions: Performance?

Question

Consider a scenario where we may not always want to store the return value of a function across multiple calls. Take this code:

  int foo()
  {
    return 1;
  }

which is called from the following procedure.

  int main()
  {
    foo();
  }

What happens to the return value of foo? With no variable to receive it, does the compiler put it in a spare register, or does it get wiped when the next statement is executed? Suppose the return value is now stored somewhere instead:

  int main()
  {
    int retval;
    retval = foo();
  }

What's the performance difference between the two scenarios? Is not storing the return value going to cause any issues with stack management further down the track?

Edit: My own configuration: toolset V100, __cdecl (/Gd), building Win32 in VS10 and compiling as C++. Would inlining come into play?
We can only guess how many specific configurations apply to a given set of answers. It might be enough for a small book!
As to performance, int was a poor choice; perhaps float or double would have more impact? This question differs from the proposed duplicate in this way: given that the return address of foo is added to a call frame, one scenario is that the address of retval is placed below it, but if retval is NULL, is there extra overhead?

The answer to your question is architecture dependent. Many architectures *always* store return values in register(s). In your example `retval` is likely to be stored only in a register and not on the stack at all. — kaylum, Dec 24 '16 at 06:53
There is no address in your code. C and C++ are different languages. Pick the one you actually use! — too honest for this site, Dec 24 '16 at 07:06
While the proper answer is compiler and architecture dependent, you can expect that the return value that fits into the register will be passed via the register. You're free to store it on stack later on in the caller code. -O3 will optimize away all your code in both cases. — Ap31, Dec 24 '16 at 07:17
@Ap31: There is neither a requirement to use registers, nor a stack in the C standard. Actually register-less architectures exist and implementations which don't use a stack. And a specific optimisation option for a specific implementation is completely unrelated. — too honest for this site, Dec 24 '16 at 07:33
@Olaf you're absolutely right of course, I was trying to be realistic — Ap31, Dec 24 '16 at 07:42
@Ap31: In what terms? I work (and have worked) with various architectures, including some CPUs you will not find public information about. For me and my colleagues, not assuming something not stated is **very** realistic. You'd be surprised how many errors result from false assumptions without basis. — too honest for this site, Dec 24 '16 at 08:32

score 4 · Accepted Answer · edited May 23 '17 at 11:46

So, this is very dependent on exactly what's being returned. For a datatype like int most implementations will stuff the return value into a register, and the optimizer in the calling function will treat the register as 'dirty' and otherwise ignore it unless it needs to repurpose it for something else. Basically, there's no performance difference at all.
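
As a minimal sketch of the two cases from the question (the helper names `call_and_discard`, `call_and_store` and `check_same_value` are made up for illustration), the following can be compiled with an assembly listing enabled, for example `/FA` with MSVC or `-S` with GCC/Clang, to compare what the caller does after each call. What you actually see depends on the compiler, target and optimization level.

```cpp
#include <cassert>

// Same shape as the foo() in the question.
int foo()
{
    return 1;
}

// Calls foo() and ignores the result: the caller stores nothing.
void call_and_discard()
{
    foo();
}

// Calls foo() and keeps the result in a local variable.
int call_and_store()
{
    int retval = foo();
    return retval;
}

// Hypothetical helper: both call styles run foo() once; only the store differs.
void check_same_value()
{
    call_and_discard();
    assert(call_and_store() == 1);
}
```

On implementations that return an int in a register, the only difference you would expect between the two callers in an unoptimized build is one extra store in `call_and_store`.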

If you are returning a more complex value that has constructors and destructors and such, the compiler will have to call them unless the function being called is inline and the optimizer can determine that the constructor and destructor have no effect. Even when they do have an effect, there are specific cases in which it's permissible for the compiler to elide constructor and destructor calls for return values. C++11, with its support for move constructors, has removed some of the need for this. Anyway, this is addressed here: What are copy elision and return value optimization?
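
A small sketch of that point, using a hypothetical `Tracked` type whose constructor and destructor have visible side effects (the counters): even when the caller throws the return value away, the object is still constructed and destroyed. How many constructions happen depends on whether the compiler elides the copy, so the code only checks that the counts balance.

```cpp
#include <cassert>

// Hypothetical type for illustration: counts constructions and destructions.
struct Tracked {
    static int constructed;
    static int destroyed;
    Tracked() { ++constructed; }
    Tracked(const Tracked&) { ++constructed; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;

Tracked make_tracked()
{
    return Tracked();  // the copy of this temporary may be elided
}

// Discards the returned object; it is still constructed and then destroyed.
void discard_result()
{
    make_tracked();
    // At the end of the full expression every object built so far is gone.
    assert(Tracked::constructed == Tracked::destroyed);
}
```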

For complex return values that are too big to fit into registers the implementations I've seen have the caller make some space for the return value on the stack and pass in a pointer to that return value into the called function. This means that if the caller can determine that the destructor has no effect it will simply omit a call to it and treat that section of memory as 'dirty' for optimization purposes.
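
Written out by hand, that scheme looks roughly like the sketch below. `Big`, `make_big_lowered` and the other names are invented for illustration, and whether a given struct is really returned this way depends on the ABI; this only mimics the idea in source form.

```cpp
#include <cassert>

// Hypothetical aggregate, assumed too large to be returned in registers.
struct Big {
    int data[64];
};

// What the programmer writes: return by value.
Big make_big()
{
    Big b = {};
    b.data[0] = 42;
    return b;
}

// Roughly what such an implementation does underneath:
// the caller supplies the storage and passes its address.
void make_big_lowered(Big* result)
{
    *result = Big();
    result->data[0] = 42;
}

int first_element_via_lowered_call()
{
    Big slot;                 // space reserved by the caller on its stack
    make_big_lowered(&slot);  // the hidden pointer, made explicit
    return slot.data[0];
}

// Both forms produce the same value; only the plumbing is spelled out.
void check_equivalence()
{
    assert(make_big().data[0] == first_element_via_lowered_call());
}
```

If the caller never reads `slot`, nothing further has to happen to it for a type like this with no destructor; the space is simply reused later.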

score 2 · Answer 2 · answered Dec 24 '16 at 06:54

If you don't assign it, it is lost. It depends on the calling convention, but typically the return value is passed in a register (e.g. %rax in the SysV x86_64 ABI), and if you don't assign it, the register is simply reused, which overwrites the value.

In the second case the value will be stored on the stack, but the optimizer will most probably remove that code since the variable is never used. This depends on the compiler and the optimizer.

answered Dec 24 '16 at 06:54

Bernhard

However, the optimizer will still remove it, regardless of where it is stored. – Bernhard Dec 24 '16 at 07:17
@Olaf - In discussing the performance of language features, talking about implementation specific details is inevitable. So, talking about optimizers not being required, and not even mentioned in the standard and so on is irrelevant to the question being asked. – Omnifarious Dec 24 '16 at 07:59
" It makes no sense to refer how a specific implementation might work" - when you are talking about performance, you *must* refer to how a specific implementation works. – Martin Bonner supports Monica Dec 24 '16 at 08:05
So you should read the question again, because it already contains implementation-specific details "... does the compiler put it on a spare register- or does it get wiped when the next statement is executed?..." – Bernhard Dec 24 '16 at 08:06
@Omnifarious: OP does not provide information about the architecture he uses, nor the compiler or other information. All we can do is speculate. But speaking about implementations: on machines where results are passed by registers, there is nothing to "remove". It is just ignored. Still that has nothing to do with an optimiser, but will be done even for the simplest translation with fixed patterns. – too honest for this site Dec 24 '16 at 08:27
@MartinBonner: I fully agree. However, there is one _slight_ problem: Which "specific implementation" is the question about? – too honest for this site Dec 24 '16 at 08:28
@Bernhard: Reading and understanding the whole question should make clear OP is just wildly guessing. Easiest thing would be to read the machine code of his specific implementation. – too honest for this site Dec 24 '16 at 08:34
Rather than being pedantic, perhaps it would be better to simply ask what architecture and compiler, et cetera are being used and explain that the answer varies on choice of each. – Zéychin Dec 24 '16 at 08:41
