87

What I understand is that this shouldn't be done, but I believe I've seen examples that do something like this (note: the code is not necessarily syntactically correct, but the idea is there):

typedef struct{
    int a,b;
}mystruct;

And then here's a function

mystruct func(int c, int d){
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

I understood that we should always return a pointer to a malloc'ed struct if we want to do something like this, but I'm positive I've seen examples that do something like this. Is this correct? Personally I always either return a pointer to a malloc'ed struct or just do a pass by reference to the function and modify the values there. (Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten).

Let's add a second part to the question: Does this vary by compiler? If it does, then what is the behavior for the latest versions of compilers for desktops: gcc, g++ and Visual Studio?

Thoughts on the matter?

Jonathan Leffler
jzepeda
  • 35
    "What I understand is that this shouldn't be done" says who? I am doing it all the time. Also note that the typedef is not necessary in C++, and that there exists no such thing as "C/C++". – PlasmaHH Mar 06 '12 at 19:53
  • 4
    The question seems to **not** be targeted at c++. – Captain Giraffe Mar 06 '12 at 19:55
  • 5
    @PlasmaHH Copying large structures around can be inefficient. That's why one should be careful and think hard before returning a structure by value, especially if the structure has an expensive copy constructor and the compiler is not good at return value optimization. I recently made an optimization to an app that was spending a significant chunk of its time in copy constructors for a few large structures that one programmer had decided to return by value everywhere. The inefficiency was costing us about $800,000 in additional datacenter hardware we needed to buy. – Crashworks Mar 06 '12 at 19:58
  • 9
    @Crashworks: Congratulations, I hope your boss gave you a raise. – PlasmaHH Mar 06 '12 at 19:59
  • @PlasmaHH It was an obvious optimization; really I didn't save us the money so much as Larry cost us the money. The point is that return-by-value is the sort of inefficiency that *is* significant. – Crashworks Mar 06 '12 at 20:01
  • 6
    @Crashworks: sure, it's bad to _always_ return by value without thinking, but in situations where it's the natural thing there is typically no safe alternative that does not also require a copy to be made, so returning by value is the best solution as it does not need any heap allocation. Often there won't even _be_ a copy: with a good compiler, copy elision should kick in when it's possible, and in C++11 move semantics can eliminate even more deep copying. Neither mechanism will work properly if you do anything _else_ but return by value. – leftaroundabout Mar 06 '12 at 20:48
  • Incorrect. Languages that are called C/C++ **do** exist. – Johannes Schaub - litb May 31 '13 at 08:32
  • This question [Why doesn't C Code Return a Struct](http://stackoverflow.com/questions/8728790/why-doesnt-c-code-return-a-struct) has some thoughts on the returning a struct question as does this question [Return a Struct From a Function in C](http://stackoverflow.com/questions/9653072/return-a-struct-from-a-function-in-c) – Richard Chambers Sep 27 '14 at 12:59
  • "I understood that we should always return a pointer to a malloc'ed struct": NOPE NOPE NOPE NOPE NOPE. That's simply wrong. If there's no need for a pointer, you shouldn't use one "just because"… – The Paramagnetic Croissant May 16 '15 at 07:17

12 Answers

86

It's perfectly safe, and it's not wrong to do so. Also: it does not vary by compiler.

When the struct is not too big, as in your example, I would argue that this approach is even better than returning a malloc'ed structure, since malloc is an expensive operation.
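
As a rough sketch of that comparison (the names `make_by_value`, `make_on_heap` and `sum_both` are hypothetical, and the code is written as C++ so the `malloc` result needs a cast), returning by value leaves nothing to clean up, while the heap version makes the caller responsible for checking the pointer and freeing it:

```cpp
#include <cstdlib>

struct mystruct {
    int a, b;
};

// Returns the struct by value: the caller receives its own copy.
mystruct make_by_value(int c, int d) {
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

// Heap alternative: the caller must check for nullptr and call std::free.
mystruct* make_on_heap(int c, int d) {
    mystruct* retval = static_cast<mystruct*>(std::malloc(sizeof(mystruct)));
    if (retval != nullptr) {
        retval->a = c;
        retval->b = d;
    }
    return retval;
}

// Hypothetical helper exercising both versions; returns the sum of all fields.
int sum_both(int c, int d) {
    mystruct v = make_by_value(c, d);   // nothing to release
    mystruct* p = make_on_heap(c, d);   // must be released below
    if (p == nullptr) {
        return v.a + v.b;
    }
    int total = v.a + v.b + p->a + p->b;
    std::free(p);
    return total;
}
```

The by-value version has one fewer failure path (no allocation can fail) and no ownership question for the caller to answer.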

Mankarse
Pablo Santa Cruz
  • 3
    Would it still be safe if one of the fields was a char*? Now there would be a pointer in the struct. – jzepeda Mar 06 '12 at 19:58
  • Yes, it would. Just be careful when you malloc/free that `char*`. – Pablo Santa Cruz Mar 06 '12 at 19:59
  • 3
    @user963258 actually, that depends on how you implement the copy constructor and destructor. – Luchian Grigore Mar 06 '12 at 20:02
  • 2
    @PabloSantaCruz That is a tricky question. If it was an exam question the examiner might well expect a "no" as a response, if ownership needs to be considered. – Captain Giraffe Mar 06 '12 at 20:03
  • 2
    @CaptainGiraffe: true. Since the OP didn't clarify this, and his/her examples were basically **C**, I assumed that it was more a **C** question than a **C++** one. – Pablo Santa Cruz Mar 06 '12 at 20:05
  • @CaptainGiraffe if I was asked this in an interview, I'd definitely say that it depends on how the two are implemented, not go for a straight "Yes". – Luchian Grigore Mar 06 '12 at 20:05
  • But of course it varies by compiler! Some compilers have NRVO, some don't. And if a compiler doesn't (and for any reason you can't replace it with a more recent one), then you might want to consider returning the struct through a pointer "out" parameter instead. – Kos Mar 06 '12 at 20:27
  • 2
    @Kos Some compilers don't have NRVO? From what year? Also to note: In C++11 even if it doesn't have NRVO it will invoke move semantics instead. – David Mar 06 '12 at 20:31
  • An original K&R "compliant" compiler doesn't support returning structs that are bigger than an "int"'s size, IIRC. @user963258: The returned struct will usually be copied over the stack, meaning that if the struct is huge, you might run out of stack space (only really relevant on embedded systems, though). The data will be copied the same way as if you had a "&" or "*" argument and copied the data in your function that way. That means: if the struct contains pointers, the pointers get copied, but not what the pointers point to, of course. – Thomas Tempelmann Mar 06 '12 at 20:56
  • Actually, it *does* vary by compiler, GCC requires an extension (under x86-32) to return structs the same as MSVC does, due to differences in the ABI used, see: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36834 – Necrolis Mar 07 '12 at 07:03
  • Would still be safe to return a vector of these structs ? – Steve Feb 17 '15 at 11:05
  • Safe, sure, but with the ability to return structs, I can do retarded stuff like this: `struct func { struct func (*fn)(); }; ... struct func why(){ return (struct func){why}; } ... struct func wow = why().fn().fn().fn().fn().fn().fn().fn...;`, which is way more amusing than it should be. Not even a peep from `-Wall -pedantic`. – Braden Best Aug 09 '16 at 02:54
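
To make the `char*` discussion in the comments above concrete, here is a minimal sketch, assuming a plain C-style struct with no constructors or destructors (the names `named`, `make_named` and `destroy_named` are hypothetical). Returning the struct copies the pointer value only, so the caller ends up owning the buffer and must free it exactly once:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical struct with a pointer member.
struct named {
    int id;
    char* label;  // points at memory obtained from std::malloc
};

// Returns the struct by value. Only the pointer is copied out, not the
// characters it points to, so the caller now owns the buffer.
named make_named(int id, const char* text) {
    named result;
    result.id = id;
    result.label = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (result.label != nullptr) {
        std::strcpy(result.label, text);
    }
    return result;
}

// The caller releases the buffer exactly once when it is done.
void destroy_named(named* n) {
    std::free(n->label);
    n->label = nullptr;
}
```

Any further copy of the struct shares the same buffer, which is why ownership has to be agreed on by convention in code like this.
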
75

It's perfectly safe.

You're returning by value. Undefined behavior would arise only if you returned a reference (or pointer) to the local variable and the caller then used it.

//safe
mystruct func(int c, int d){
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

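//undefined behavior as soon as the caller uses the result (C++ only):
//the reference refers to a local that no longer exists
mystruct& func_ref(int c, int d){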
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

The behavior of your snippet is perfectly valid and defined. It doesn't vary by compiler. It's ok!

Personally I always either return a pointer to a malloc'ed struct

You shouldn't. You should avoid dynamically allocated memory when possible.

or just do a pass by reference to the function and modify the values there.

This option is perfectly valid. It's a matter of choice. In general, you do this if you want to return something else from the function, while modifying the original struct.
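
A minimal sketch of that pattern, assuming a hypothetical `parse_pair` function that is not part of the question: the return value reports success, and the struct is modified through the reference parameter.

```cpp
#include <cstdio>

struct mystruct {
    int a, b;
};

// Hypothetical parser: the return value reports success, and the struct is
// filled in through the reference parameter only when parsing succeeds.
bool parse_pair(const char* text, mystruct& out) {
    int first = 0;
    int second = 0;
    if (std::sscanf(text, "%d,%d", &first, &second) != 2) {
        return false;  // `out` is left untouched on failure
    }
    out.a = first;
    out.b = second;
    return true;
}
```

Here the by-reference parameter earns its place because the function already needs its return value for something else.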

Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten

This is not quite right. The local's storage can indeed be overwritten once the function ends, but what you return is a copy of the structure you create inside the function. Theoretically, that is. In practice, RVO can and probably will occur. Read up on return value optimization. This means that although retval appears to go out of scope when the function ends, it might actually be built in the calling context, to prevent the extra copy. This is an optimization the compiler is free to implement.
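
One way to observe this is to count copy-constructor calls. The sketch below uses a hypothetical `Tracked` type; with a C++17 compiler, or an older one that performs the optimization, I would expect the count to be zero, but the exact number depends on the compiler and its flags, so none is stated here.

```cpp
// Hypothetical type that counts how often it is copied.
struct Tracked {
    static int copies;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Returns an unnamed temporary, the case where compilers elide the copy
// (and where C++17 requires them to).
Tracked make_tracked(int v) {
    return Tracked(v);
}

// Returns how many copies happened while obtaining one object by value,
// or -1 if the value did not arrive intact.
int copies_for_one_return() {
    Tracked::copies = 0;
    Tracked t = make_tracked(42);
    return t.value == 42 ? Tracked::copies : -1;
}
```

Whatever the count turns out to be, the value always arrives intact; elision only affects how much work is done to get it there.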

Luchian Grigore
  • 5
    +1 for mentioning RVO. This important optimization actually makes this pattern feasible for objects with expensive copy constructors, like STL containers. – Kos Mar 06 '12 at 20:28
  • 1
    It's worth mentioning that although the compiler is free to perform return value optimization, there's no guarantee it will. This is not something you can count on, only hope. – Watcom Jan 03 '13 at 13:49
  • 1
    -1 for "avoiding dynamically allocated memory when possible." This tends to be a newb rule and frequently results in code where **LARGE** amounts of data are returned (and people puzzle over why things run slowly) when a simple pointer can save a lot of time. The *correct* rule is to return structures or pointers based on speed, usage, and *clarity*. – Lloyd Sargent Feb 01 '20 at 20:04
11

The lifetime of the mystruct object in your function does indeed end when you leave the function. However, you are returning the object by value. This means that the object is copied out of the function into the calling function. The original object is gone, but the copy lives on.

Joseph Mansfield
10

Not only is it safe to return a struct in C (or a class in C++, where structs are actually classes whose members are public by default), but a lot of software does exactly that.

Of course, when returning a class in C++, the language specifies that a destructor or move constructor may be called, but there are many cases where this can be optimized away by the compiler.

In addition, the Linux x86-64 ABI specifies that a struct with two scalar values (e.g. pointers, or long) is returned through registers (%rax and %rdx), which is very fast and efficient. So for that particular case it is probably faster to return such a two-scalar struct than to do anything else (e.g. storing the values through a pointer passed as an argument).

Returning such a two-scalar field struct is then a lot faster than malloc-ing it and returning a pointer.
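
For illustration, a minimal sketch of such a two-scalar struct (the names `DivResult`, `divide` and `divide_into` are hypothetical). Whether the result really travels in registers depends on the platform ABI and the compiler, so treat the comments as describing the x86-64 Linux case only:

```cpp
// Hypothetical two-field result type; on x86-64 Linux a struct like this is
// small enough to be returned in registers rather than through memory.
struct DivResult {
    long quotient;
    long remainder;
};

// Returns both results by value. The divisor is assumed to be non-zero.
DivResult divide(long dividend, long divisor) {
    DivResult result;
    result.quotient = dividend / divisor;
    result.remainder = dividend % divisor;
    return result;
}

// The out-parameter version for comparison: results go through memory
// owned by the caller.
void divide_into(long dividend, long divisor, DivResult* out) {
    out->quotient = dividend / divisor;
    out->remainder = dividend % divisor;
}
```

Comparing the generated assembly of the two functions (for example with `gcc -O2 -S`) is a quick way to check what a given compiler actually does.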

Victor Gama
Basile Starynkevitch
5

It's perfectly legal, but with large structs there are two factors that need to be taken into consideration: speed and stack size.
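
A rough sketch of the trade-off, assuming a hypothetical `Samples` struct of 4096 doubles (typically 32 KiB). Both functions produce the same data; they differ in who provides the storage:

```cpp
#include <cstddef>

// Hypothetical large struct: 4096 doubles, typically 32 KiB.
struct Samples {
    double values[4096];
};

// By value: simple to call, but the caller needs room for the whole object
// and, unless the copy is elided, the data is copied on return.
Samples make_ramp() {
    Samples s;
    for (std::size_t i = 0; i < 4096; ++i) {
        s.values[i] = static_cast<double>(i);
    }
    return s;
}

// Out-parameter: the caller decides where the storage lives (stack, heap,
// or static), and nothing is copied.
void fill_ramp(Samples* out) {
    for (std::size_t i = 0; i < 4096; ++i) {
        out->values[i] = static_cast<double>(i);
    }
}
```

On a desktop either version is usually fine; the out-parameter form matters mostly when stack space is tight or the function sits in a hot loop.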

ebutusov
4

The safety depends on how the struct itself was implemented. I just stumbled on this question while implementing something similar, and here is the potential problem.

In C++, when returning the value the compiler performs a few operations (possibly among others):

  1. calls the copy constructor mystruct(const mystruct&) to initialize a temporary outside func, allocated by the compiler itself
  2. calls the destructor ~mystruct on the variable that was allocated inside func
  3. calls mystruct::operator= if the returned value is assigned to something else with =
  4. calls the destructor ~mystruct on the temporary used by the compiler

Now, if mystruct is as simple as the one described here, all is fine, but if it has a pointer member (like char*) or more complicated memory management, then it all depends on how mystruct::operator=, mystruct(const mystruct&), and ~mystruct are implemented. Therefore, I suggest caution when returning complex data structures by value.
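
As a sketch of what "implemented correctly" could look like, here is a hypothetical `Label` class that owns a heap-allocated string and defines all three operations so that every copy gets its own buffer. The class and function names are assumptions for illustration only:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical class that owns a heap-allocated string.
class Label {
public:
    explicit Label(const char* text) : text_(duplicate(text)) {}
    // Deep copy: the new object gets its own buffer.
    Label(const Label& other) : text_(duplicate(other.text_)) {}
    // Copy assignment: build the new buffer first, then release the old one.
    Label& operator=(const Label& other) {
        if (this != &other) {
            char* fresh = duplicate(other.text_);
            delete[] text_;
            text_ = fresh;
        }
        return *this;
    }
    ~Label() { delete[] text_; }

    const char* c_str() const { return text_; }

private:
    static char* duplicate(const char* text) {
        std::size_t length = std::strlen(text) + 1;
        char* copy = new char[length];
        std::memcpy(copy, text, length);
        return copy;
    }
    char* text_;
};

// Safe to return by value because every copy owns its own buffer.
Label make_label(const char* text) {
    Label result(text);
    return result;
}
```

If the copy constructor were left as the compiler-generated shallow copy while the destructor still called `delete[]`, the four steps above would free the same buffer twice.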

Filippo
4

I also agree with sftrabbit: the object's lifetime does indeed end and the stack area gets cleaned up, but the compiler is smart enough to ensure that all the data is handed back in registers or in some other way.

A simple example for confirmation is given below (taken from the assembly output of the MinGW compiler).

_func:
    push    ebp
    mov     ebp, esp
    sub     esp, 16
    mov     eax, DWORD PTR [ebp+8]
    mov     DWORD PTR [ebp-8], eax
    mov     eax, DWORD PTR [ebp+12]
    mov     DWORD PTR [ebp-4], eax
    mov     eax, DWORD PTR [ebp-8]
    mov     edx, DWORD PTR [ebp-4]
    leave
    ret

You can see that the value of b is passed back through edx, while eax, the default return register, contains the value of a.

perilbrain
4

A function can have a structure type as its return type. This is safe because the compiler creates a copy of the struct and returns that copy, not the local struct inside the function.

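#include <iostream>

typedef struct{
    int a,b;
}mystruct;

mystruct func(int c, int d){
    mystruct retval;
    // address of the local object inside func
    std::cout << "func:" << &retval << std::endl;
    retval.a = c;
    retval.b = d;
    return retval;
}

int main()
{
    // Taking the address of a temporary, &(func(1,2)), is not standard C++,
    // so the returned value is stored in a variable first. The two addresses
    // may still match if the compiler elides the copy.
    mystruct result = func(1,2);
    std::cout << "main:" << &result << std::endl;
    return 0;
}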
haberdar
4

It is perfectly safe to return a struct as you have done.

Regarding this statement, however: "Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten", I can only imagine a problem in a scenario where a member of the structure is dynamically allocated (malloc'ed or new'ed). In that case, in C++, if the destructor frees those members and the copy is shallow, then without RVO the dynamically allocated members are destroyed on return and the returned copy has a member pointing to garbage.

Jonathan Leffler
maress
  • 1
    The stack is only used temporarily for the copy operation. Usually, the stack space is reserved before the call, the called function puts the to-be-returned data onto the stack, and then the caller pulls this data from the stack and stores it wherever it gets assigned to. So, no worries there. – Thomas Tempelmann Mar 06 '12 at 21:00
3

Note: this answer only applies to C++11 onward. There is no such thing as "C/C++"; they are different languages.

No, there is no danger in returning a local object by value, and it is recommended to do so. However, I think there is an important point that is missing from all answers here. Many others have said that the struct is being either copied or directly placed using RVO. However, this is not completely correct. I will try to explain exactly which things can happen when returning a local object.

Move semantics

Since C++11, we have had rvalue references, which can bind to temporary objects whose contents can safely be stolen. As an example, std::vector has a move constructor as well as a move assignment operator. Both of these have constant complexity and simply copy the pointer to the data of the vector being moved from. I won't go into more detail about move semantics here.

Because an object created locally within a function goes out of scope when the function returns, it is treated like a temporary when returned: from C++11 onward it is not copied, provided its type has a move constructor. The move constructor is called on the object being returned (or not even that, as explained later). This means that if you were to return an object with an expensive copy constructor but an inexpensive move constructor, like a big vector, only the ownership of the data is transferred from the local object to the returned object, which is cheap.
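
A small sketch of this, using hypothetical helper names (`make_sequence`, `move_keeps_buffer`): the vector's elements stay where they are on the heap, and a move only hands over the pointer to them.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Builds a large vector locally and returns it by value. The elements stay
// where they are on the heap; at worst the vector object itself is moved.
std::vector<int> make_sequence(std::size_t count) {
    std::vector<int> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));
    }
    return values;
}

// Hypothetical check: moving a vector hands over its buffer, so the address
// of the element storage is the same before and after the move.
bool move_keeps_buffer(std::size_t count) {
    std::vector<int> source = make_sequence(count);
    const int* before = source.data();
    std::vector<int> target = std::move(source);
    return target.data() == before && target.size() == count;
}
```

The explicit `std::move` in the second function is only there to make the transfer visible; in `make_sequence` the plain `return values;` is enough.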

Note that in your specific example, there is no difference between copying and moving the object. The default move and copy constructors of your struct result in the same operations: copying two integers. However, this is at least as fast as any other solution because the whole struct fits in a 64-bit CPU register, assuming 32-bit ints (correct me if I'm wrong, I don't know much about CPU registers).

RVO and NRVO

RVO means Return Value Optimization and is one of the very few optimizations compilers perform that can change observable behavior, because side effects in a copy or move constructor may be skipped. Since C++17, it is required when returning an unnamed object: the object is constructed directly in place, in the storage the caller initializes from the returned value. Neither the copy constructor nor the move constructor is called. Without RVO, the unnamed object would first be constructed locally, then move constructed at the return address, and then the local unnamed object would be destructed.

Example where RVO is required (C++17) or likely (before C++17):

auto function(int a, int b) -> MyStruct {
    // ...
    return MyStruct{a, b};
}

NRVO means Named Return Value Optimization and is the same thing as RVO except that it is done for a named object local to the called function. This is still not guaranteed by the standard (as of C++20), but many compilers do it anyway. Note that even without it, a named local object is at worst moved when returned.
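
The "at worst moved" point can be checked with a hypothetical `Counted` type that counts copies and moves separately. This is a sketch, not a benchmark: the number of moves depends on whether the compiler applies NRVO, but the number of copies should be zero either way because the type has a move constructor.

```cpp
// Hypothetical type that counts copies and moves separately.
struct Counted {
    static int copies;
    static int moves;
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& other) : value(other.value) { ++copies; }
    Counted(Counted&& other) noexcept : value(other.value) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Returns a named local. With NRVO nothing is copied or moved; without it,
// the local is moved into the return value.
Counted make_counted(int v) {
    Counted result(v);
    result.value += 1;
    return result;
}

// Resets the counters, obtains one object, and reports the copy count,
// or -1 if the value did not arrive intact.
int copies_for_named_return() {
    Counted::copies = 0;
    Counted::moves = 0;
    Counted c = make_counted(1);
    return c.value == 2 ? Counted::copies : -1;
}
```

Note that writing `return std::move(result);` is unnecessary here and can actually prevent NRVO.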

Conclusion

The only case where you should consider not returning by value is when you have a named, very large (as in its stack size) object. This is because NRVO is not yet guaranteed (as of C++20) and even moving the object would be slow. My recommendation, and the recommendation in the C++ Core Guidelines, is to always prefer returning objects by value (for multiple return values, use a struct or tuple), where the only exception is when the object is expensive to move. In that case, use a non-const reference parameter.

It is NEVER a good idea to return a resource that has to be manually released from a function in C++. Never do that. At least use a std::unique_ptr, or make your own non-local or local struct with a destructor that releases its resource (RAII) and return an instance of that. It would then also be a good idea to define the move constructor and move assignment operator if the resource does not have its own move semantics (and delete the copy constructor and copy assignment operator).
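
A minimal sketch of the std::unique_ptr suggestion, with hypothetical function names (`make_owned`, `sum_owned`) and written with plain `new` so that it also compiles as C++11:

```cpp
#include <memory>

struct mystruct {
    int a, b;
};

// Returns ownership of a heap-allocated struct. The unique_ptr releases the
// memory automatically when the caller's pointer goes out of scope.
std::unique_ptr<mystruct> make_owned(int c, int d) {
    std::unique_ptr<mystruct> result(new mystruct);
    result->a = c;
    result->b = d;
    return result;  // moved (or elided), never copied
}

// Hypothetical caller: no explicit delete is needed anywhere.
int sum_owned(int c, int d) {
    std::unique_ptr<mystruct> p = make_owned(c, d);
    return p->a + p->b;
}
```

For a struct this small, returning it directly by value is still the simpler choice; the smart pointer is for cases where heap allocation is genuinely needed.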

Björn Sundin
2

It is not safe to return a structure. I love to do it myself, but if someone later adds a copy constructor to the returned structure, that copy constructor will be called. This may be unexpected and can break the code, and such a bug is very difficult to find.

I had a more elaborate answer, but a moderator did not like it. So, at your expense, my tip is short.

Igor Polk
  • “It is not safe to return a structure. […] the copy constructor will be called.” – There’s a difference between *safe* and *inefficient*. Returning a struct is definitely safe. Even then, the call to the copy ctor will most likely be elided by the compiler as the struct is created on the caller’s stack to begin with. – phg Jan 25 '18 at 13:10
2

Let's add a second part to the question: Does this vary by compiler?

Indeed it does, as I discovered to my pain: http://sourceforge.net/p/mingw-w64/mailman/message/33176880/

I was using gcc on win32 (MinGW) to call COM interfaces that returned structs. It turns out that MS does it differently from GNU, and so my (gcc) program crashed with a smashed stack.

It could be that MS might have the higher ground here - but all I care about is ABI compatibility between MS and GNU for building on Windows.

If it does, then what is the behavior for the latest versions of compilers for desktops: gcc, g++ and Visual Studio

You can find some messages on a Wine mailing list about how MS seems to do it.

Jonathan Leffler
effbiae
  • It would be more helpful if you gave a pointer to the Wine mailing list that you're referring to. – Jonathan Leffler May 16 '15 at 07:24
  • Returning structs is fine. COM specifies a binary interface ; if someone doesn't implement COM properly then that would be a bug. – M.M Nov 13 '15 at 08:17