63

I've been working on my C muscle lately, and looking through the many libraries I've been working with has certainly given me a good idea of what is good practice. One thing that I have NOT seen is a function that returns a struct:

something_t make_something() { ... }

From what I've absorbed this is the "right" way of doing this:

something_t *make_something() { ... }
void destroy_something(something_t *object) { ... }

The architecture in code snippet 2 is FAR more popular than snippet 1. So now I ask, why would I ever return a struct directly, as in snippet 1? What differences should I take into account when I'm choosing between the two options?

Furthermore, how does this option compare?

void make_something(something_t *object)
Dellowar
  • 16
    The important difference I see is copying vs not and heap vs stack. – Iharob Al Asimi Oct 21 '16 at 02:59
  • 13
    Don't tag both C and C++. The answers of the same question for the two languages are very different. Pick one. –  Oct 21 '16 at 03:29
  • 4
    I've made some edits to try and save this question from being flagged as opinion-based. "Best practices" can be a very blurry line, and the idea that one option is better than the other can be a subjective one. Let me know if the edited version of the question is too far off the mark for what you want to know. – Dietrich Epp Oct 21 '16 at 04:16
  • @DietrichEpp Thanks! – Dellowar Oct 21 '16 at 06:16
  • 2
    If the structure is too large to be returned like a normal return value (e.g., in a register), the vast majority of ABIs require compilers to transform the first form into the second form, effectively passing a hidden pointer that the `make_something` function will fill. As such, the two forms are basically identical from an object code perspective, the only difference is what you want your API to look like for the client. And for that reason, I would choose form #1 the vast majority of the time, because it is so much simpler. Let the compiler do the dirty work of passing pointers. – Cody Gray - on strike Oct 21 '16 at 10:50
  • 2
    @CodyGray: Except... ABI Compatibility can be achieved by using opaque types (Lundin's answer), which requires passing by pointers. And when it matters, it really matters. – Matthieu M. Oct 21 '16 at 11:56
  • @iharob: Copying is a non-issue; stack vs heap is part of the argument. – Matthieu M. Oct 21 '16 at 12:46
  • 1
    @CodyGray the question is now tagged `C`, and C has no ABI and quite no object code perspective – edc65 Oct 21 '16 at 15:38
  • 1
    There is a third pattern, `int make_something(something_t *object, int sizeof) { ... }` where the caller preallocates the struct and the function fills it with data. – Agent_L Oct 21 '16 at 15:59
  • 1
    There's also the issue of writing transparent code. Anyone who has done much C programming ought to expect a function returning a pointer to something as being normal behavior, while returning a struct is something that I've never actually seen outside of StackExchange questions. – jamesqf Oct 21 '16 at 17:52

6 Answers

72

When something_t is small (read: copying it is about as cheap as copying a pointer) and you want it to be stack-allocated by default:

something_t make_something(void);

something_t stack_thing = make_something();

something_t *heap_thing = malloc(sizeof *heap_thing);
*heap_thing = make_something();

When something_t is large or you want it to be heap-allocated:

something_t *make_something(void);

something_t *heap_thing = make_something();

Regardless of the size of something_t, and if you don’t care where it’s allocated:

void make_something(something_t *);

something_t stack_thing;
make_something(&stack_thing);

something_t *heap_thing = malloc(sizeof *heap_thing);
make_something(heap_thing);
Jon Purdy
  • 6
    There are a lot of good answers in this thread. This one effectively explains the most with the fewest words. – Dellowar Oct 21 '16 at 06:14
  • 1
    Besides the correct considerations about the size of the object, the latter example is good advice for ease of use and simplicity. Maybe you can return an `int` with some error code for the execution. – EnzoR Oct 21 '16 at 07:01
  • 14
    Almost too few words. – Lightness Races in Orbit Oct 21 '16 at 09:01
  • You should have a look at [Lundin's answer](http://stackoverflow.com/a/40170171/147192): you are omitting a very important reason for C libraries to use **opaque types**, and probably the one reason that explains the prevalence of style 2 that the OP has noticed. – Matthieu M. Oct 21 '16 at 12:44
  • 2
    If you've gone for option 2, you also need to provide `free_something`. – OrangeDog Oct 21 '16 at 17:01
  • @MatthieuM.: I omitted discussion of opaque types because it’s more involved—while style 2 can enforce it, you can still provide types that are intended to be used opaquely with styles 1 or 3. SDL, for example, has structures where some fields are documented as “for internal use”—you *can* use them if you know what you’re doing, but the API provides enough functionality that you never *need* to use them in normal circumstances. – Jon Purdy Oct 21 '16 at 22:11
  • @OrangeDog: Probably yes, especially if `free_something` ever needs to do any cleanup. But you can also say “`make_something` returns a pointer that can be passed to `free`”, as some standard library functions do. – Jon Purdy Oct 21 '16 at 22:13
  • 2
    Style 3 is also useful in the opposite scenario, where you actually do care about *exactly* where an object is allocated, perhaps because its identity is important. – Alex Celeste Oct 22 '16 at 00:15
  • 1
    Style 3 seems the most C-like to me, but I think most C libraries would have `make_something` return an error code as well. – Austin Mullins Oct 25 '16 at 20:07
38

This is almost always about ABI stability, that is, binary stability between versions of the library. In the cases where it is not, it is sometimes about having dynamically sized structs. Rarely it is about extremely large structs or performance.


It is exceedingly rare that allocating a struct on the heap and returning it is nearly as fast as returning it by-value. The struct would have to be huge.

Really, speed is not the reason behind technique 2, return-by-pointer, instead of return-by-value.

Technique 2 exists for ABI stability. If you have a struct and your next version of the library adds another 20 fields to it, consumers of your previous version of the library are binary compatible if they are handed pre-constructed pointers. The extra data beyond the end of the struct they know about is something they don't have to know about.

If you return it on the stack, the caller is allocating the memory for it, and they must agree with you on how big it is. If your library updated since they last rebuilt, you are going to trash the stack.

Technique 2 also permits you to hide extra data both before and after the pointer you return (appending fields to the end of the struct in a later version is one variant of this). You could end the structure with a variable sized array, or prepend the pointer with some extra data, or both.

If you want stack-allocated structs in a stable ABI, almost all functions that talk to the struct need to be passed version information.

So

something_t make_something(unsigned library_version) { ... }

where library_version is used by the library to determine what version of something_t it is expected to return and it changes how much of the stack it manipulates. This isn't possible using standard C, but

void make_something(something_t* here) { ... }

is. In this case, something_t might have a version field as its first element (or a size field), and you would require that it be populated prior to calling make_something.

Other library code taking a something_t would then query the version field to determine what version of something_t they are working with.

Yakk - Adam Nevraumont
15

As a rule of thumb, you should never pass struct objects by value. In practice, it will be fine to do so as long as they are smaller or equal to the maximum size that your CPU can handle in a single instruction. But stylistically, one typically avoids it even then. If you never pass structs by value you can later on add members to the struct and it won't affect performance.

I think that void make_something(something_t *object) is the most common way to use structures in C. You leave the allocation to the caller. It is efficient but not pretty.

However, object-oriented C programs use something_t *make_something() since they are built with the concept of opaque type, which forces you to use pointers. Whether the returned pointer points at dynamic memory or something else depends on the implementation. OO with opaque type is often one of the most elegant and best ways to design more complex C programs, but sadly, few C programmers know/care about it.

Lundin
  • 2
    The **one** answer touching on **opaque types**, which are a cornerstone of ABI stability. Thank you, sir. – Matthieu M. Oct 21 '16 at 11:54
  • 5
    -1 for "smaller or equal to the maximum size that your CPU can handle in a single instruction". Malloc takes a lot more than a single instruction. There's no exact method for determining how big a struct must be before it makes sense to pass it by reference (since that depends on how it's used) instead of heap allocate it, but in practice it's quite a bit larger than sizeof(void*) for most use cases. For example, most games will pass 4x4 matrices by value. – Robert Fraser Oct 21 '16 at 17:51
  • @Lundin When you say "object-oriented" are you referring to the programming paradigm that became popular after C had become established, or something else? I don't mean to suggest that it's impossible to use OOP in C, I just want to make sure I'm thinking of the right concept. – Gordon Gustafson Oct 25 '16 at 23:52
  • 1
    @GordonGustafson Object-orientation is widely acknowledged as a good way to design programs properly. The choice of language doesn't matter. OO consists of 3 things: private encapsulation of implementation and data (somewhat important), modular programming where each class is autonomous and only concerned with its own designated purpose (very important) and inheritance with/without polymorphism (might occasionally be useful). Even ancient C programs written by good programmers used object-orientation of a kind, although the classes were then called things like "ADT". – Lundin Oct 26 '16 at 15:29
  • 1
    The only way to achieve pure private encapsulation in C, is through opaque type (half-arsed versions are possible with the use of `static` data, but that doesn't allow multiple instances of the class, nor is it thread safe). Opaque type can also be used to achieve inheritance and polymorphism. – Lundin Oct 26 '16 at 15:31
10

Some pros of the first approach:

  • Less code to write.
  • More idiomatic for the use case of returning multiple values.
  • Works on systems that don't have dynamic allocation.
  • Probably faster for small or smallish objects.
  • No memory leak due to forgetting to free.

Some cons:

  • If the object is large (say, a megabyte), it may cause stack overflow, or may be slow if compilers don't optimize it well.
  • May surprise people who learned C in the 1970s when this was not possible, and haven't kept up to date.
  • Does not work with objects that contain a pointer to a part of themselves.
M.M
  • 8
    "May surprise people who learned C in the 1970s when this was not possible, and haven't kept up to date." Isn't that a good thing? :) – Lundin Oct 21 '16 at 06:55
  • 1
    "No memory leak due to forgetting to free" - the memory leak is even easier to make, simply add some return/break/goto statements somewhere between the make and destroy. – Marian Spanik Oct 21 '16 at 15:20
4

I'm somewhat surprised.

The difference is that example 1 creates a structure on the stack, example 2 creates it on the heap. In C, or C++ code which is effectively C, it's idiomatic and convenient to create most objects on the heap. In C++ it is not; mostly they go on the stack. The reason is that if you create an object on the stack, the destructor is called automatically, whereas if you create it on the heap, it must be called explicitly. So it's a lot easier to ensure there are no memory leaks and to handle exceptions if everything goes on the stack. In C, the destructor must be called explicitly anyway, and there's no concept of a special destructor function (you have destructors, of course, but they are just normal functions with names like destroy_myobject()).

Now the exception in C++ is for low-level container objects, e.g. vectors, trees, hash maps and so on. These do retain heap members, and they have destructors. Now most memory-heavy objects consist of a few immediate data members giving sizes, ids, tags and so on, and then the rest of the information in STL structures, maybe a vector of pixel data or a map of English word / value pairs. So most of the data is in fact on the heap, even in C++.

And modern C++ is designed so that this pattern

#include <string>
#include <vector>

class big
{
public:
    std::vector<double> observations; // thousands of observations
    int station_x;                    // a bit of data associated with them
    int station_y;
    std::string station_name;
};

big retrieveobservations(int a, int b, int c)
{
    big answer;
    //  lots of code to fill in the structure here

    return answer;
}

void high_level()
{
    big myobservations = retrieveobservations(1, 2, 3);
}

Will compile to pretty efficient code. The large observation member won't generate unnecessary makework copies.

Malcolm McLean
  • 11
    To say that C++ uses the heap less than C is just silly. First of all, RAII doesn't necessarily mean that the local class doesn't use the heap - it just means that it won't easily leak memory. If it didn't use the heap, then why need the destructor? "Rule of 3". Pretty much every C++ standard library container uses the heap, including std::string. One main difference between C and C++ is that in C, you only use the heap when you need to, while in C++ you often end up using it without knowing. Which is actually one of the main reasons why C++ is frowned at for embedded systems development. – Lundin Oct 21 '16 at 06:43
3

Unlike some other languages (like Python), C does not have the concept of a tuple. For example, the following is legal in Python:

def foo():
    return 1,2

x,y = foo()
print(x, y)

The function foo returns two values as a tuple, which are assigned to x and y.

Since C doesn't have the concept of a tuple, it's inconvenient to return multiple values from a function. One way around this is to define a structure to hold the values, and then return the structure, like this:

#include <stdio.h>

typedef struct { int x, y; } stPoint;

stPoint foo( void )
{
    stPoint point = { 1, 2 };
    return point;
}

int main( void )
{
    stPoint point = foo();
    printf( "%d %d\n", point.x, point.y );
}

This is but one example where you might see a function return a structure.

user3386109
  • Okay this is good, but between all the differences of the return types, is it always up to preference? – Dellowar Oct 21 '16 at 04:06
  • 3
    This doesn't answer the question. Returning multiple values can also be achieved by returning a pointer to a struct with multiple members. The question takes for granted why a struct is returned and moves past that to ask what factors inform the choice between returning it by value or by pointer. It's not about why we would return a struct in the first place at all. – underscore_d Oct 21 '16 at 18:48
  • @user3386109 The "inconvenience" of returning a struct on the stack is not real. As explained in a better answer, it depends on the size of the struct. Small ones can be easily "copied" into and out of the stack without the pointer argument, while larger ones, besides requiring more stack room, could require more "work" to pop them out. – EnzoR Oct 27 '16 at 16:30