189

Are there any downsides to passing structs by value in C, rather than passing a pointer?

If the struct is large, there is obviously the performance aspect of copying lots of data, but for a smaller struct, it should basically be the same as passing several values to a function.

It is maybe even more interesting when used as return values. C only has single return values from functions, but you often need several. So a simple solution is to put them in a struct and return that.

Are there any reasons for or against this?

Since it might not be obvious to everyone what I'm talking about here, I'll give a simple example.

If you're programming in C, you'll sooner or later start writing functions that look like this:

void examine_data(const char *ptr, size_t len)
{
    ...
}

char *p = ...;
size_t l = ...;
examine_data(p, l);

This isn't a problem. The only issue is that you have to agree with your coworkers on the order of the parameters, so that you use the same convention in all functions.

But what happens when you want to return the same kind of information? You typically get something like this:

char *get_data(size_t *len)
{
    ...
    *len = ...datalen...;
    return ...data...;
}
size_t len;
char *p = get_data(&len);

This works fine, but is much more problematic. A return value is a return value, except that in this implementation it isn't. There is no way to tell from the above that the function get_data isn't allowed to look at what len points to. And there is nothing that makes the compiler check that a value is actually returned through that pointer. So next month, when someone else modifies the code without understanding it properly (because he didn't read the documentation?) it gets broken without anyone noticing, or it starts crashing randomly.

So, the solution I propose is the simple struct

struct blob { char *ptr; size_t len; };

The examples can be rewritten like this:

void examine_data(const struct blob data)
{
    ... use data.ptr and data.len ...
}

struct blob blob = { .ptr = ..., .len = ... };
examine_data(blob);

struct blob get_data(void)
{
    ...
    return (struct blob){ .ptr = ...data..., .len = ...len... };
}
struct blob data = get_data();

For some reason, I think that most people would instinctively make examine_data take a pointer to a struct blob, but I don't see why. It still gets a pointer and an integer, it's just much clearer that they go together. And in the get_data case it is impossible to mess up in the way I described before, since there is no input value for the length, and there must be a returned length.
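A self-contained sketch of the same pattern, with hypothetical stand-ins for the elided parts: a fixed string plays the role of the data source, and counting spaces plays the role of "examining" the data. It only illustrates the calling convention, not any particular implementation.

```c
#include <string.h>

struct blob {
    char *ptr;
    size_t len;
};

/* The pointer and the length travel together as a single argument. */
static size_t examine_data(const struct blob data)
{
    size_t spaces = 0;
    for (size_t i = 0; i < data.len; i++)
        if (data.ptr[i] == ' ')
            spaces++;
    return spaces;
}

/* Hypothetical data source: both "return values" come back together,
   so there is no length pointer that the callee could misuse as an input. */
static struct blob get_data(void)
{
    static char text[] = "hello struct world";
    return (struct blob){ .ptr = text, .len = strlen(text) };
}
```

A caller then writes `struct blob data = get_data();` followed by `examine_data(data);`, with no out-parameter to forget.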

Guy Avraham
dkagedal
  • For what it's worth, `void examine data(const struct blob)` is incorrect. – Chris Lutz Sep 26 '11 at 04:37
  • Thanks, changed it to include a variable name. – dkagedal Sep 27 '11 at 11:47
  • "There is no way to tell from the above that the function get_data isn't allowed to look at what len points to. And there is nothing that makes the compiler check that a value is actually returned through that pointer." - this makes no sense to me at all (perhaps because your example is invalid code due to the last two lines appearing outside a function); please can you elaborate? – Adam Spiers Apr 11 '13 at 10:36
  • The two lines below the function are there to illustrate how the function is called. The function signature gives no hint that the implementation should only write to the pointer. And the compiler has no way of knowing that it should verify that a value is written to the pointer, so the return value mechanism can only be described in documentation. – dkagedal Jul 28 '13 at 18:59
  • The major reason people don't do this more often in C is historical. Prior to C89, you _couldn't_ pass or return structs by value, so all the system interfaces that predate C89 and logically ought to do it (like `gettimeofday`) use pointers instead, and people take that as an example. – zwol Dec 10 '17 at 02:42
  • Does the struct passed by value get copied out to memory in the C struct format? I need to pass something like a string class by value, it's got a cursor and end of buffer pointer. I can't have any extra instructions. Let's say I had 3 char* in a function, those would get put directly into registers and not saved to RAM unless we used the hardware assisted stack, in which case the registers would get saved to the stack NOT in C struct format. Would a struct with 3 char* be treated the same as 3 char* created "on the stack"? I would assume the struct wouldn't be written to RAM until instructed. –  Apr 10 '18 at 18:11

11 Answers

239

For small structs (e.g. point, rect) passing by value is perfectly acceptable. But, apart from speed, there is one other reason why you should be careful passing/returning large structs by value: stack space.

A lot of C programming is for embedded systems, where memory is at a premium, and stack sizes may be measured in KB or even bytes... If you're passing or returning structs by value, copies of those structs will get placed on the stack, potentially causing the situation that this site is named after...

If I see an application that seems to have excessive stack usage, structs passed by value is one of the things I look for first.
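To put a number on the stack cost, here is a sketch with a hypothetical 4 KB struct (`struct big` and both functions are made up for illustration). Passing it by value typically means a full copy per call, although exactly where that copy lives is up to the compiler and ABI; passing a const pointer costs one pointer-sized argument, and `const` still lets the compiler reject writes.

```c
#include <stddef.h>

#define NSAMPLES 1024

/* Deliberately large: 1024 ints, about 4 KB with a 4-byte int. */
struct big {
    int samples[NSAMPLES];
};

/* By value: the caller must materialize a complete copy for every call. */
static long sum_by_value(struct big b)
{
    long total = 0;
    for (size_t i = 0; i < NSAMPLES; i++)
        total += b.samples[i];
    return total;
}

/* By const pointer: only the address is passed; the callee cannot
   modify the caller's data through it. */
static long sum_by_pointer(const struct big *b)
{
    long total = 0;
    for (size_t i = 0; i < NSAMPLES; i++)
        total += b->samples[i];
    return total;
}
```

Both functions compute the same result; the difference is only in how much data each call has to move.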

ShadowRanger
Roddy
  • "If you're passing or returning structs by value, copies of those structs will get placed on the stack" I'd call *braindead* any toolchain that does so. Yes, it's sad that so many will do it, but it's not anything that the C standard calls for. A sane compiler will optimize it all out. – Kuba hasn't forgotten Monica Feb 01 '15 at 18:05
  • @KubaOber This is why that doesn't get done often: http://stackoverflow.com/questions/552134/why-isnt-pass-struct-by-reference-a-common-optimization – Roddy Feb 01 '15 at 22:15
  • Is there a definitive line that separates a small struct from a large struct? – Josie Thompson Jan 16 '19 at 19:52
  • What about performance impact by having to access struct by reference inside the function, compared to accessing it directly (without reference) if passed by value. I mean there should be performance benefits of passing relatively small structures by value. – Illya S Aug 21 '20 at 10:46
  • Nice. Now I know what "stack overflow" means. – programmerRaj Dec 23 '22 at 16:59
  • @Kubahasn'tforgottenMonica What would be the alternative? – 12431234123412341234123 May 03 '23 at 13:06
  • @IllyaS I doubt it, if you pass by value and it is placed on the stack, then accessing them is via stack pointer + offset. If you pass by pointer, the pointer is likely to be placed in a register and accessing them is via pointer inside a register. But you have to measure it on your system. – 12431234123412341234123 May 03 '23 at 13:08
72

One reason not to do this which has not been mentioned is that this can cause an issue where binary compatibility matters.

Structures can be passed on the stack or in registers, depending on the compiler used and on compiler options/implementation.

See: http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

-fpcc-struct-return

-freg-struct-return

If two compilers disagree, things can blow up. Needless to say, the main reasons not to do this, as illustrated by the other answers, are stack consumption and performance.

tonylo
  • This was the kind of answer I was looking for. – dkagedal Oct 03 '08 at 22:13
  • True, but those options don't relate to pass-by-value. They relate to *returning* structs, which is a different thing altogether. Returning things by reference is usually a sure-fire way of shooting yourself in both feet. `int &bar() { int f; int &j(f); return j;};` – Roddy Dec 08 '11 at 10:17
23

To really answer this question, one needs to dig deep into assembly land:

(The following example uses gcc on x86_64. Anyone is welcome to add other architectures like MSVC, ARM, etc.)

Let's have our example program:

// foo.c

typedef struct
{
    double x, y;
} point;

void give_two_doubles(double * x, double * y)
{
    *x = 1.0;
    *y = 2.0;
}

point give_point()
{
    point a = {1.0, 2.0};
    return a;
}

int main()
{
    return 0;
}

Compile it with full optimizations

gcc -Wall -O3 foo.c -o foo

Look at the assembly:

objdump -d foo | vim -

This is what we get:

0000000000400480 <give_two_doubles>:
    400480: 48 ba 00 00 00 00 00    mov    $0x3ff0000000000000,%rdx
    400487: 00 f0 3f 
    40048a: 48 b8 00 00 00 00 00    mov    $0x4000000000000000,%rax
    400491: 00 00 40 
    400494: 48 89 17                mov    %rdx,(%rdi)
    400497: 48 89 06                mov    %rax,(%rsi)
    40049a: c3                      retq   
    40049b: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

00000000004004a0 <give_point>:
    4004a0: 66 0f 28 05 28 01 00    movapd 0x128(%rip),%xmm0
    4004a7: 00 
    4004a8: 66 0f 29 44 24 e8       movapd %xmm0,-0x18(%rsp)
    4004ae: f2 0f 10 05 12 01 00    movsd  0x112(%rip),%xmm0
    4004b5: 00 
    4004b6: f2 0f 10 4c 24 f0       movsd  -0x10(%rsp),%xmm1
    4004bc: c3                      retq   
    4004bd: 0f 1f 00                nopl   (%rax)

Excluding the nopl padding, give_two_doubles() is 27 bytes and give_point() is 29 bytes, and both consist of five instructions including the retq.

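What's interesting is that give_point() returns the two doubles in the xmm0 and xmm1 registers, which is what the x86-64 System V ABI specifies for a struct of two doubles (this GCC build still routes them through a temporary on the stack, using the SSE2 instructions movapd and movsd). give_two_doubles(), on the other hand, has to store both values to memory through its pointer arguments, and the caller then has to load them back, which can make things slower.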

Apparently much of this may not be applicable in embedded environments (which is where the playing field for C is most of the time nowadays). I'm not an assembly wizard, so any comments would be welcome!
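For comparison, here is how a caller might use each function from foo.c. The two wrappers, sum_via_pointers() and sum_via_struct(), are hypothetical; the point is that the struct version needs no uninitialized locals and no address-of operators at the call site.

```c
typedef struct
{
    double x, y;
} point;

static void give_two_doubles(double *x, double *y)
{
    *x = 1.0;
    *y = 2.0;
}

static point give_point(void)
{
    point a = {1.0, 2.0};
    return a;
}

/* Out-parameters: the caller declares locals and passes their addresses. */
static double sum_via_pointers(void)
{
    double x, y;
    give_two_doubles(&x, &y);
    return x + y;
}

/* Struct return: the result is simply assigned. */
static double sum_via_struct(void)
{
    point p = give_point();
    return p.x + p.y;
}
```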

kizzx2
  • Counting the number of instructions isn't all that interesting, unless you can show a huge difference, or count more interesting aspects such as the number of hard-to-predict jumps etc. The actual performance properties are much more subtle than the instruction count. – dkagedal Aug 08 '10 at 21:45
  • @dkagedal: True. In retrospect, I think my own answer was written very poorly. Although I didn't focus on number of instructions very much (dunno what gave you that impression :P), the actual point to make was that passing struct by value is preferable to passing by reference for small types. Anyway, passing by value is preferred because it's simpler (no lifetime juggling, no need to worry about someone changing your data or `const` all the time) and I found there's not much performance penalty (if not gain) in the pass-by-value copying, contrary to what many might believe. – kizzx2 Aug 09 '10 at 05:44
18

One thing people here have forgotten to mention so far (or I overlooked it) is that structs usually have padding!

struct {
  short a;
  char b;
  short c;
  char d;
}

Every char is 1 byte, and on typical systems every short is 2 bytes. So how large is the struct? Not 6 bytes, at least not on any commonly used system; on most systems it will be 8. The problem is that alignment is not constant but system dependent, so the same struct can have different alignment and a different size on different systems.

Not only does padding eat further into your stack, it also adds the uncertainty of not being able to predict it in advance, unless you know how your system pads and then work out the size of every single struct in your app. Passing a pointer takes a predictable amount of space; there is no uncertainty. The size of a pointer is known for the system, it is the same regardless of what the struct looks like, and pointer sizes are always chosen so that pointers are aligned and need no padding.
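The effect can be checked with `sizeof` and `offsetof`. The struct tags `interleaved` and `grouped` are hypothetical names added so the types can be referred to, and the exact numbers are implementation-defined: on common platforms where short is 2 bytes with 2-byte alignment, the first layout is typically 8 bytes and the reordered one 6.

```c
#include <stddef.h>
#include <stdio.h>

/* The struct from the answer: padding is typically inserted after b and d. */
struct interleaved {
    short a;
    char  b;
    short c;
    char  d;
};

/* Same members, ordered so that the two shorts come first. */
struct grouped {
    short a;
    short c;
    char  b;
    char  d;
};

static void print_layout(void)
{
    printf("interleaved: size %zu, c at offset %zu\n",
           sizeof(struct interleaved), offsetof(struct interleaved, c));
    printf("grouped:     size %zu\n", sizeof(struct grouped));
}
```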

D.W.
Mecki
  • Yea, but the padding exists with no dependency on passing the structure by value or by reference. – Ilya Oct 02 '08 at 13:17
  • If you had tested your example, you would have found that your example struct is indeed four bytes, so your argument is moot. But it had no relevance for the question anyway. – dkagedal Oct 02 '08 at 14:01
  • @dkagedal: Which part of "different sizes on different systems" didn't you understand? Just because it is that way on your system, you assume it must be the same for any other one - that's exactly why you should not pass by value. Changed sample so it fails on your system as well. – Mecki Oct 02 '08 at 14:19
  • @Ilya: Yeah, but padding consumes stack space for what? Right, nothing. And without knowing the padding for any single struct, there is no way to predict how much stack space a call to your function "costs", while it is absolutely clear when you pass a pointer. – Mecki Oct 02 '08 at 14:22
  • I think Mecki's comments about struct padding are relevant especially for embedded systems where stack size may be an issue. – zooropa Apr 29 '09 at 12:55
  • I guess the flip side of the argument is that if your struct is a simple struct (containing a couple of primitive types), passing by value will enable the compiler to juggle it using registers -- whereas if you use pointers, things end up in the memory, which is slower. That gets pretty low-level and pretty much depends on your target architecture, if any of these tidbits matter. – kizzx2 Jul 29 '10 at 02:51
  • Unless your struct is tiny or your CPU has many registers (and Intel CPUs do not), the data ends up on the stack and that is also memory and as fast/slow as any other memory. A pointer on the other hand is always small and just a pointer and the pointer itself will usually always end up in a register when used more often. – Mecki Jul 29 '10 at 12:45
  • I don't understand what "insecurity" you are referring to in the last paragraph of your answer. Can you edit the question to elaborate? Do you think there is a security risk? If so, what specifically is the risk that you anticipate? – D.W. Jun 20 '17 at 20:30
  • @D.W. Sorry, that was probably not my best English. It's not insecure as in "security flaw", but insecure as in "you cannot know for sure" :-) At compile time you can use `sizeof()` or you can search for documentation on a given system. But for a system you have never even heard of and for which you cannot find any documentation, can you tell me how big that struct above is in memory? If I tell you that pointers on that system are 48 bit (yeah, it has a 48 bit CPU - why not?), can you? But knowing that pointers are 48 bits, you can tell me exactly how big `void * ptrs[30]` is, correct? – Mecki Jun 20 '17 at 21:17
  • @Mecki, thanks for the clarification! I've suggested an edit to try to incorporate that information into your answer. Appreciate it! – D.W. Jun 20 '17 at 21:52
  • Regarding the security. So what if the same struct declaration will be of different size on different systems? If you use `sizeof(my_struct)` everywhere, then specific compiler for each system will compile the code to work for that system. The only danger I see here is if you try to serialise struct into a byte buffer and then send the buffer over the network. But for that just declare members to have a fixed size like `uint32_t`, `uint8_t` etc. (instead of int, char). So what am I missing here? – mercury0114 Oct 09 '20 at 15:29
  • @mercury0114 Padding will also happen with fixed size types. `struct x { uint64_t a; uint32_t b; }` has a size of 16, not 12. Yet it may have a size of 32 on a 128 bit system (RISC-V CPUs can be 128 bit). But it's not just different systems, the same struct can have different size on the same system in a 32 bit and in a 64 bit process, both running on the same system. – Mecki Oct 09 '20 at 16:56
  • `the same struct can have different size on the same system in a 32 bit and in a 64 bit process, both running on the same system` - so the danger is if those processes communicate and send these structs to one another? – mercury0114 Oct 09 '20 at 17:52
  • @mercury0114 If one side sends `sizeof(struct)` bytes and another side receives `sizeof(struct)` bytes and both use an equal struct definition, the struct may still not have been transferred correctly between two processes (same machine or not), sizeof may be different for both processes, as well as the struct layout. Size and layout are only guaranteed within a single process. That's why network code always works with packed structs (a C extension), as their size and layout is guaranteed if all fields have a fixed size type. – Mecki Oct 09 '20 at 19:06
15

A simple solution is to return an error code as the return value and everything else as a parameter of the function.
This parameter can be a struct, of course, but I don't see any particular advantage in passing it by value; just send a pointer.
Passing a structure by value is dangerous. You need to be very careful about what you are passing: remember that there is no copy constructor in C, so if one of the structure's members is a pointer, only the pointer value will be copied, which can be very confusing and hard to maintain.
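A minimal sketch of the error-code convention described above: the return value carries only a status, and the results go into a struct the caller passes by pointer. The names `parse_pair` and `struct pair`, and the use of `EINVAL` as the failure code, are hypothetical choices for illustration (overflow detection via `ERANGE` is deliberately left out).

```c
#include <errno.h>
#include <stdlib.h>

struct pair {
    long first;
    long second;
};

/* Hypothetical parser for text such as "12,34".
   Returns 0 on success or EINVAL; *out is written only on success. */
static int parse_pair(const char *text, struct pair *out)
{
    char *end;
    struct pair tmp;

    if (text == NULL || out == NULL)
        return EINVAL;

    tmp.first = strtol(text, &end, 10);
    if (end == text || *end != ',')
        return EINVAL;

    text = end + 1;
    tmp.second = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return EINVAL;

    *out = tmp;
    return 0;
}
```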

Just to complete the answer (full credit to Roddy), stack usage is another reason not to pass structures by value; believe me, debugging a stack overflow is a real PITA.

Reply to comment:

Passing a struct by pointer means that some entity has ownership of the object and full knowledge of what should be released and when. Passing a struct by value creates hidden references to the struct's internal data (pointers to other structures, etc.), and this is hard to maintain (possible, but why?).
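A small sketch of the shallow-copy hazard mentioned above; `struct message` and `shout` are hypothetical names. The callee receives its own copy of the struct, but the pointer member in that copy still refers to the caller's buffer.

```c
#include <ctype.h>
#include <stddef.h>

struct message {
    char *text;   /* points to storage owned by the caller */
    size_t len;
};

static void shout(struct message m)
{
    /* Writes through the copied pointer land in the caller's buffer. */
    for (size_t i = 0; i < m.len; i++)
        m.text[i] = (char)toupper((unsigned char)m.text[i]);

    /* Changing a non-pointer member only affects the local copy. */
    m.len = 0;
}
```

After `shout(msg)`, the caller's `msg.len` is unchanged, but the characters in its buffer have been uppercased.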

Ilya
  • But passing a pointer isn't more "dangerous" just because you put it in a struct, so I don't buy it. – dkagedal Oct 02 '08 at 11:47
  • Great point on copying a structure that contains a pointer. This point may not be very obvious. For those who don't know what he is referring to, do a search on deep copy vs shallow copy. – zooropa Apr 29 '09 at 12:21
  • One of the C function conventions is to have output parameters be listed first before input parameters, e.g. int func(char* out, char *in); – zooropa Apr 29 '09 at 12:29
  • You mean like how for example getaddrinfo() puts the output parameter last? :-) There are a thousand set of conventions, and you can choose whichever you want. – dkagedal Jul 28 '13 at 19:08
11

Here's something no one mentioned:

void examine_data(const char *c, size_t l)
{
    c[0] = 'l'; // compiler error
}

void examine_data(const struct blob blob)
{
    blob.ptr[0] = 'l'; // perfectly legal, quite likely to blow up at runtime
}

Members of a const struct are const, but if that member is a pointer (like char *), it becomes char *const rather than the const char * we really want. Of course, we could assume that the const is documentation of intent, and that anyone who violates this is writing bad code (which they are), but that's not good enough for some (especially those who just spent four hours tracking down the cause of a crash).
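A compilable sketch of the gotcha, reusing the question's `struct blob`: because the parameter is const, `data.ptr` has type `char *const`, so the pointer itself cannot be reassigned but the characters it points to can still be written.

```c
#include <stddef.h>

struct blob {
    char *ptr;
    size_t len;
};

static void examine_data(const struct blob data)
{
    /* data.ptr = NULL;   would be rejected: data is const */
    if (data.len > 0)
        data.ptr[0] = 'l';  /* accepted: the pointee is not const */
}
```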

The alternative might be to make a struct const_blob { const char *c; size_t l } and use that, but that's rather messy - it gets into the same naming-scheme problem I have with typedefing pointers. Thus, most people stick to just having two parameters (or, more likely for this case, using a string library).
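A sketch of the `const_blob` alternative, using the member names `ptr`/`len` to match `struct blob` (the answer's version uses `c`/`l`). The conversion helper `as_const` and the function `count_char` are hypothetical; building a new value member by member avoids casting between the two struct types.

```c
#include <stddef.h>

struct blob {
    char *ptr;
    size_t len;
};

/* Read-only view: here the pointed-to characters really are const. */
struct const_blob {
    const char *ptr;
    size_t len;
};

/* Construct a new value instead of casting a struct blob pointer. */
static struct const_blob as_const(struct blob b)
{
    return (struct const_blob){ .ptr = b.ptr, .len = b.len };
}

static size_t count_char(struct const_blob data, char c)
{
    size_t n = 0;
    for (size_t i = 0; i < data.len; i++)
        if (data.ptr[i] == c)
            n++;
    /* data.ptr[0] = 'x';  would be rejected: data.ptr points to const char */
    return n;
}
```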

Chris Lutz
  • Yes it's perfectly legal, and also something that you want to do sometimes. But I agree that it is a limitation of the struct solution that you cannot make the pointers they point to point to const. – dkagedal Sep 27 '11 at 11:49
  • A nasty gotcha with the `struct const_blob` solution is that even if `const_blob` has members that differ from `blob` only in "indirect-const-ness", the types `struct blob*` and `struct const_blob*` will be considered distinct for purposes of the strict aliasing rule. Consequently, if code casts a `blob*` to a `const_blob*`, any subsequent write to the underlying structure using one type will silently invalidate any existing pointers of the other type, such that any use will invoke Undefined Behavior (which may usually be harmless, but could be deadly). – supercat Jun 22 '15 at 15:44
9

I'd say passing (not-too-large) structs by value, both as parameters and as return values, is a perfectly legitimate technique. One has to take care, of course, that the struct is either a POD type, or the copy semantics are well-specified.

Update: Sorry, I had my C++ thinking cap on. I recall a time when it was not legal in C to return a struct from a function, but this has been part of standard C since C89. I would still say it's valid as long as all the compilers you expect to use support the practice.

Greg Hewgill
9

I think that your question has summed things up pretty well.

One other advantage of passing structs by value is that memory ownership is explicit. There is no wondering about if the struct is from the heap, and who has the responsibility for freeing it.
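A sketch contrasting the two ownership models; `struct point`, `make_point` and `new_point` are hypothetical names. The by-value result lives wherever the caller stores it, while the heap version hands the caller an allocation that must be freed.

```c
#include <stdlib.h>

struct point {
    int x, y;
};

/* By value: nothing to free, no NULL to check. */
static struct point make_point(int x, int y)
{
    return (struct point){ x, y };
}

/* By pointer to heap storage: the caller now owns the allocation,
   must check for NULL, and must call free() when done. */
static struct point *new_point(int x, int y)
{
    struct point *p = malloc(sizeof *p);
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```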

Darron
6

Page 150 of PC Assembly Tutorial on http://www.drpaulcarter.com/pcasm/ has a clear explanation about how C allows a function to return a struct:

C also allows a structure type to be used as the return value of a function. Obviously a structure can not be returned in the EAX register. Different compilers handle this situation differently. A common solution that compilers use is to internally rewrite the function as one that takes a structure pointer as a parameter. The pointer is used to put the return value into a structure defined outside of the routine called.

I use the following C code to verify the above statement:

struct person {
    int no;
    int age;
};

struct person create() {
    struct person jingguo = { .no = 1, .age = 2};
    return jingguo;
}

int main(int argc, const char *argv[]) {
    struct person result;
    result = create();
    return 0;
}

Use "gcc -S" to generate assembly for this piece of C code:

    .file   "foo.c"
    .text
.globl create
    .type   create, @function
create:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $16, %esp
    movl    8(%ebp), %ecx
    movl    $1, -8(%ebp)
    movl    $2, -4(%ebp)
    movl    -8(%ebp), %eax
    movl    -4(%ebp), %edx
    movl    %eax, (%ecx)
    movl    %edx, 4(%ecx)
    movl    %ecx, %eax
    leave
    ret $4
    .size   create, .-create
.globl main
    .type   main, @function
main:
    pushl   %ebp
    movl    %esp, %ebp
    subl    $20, %esp
    leal    -8(%ebp), %eax
    movl    %eax, (%esp)
    call    create
    subl    $4, %esp
    movl    $0, %eax
    leave
    ret
    .size   main, .-main
    .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
    .section    .note.GNU-stack,"",@progbits

The stack before call create:

        +---------------------------+
ebp     | saved ebp                 |
        +---------------------------+
ebp-4   | age part of struct person | 
        +---------------------------+
ebp-8   | no part of struct person  |
        +---------------------------+        
ebp-12  |                           |
        +---------------------------+
ebp-16  |                           |
        +---------------------------+
ebp-20  | ebp-8 (address)           |
        +---------------------------+

The stack right after calling create:

        +---------------------------+
        | ebp-8 (address)           |
        +---------------------------+
        | return address            |
        +---------------------------+
ebp,esp | saved ebp                 |
        +---------------------------+
Jingguo Yao
  • There are two problems here. The most obvious one is that this does not at all describe "how C allows a function to return a struct". This only describes how it can be done on 32-bit x86 hardware, which happens to be one of the most limited architectures when you look at the number of registers etc. The second problem is that the way that C compilers generate code for returning values is dictated by the ABI (except for non-exported or inlined functions). And by the way, inlined functions are probably one of the places where returning structs are most useful. – dkagedal May 01 '11 at 22:52
  • Thanks for the corrections. For complete details of calling conventions, http://en.wikipedia.org/wiki/Calling_convention is a good reference. – Jingguo Yao May 11 '11 at 14:25
  • @dkagedal: What is significant is not just that x86 happens to do things this way, but rather that there exists a "universal" approach (i.e. this one) that would allow compilers for any platform to support returns of any structure type that isn't so huge as to blow the stack. While compilers for many platforms will use other more efficient means for handling some structure-type return values, there is no need for the language to limit structure return types to those the platform can handle optimally. – supercat Jul 19 '18 at 16:53
0

I just want to point out one advantage of passing your structs by value: an optimizing compiler may be able to optimize your code better.

Vad
0

Taking into account all of the things people have said...

  1. Returning a struct was not always allowed in C. Now it is.
  2. Returning a struct can be done in three ways:
       a. Returning each member in a register (probably optimal, but unlikely to be the actual...)
       b. Returning the struct on the stack (slower than registers, but still better than a cold access to heap RAM... yay caching!)
       c. Returning the struct through a pointer to the heap (it only hurts you when you read or write to it? A good compiler will read through the pointer just once, reorder instructions, and access the data much earlier than needed so it is ready when you are? To make life better? (shiver))
  3. Because of this, different compiler settings can cause problems when code compiled with different settings has to interface. (Different size registers, different amounts of padding, different optimizations turned on)
  4. const-ness or volatile-ness doesn't permeate through a struct, and can result in miserably inefficient or possibly broken code (e.g. a const struct foo makes the member foo.bar const, but not the data that foo.bar points to.)

Some simple measures I will take after reading this...

  1. Make your functions accept parameters rather than structs. It allows fine-grained control over const-ness, volatile-ness, etc., and it also ensures that all the variables passed are relevant to the function using them. If the parameters are all the same kind, use some other method to enforce ordering. (Make typedefs to make your function calls more strongly typed, which an OS does routinely.)
  2. Instead of allowing the final base function to return a pointer to a structure made on the heap, provide a pointer to a struct to put the results into. That struct might still be on the heap, but it is possible that it is actually on the stack, which will give better runtime performance. It also means that you do not need to rely on compilers providing you a struct return type.
  3. By passing the parameters as pieces and being clear about the const-ness, volatile-ness, or restrict-ness, you better convey your intentions to the compiler, and that will allow it to make better optimizations.
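A sketch of measures 2 and 3 together: inputs are passed as separate const- and restrict-qualified parameters, and the caller supplies the result struct, so it can live on the stack. `struct stats` and `compute_stats` are hypothetical names, and returning -1 for empty input is an arbitrary choice for illustration.

```c
#include <stddef.h>

struct stats {
    double min;
    double max;
    double mean;
};

/* Returns 0 on success, -1 if there is no data to summarize.
   The caller decides where *out lives (stack, heap, or static storage). */
static int compute_stats(const double *restrict values, size_t count,
                         struct stats *restrict out)
{
    if (values == NULL || out == NULL || count == 0)
        return -1;

    double min = values[0], max = values[0], sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < min) min = values[i];
        if (values[i] > max) max = values[i];
        sum += values[i];
    }
    out->min = min;
    out->max = max;
    out->mean = sum / (double)count;
    return 0;
}
```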

I am not sure where 'too big' and 'too small' is at, but I guess the answer is between 2 and register count + 1 members. If I made a struct that holds 1 member that is an int, then clearly we should not pass the struct. (Not only is it inefficient, it also makes intention VERY murky... I suppose it has a use somewhere, but not common)

If I make a struct that holds two items, it might have value in clarity, and compilers might optimize it into two variables that travel as a pair. (RISC-V specifies that a struct with two members returns both members in registers, assuming they are ints or smaller...)
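The C standard library itself uses this kind of two-member struct return: `div()` from `<stdlib.h>` returns a `div_t` holding the quotient and remainder together. Whether the two members actually travel in registers depends on the platform ABI. The wrapper below is a hypothetical helper.

```c
#include <stdlib.h>

/* Hypothetical helper: split a number of seconds into minutes and
   leftover seconds; div() returns both results at once in a div_t. */
static div_t minutes_and_seconds(int total_seconds)
{
    return div(total_seconds, 60);
}
```

The caller reads `t.quot` (minutes) and `t.rem` (seconds) from the returned value, with no out-parameters.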

If I make a structure that holds as many ints and doubles as there are registers in the processor, it is TECHNICALLY a possible optimization. The instant I exceed the register count, though, it probably would have been worth it to keep the result struct behind a pointer, and pass in only the parameters that were relevant. (That, and probably make the struct smaller and the function do less, because we have a LOT of registers on systems nowadays, even in the embedded world...)

Watachiaieto