67

A struct can be either passed/returned by value or passed/returned by reference (via a pointer) in C.

The general consensus seems to be that the former can be applied to small structs without penalty in most cases. See "Is there any case for which returning a structure directly is good practice?" and "Are there any downsides to passing structs by value in C, rather than passing a pointer?"

Avoiding a dereference can also be beneficial for both speed and clarity. But what counts as small? I think we can all agree that this is a small struct:

struct Point { int x, y; };

That we can pass by value with relative impunity:

struct Point sum(struct Point a, struct Point b) {
  return (struct Point){ .x = a.x + b.x, .y = a.y + b.y };
}

And that Linux's task_struct is a large struct:

https://github.com/torvalds/linux/blob/b953c0d234bc72e8489d3bf51a276c5c4ec85345/include/linux/sched.h#L1292-1727

That we'd want to avoid putting on the stack at all costs (especially with those 8K kernel mode stacks!). But what about middling ones? I assume structs smaller than a register are fine. But what about these?

typedef struct _mx_node_t mx_node_t;
typedef struct _mx_edge_t mx_edge_t;

struct _mx_edge_t {
  char symbol;
  size_t next;
};

struct _mx_node_t {
  size_t id;
  mx_edge_t edge[2];
  int action;
};

What is the best rule of thumb for determining whether a struct is small enough that it's safe to pass it around by value (short of extenuating circumstances such as some deep recursion)?

Lastly please don't tell me that I need to profile. I'm asking for a heuristic to use when I'm too lazy/it's not worth it to investigate further.

EDIT: I have two followup questions based on the answers so far:

  1. What if the struct is actually smaller than a pointer to it?

  2. What if a shallow copy is the desired behavior (the called function will perform a shallow copy anyway)?

EDIT: Not sure why this got marked as a possible duplicate as I actually link the other question in my question. I'm asking for clarification on what constitutes a small struct and am well aware that most of the time structs should be passed by reference.

Community
Kaiting Chen
  • 3
    Why send the entire house to the person who just needs your address. It is always more advisable to pass pointer. – Vinay Shukla Jun 22 '15 at 13:29
  • 1
    @VinayShukla It's always _at least as fast_ to pass a pointer as to pass by value. But if you do a lot of pointer dereferences, that could negate any negligible advantage in passing the parameter. – Daniel Jun 22 '15 at 13:31
  • @Daniel I do agree with you, but it is always better to pass a structure by pointer rather than passing it by value. There are many other advantages; for example, pass by value performs a shallow copy and modifications to the value won't be reflected in the original. – Vinay Shukla Jun 22 '15 at 13:37
  • 5
    @VinayShukla Why is that an advantage? Often, a shallow copy is exactly the behavior that is desired. – Daniel Jun 22 '15 at 13:51
  • @Daniel: Pointer dereferences are likely optimized by the compiler. It is very likely still faster than a shallow copy. A returned structure is referenced by pointer under many PCSs anyway, so no difference. Only two reasons for passing a struct by value might be valid: a small struct passed in registers according to the PCS (readability), or local modification. – too honest for this site Jun 22 '15 at 14:05
  • 1
    This is a very good question! Everyone always says that "small structs are ok" but without defining "how small is small". – Lundin Jun 22 '15 at 14:11
  • @Daniel At least on x86 arch, dereferencing through %ebp is as slow/fast as through any other GPR. – user3125367 Jun 22 '15 at 14:13
  • If it’s not worth to investigate further, it is probably not worth bothering at all. – Jonas Schäfer Jun 22 '15 at 14:45
  • possible duplicate of [Are there any downsides to passing structs by value in C, rather than passing a pointer?](http://stackoverflow.com/questions/161788/are-there-any-downsides-to-passing-structs-by-value-in-c-rather-than-passing-a) – this Jun 22 '15 at 16:17
  • Use your judgement, just like with e.g. "When should I break up a method into smaller methods?" – user253751 Jun 23 '15 at 03:39
  • @JonasSchäfer What are you suggesting? Throw a die? – martinkunev Sep 03 '21 at 00:40

9 Answers

33

On small embedded architectures (8/16-bitters) -- always pass by pointer, as non-trivial structures don't fit into such tiny registers, and those machines are generally register-starved as well.

On PC-like architectures (32 and 64 bit processors) -- passing a structure by value is OK provided sizeof(mystruct_t) <= 2*sizeof(mystruct_t*) and the function does not have many (usually more than 3 machine words' worth of) other arguments. Under these circumstances, a typical optimizing compiler will pass/return the structure in a register or register pair. However, on x86-32, this advice should be taken with a hefty grain of salt, due to the extraordinary register pressure an x86-32 compiler must deal with -- passing a pointer may still be faster due to reduced register spilling and filling.
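
The size test above can be checked at compile time. The following is only a sketch: the macro name FITS_IN_TWO_WORDS is made up for illustration, the struct tags are renamed from the question's to avoid leading underscores, and the two assertions assume a common 32- or 64-bit ABI, so they may not hold on more exotic targets.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: true when T is no larger than two pointers to T. */
#define FITS_IN_TWO_WORDS(T) (sizeof(T) <= 2 * sizeof(T *))

struct Point { int x, y; };

/* Same layout as the question's structs, with non-reserved tag names. */
typedef struct mx_edge { char symbol; size_t next; } mx_edge_t;
typedef struct mx_node { size_t id; mx_edge_t edge[2]; int action; } mx_node_t;

/* Two ints fit in two machine words on common 32- and 64-bit ABIs. */
static_assert(FITS_IN_TWO_WORDS(struct Point),
              "Point: by value is fine under this rule");

/* id + two padded edges + action is well over two machine words. */
static_assert(!FITS_IN_TWO_WORDS(mx_node_t),
              "mx_node_t: pass by pointer under this rule");
```

Under this rule struct Point qualifies for pass-by-value, while mx_node_t from the question does not. The check only covers the size condition, not the "few other arguments" condition.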

Returning a structure by value on PC-likes, on the other hand, follows the same rule, save for the fact that when a structure is returned by pointer, the structure to be filled out should be passed in by pointer as well -- otherwise, the callee and the caller are stuck having to agree on how to manage the memory for that structure.

LThode
  • 4
    I'm possibly missing something obvious, but why "sizeof(mystruct_t) <= 2*sizeof(mystruct_t*)"? – Claudiu Aug 01 '18 at 07:34
  • 8
    @Claudiu -- that's basically a way of saying "it takes up no more than two machine words of memory" – LThode Aug 03 '18 at 15:18
26

In my experience (nearly 40 years of real-time embedded work, the last 20 using C), the best way is to pass a pointer.

In either case the address of the struct needs to be loaded, then the offset for the field of interest needs to be calculated...

When passing the whole struct, if it is not passed by reference, then

  1. it is not placed on the stack
  2. it is copied, usually by a hidden call to memcpy()
  3. it is copied to a section of memory that is now 'reserved' and unavailable to any other part of the program.

Similar considerations exist for when a struct is returned by value.

However, "small" structs that can be completely held in a working register or two are passed in those registers, especially if certain levels of optimization are used in the compile statement.

The details of what is considered 'small' depend on the compiler and the underlying hardware architecture.

Lundin
user3629249
  • 3
    Another reason, IMHO, is that it avoids confusion. If you always pass/return pointers, you never have to wonder whether any particular reference in you code is pointer or copy. – jamesqf Jun 22 '15 at 18:25
  • 12
    Point 1 is plain wrong for current 32+ bit CPUs/platforms and wrong for modern 8/16 bit MCUs (e.g. MSP430). Same for 3, for similar reasons. Without that, the function would not be thread-safe by default, rendering such functions unusable with threads. – too honest for this site Jun 22 '15 at 18:44
  • @Olaf, some years ago, I was writing for several of the Motorola 8/16 bit CPUs. The calls to memcpy() and the reserved memory were common problems which we got around by always passing pointers as we did not want to waste the stack space by placing the object on the stack. Modern compilers may hide the placing of the object on the stack, but the problem is NOT fixed by which CPU is being used. – user3629249 Oct 02 '16 at 22:12
  • 6
    Read your answer again! You write the object is **not** placed on the stack. Let apart implementations which don't use a stack at all (C does not mandate a stack), modern architectures **do place** it on the stack. Where else should they? Using (hidden) global variables is nonsense and not thread-safe. What you write was practice in the 80ies and part of the 90ies, but definitively not for modern implementations and architectures like the ones OP refers to. – too honest for this site Oct 02 '16 at 23:14
  • 2
    This answer doesn't make much sense. Completely nonsensical is your "it is copied to a section of memory that is now 'reserved' and unavailable to any other part of the program" statement. Well, I bloody well hope that the memory my structure resides in is unavailable during its lifetime. Then, as Olaf pointed out, it is of course passed on the stack, how else (unless it fits a register)? And yes, how else than by (some kind of) mempcpy()? Not sure what you mean by "hidden", except that you don't have to write it down explicitly, thank goodness for that. – Peter - Reinstate Monica Aug 26 '17 at 11:41
  • Plus: What you don't say is how any of this (the correct bits) affects performance. You seem to mean badly; but that's not at all clear. – Peter - Reinstate Monica Aug 26 '17 at 11:42
  • 1
    @Olaf Not only would hidden variables not be thread-safe; more generally such functions are not reentrant, and cannot be recursive, in other words it's not standard C. – Peter - Reinstate Monica Aug 26 '17 at 11:54
13

Since the argument-passing part of the question is already answered, I'll focus on the returning part.

The best thing to do IMO is to not return structs or pointers to structs at all, but to pass a pointer to the 'result struct' to the function.

void sum(struct Point* result, struct Point* a, struct Point* b);

This has the following advantages:

  • The result struct can live either on the stack or on the heap, at the caller's discretion.
  • There are no ownership problems, as it is clear that the caller is responsible for allocating and freeing the result struct.
  • The structure could even be longer than what is needed, or be embedded in a larger struct.
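
A minimal sketch of this pattern follows. The const qualifiers on the inputs and the struct Sprite type are additions for illustration and are not part of the signature above.

```c
#include <assert.h>
#include <stddef.h>

struct Point { int x, y; };

/* Writes a + b into *result. The caller owns all three objects,
   so nothing is allocated or freed here. */
void sum(struct Point *result, const struct Point *a, const struct Point *b)
{
    assert(result != NULL && a != NULL && b != NULL);
    result->x = a->x + b->x;
    result->y = a->y + b->y;
}

/* Hypothetical larger struct that embeds a Point. */
struct Sprite {
    int id;
    struct Point pos;
};
```

A caller can then pick the storage: sum(&local, &a, &b) for a stack variable, sum(heap_ptr, &a, &b) for a malloc'ed one, or sum(&sprite.pos, &a, &b) to fill a member of a larger struct in place.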
alain
8

How a struct is passed to or from a function depends on the application binary interface (ABI) and the procedure call standard (PCS, sometimes included in the ABI) for your target platform (CPU/OS, for some platforms there may be more than one version).

Whether the PCS actually passes a struct in registers depends not only on its size, but also on its position in the argument list and the types of the preceding arguments. ARM-PCS (AAPCS), for instance, packs arguments into the first 4 registers until they are full and passes further data on the stack, even if that means an argument is split (all simplified; if interested, the documents are free to download from ARM).

For returned structs that are not passed through registers, most PCSs have the caller allocate the space on the stack and pass a pointer to the struct to the callee (the implicit variant). For the callee, this is identical to the caller declaring a local variable and passing the pointer explicitly. However, for the implicit variant, the result has to be copied to another struct, as there is no way to get a reference to the implicitly allocated struct.

Some PCSs might do the same for argument structs; others just use the same mechanisms as for scalars. Either way, defer such optimizations until you really know you need them. Also read the PCS of your target platform. Remember that your code might perform even worse on a different platform.

Note: passing a struct through a global temp is not used by modern PCSs, as it is not thread-safe. For some small microcontroller architectures this might be different, however, mostly if they only have a small stack (S08) or restricted features (PIC). But on these, structs are mostly not passed in registers either, and pass-by-pointer is strongly recommended.

If it is just for immutability of the original: pass a const mystruct *ptr. Unless you cast away the const that will give a warning at least when writing to the struct. The pointer itself can also be constant: const mystruct * const ptr.
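
As a rough illustration of the two const forms, assuming a made-up mystruct with two int fields:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; only the const usage matters here. */
typedef struct {
    int id;
    int weight;
} mystruct;

/* Pointer to const: the struct cannot be modified through ptr. */
int get_weight(const mystruct *ptr)
{
    assert(ptr != NULL);
    /* ptr->weight = 0;   the compiler diagnoses this write */
    return ptr->weight;
}

/* Const pointer to const: ptr itself cannot be reassigned either. */
int get_id(const mystruct *const ptr)
{
    assert(ptr != NULL);
    /* ptr = NULL;        the compiler diagnoses this as well */
    return ptr->id;
}
```

Both functions accept the address of an ordinary, non-const mystruct; the qualifier only restricts what the callee may do through the pointer.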

So: No rule of thumb; it depends on too many factors.

undur_gongor
too honest for this site
  • 3
    Without having any greater insight in every ABI/calling convention out there, I would think that for most processors/ABIs, the rule of thumb would be "if the struct size is smaller or equal to the data bus of the CPU then passing by value is fine". 1 byte for 8-bit PIC, 4 bytes for 32-bit ARM, 8 bytes for 64-bit Intel PC and so on. But the best rule of thumb if portability is concerned, is probably to always pass by reference. – Lundin Jun 22 '15 at 14:21
  • @Lundin: I would accept mostly mutability as a reason for pass-by-value. For a return value, I see little reason actually, as you cannot get the address of the result to store the struct for later usage (and you cannot access fields directly from the call either: `f().field1` does not work). Note that AAPCS does actually pack container types up to 128 (4*32 bit) bits into registers, not just 1. So, things are much more complicated. – too honest for this site Jun 22 '15 at 14:36
  • @Lundin: I think it's pretty common for an API to allow structures whose size is a suitably-small *power of two*, but sizes that aren't powers of two often do not receive such special treatment. Thus, on many platforms, passing a 32-bit structure or even a 64-bit structure may be faster than passing a 24-bit structure. – supercat Jun 22 '15 at 15:35
  • @Lundin: I'll accept your comment as the answer if you convert it. You're the only person to have provided an actionable rule of thumb. – Kaiting Chen Jun 22 '15 at 15:46
  • @KaitingChen: There are answers for which there is no "rule of thumb" (but there might be a "rule of dumb" - not implying that Lundin's answer is one!). Just ask yourself, **why** you actually found so many different and partly contrary answers on your search! – too honest for this site Jun 22 '15 at 16:38
  • @Olaf: It's well acknowledged that structs should be passed by reference. As a rule I pass the vast majority of my structs in this manner. However there are cases when a struct is trivial enough such that passing by value is acceptable (an example is provided in the question). This question asks how trivial a struct must be for this to be the case. While I recognize that it is impossible to say deterministically and ahead of time when this is the case per the plethora of architectures and ABIs Lundin's comment provides a useful and conservative rule in the absence of more information. – Kaiting Chen Jun 22 '15 at 16:48
  • @KaitingChen: I fully understand your point. However, note that your code might perform very different, even between - for instance - x86 and x64 ports. However, if you are only concerned about a single platform, you first should read the PCS, before starting such optimizations. – too honest for this site Jun 22 '15 at 16:55
  • @KaitingChen I wouldn't post that as an answer since I don't know enough ABIs to tell that the rule makes sense generically. However, I can't think of a system where it doesn't. Compilers tend to treat structs just as any other variable when it comes to parameter passing. – Lundin Jun 23 '15 at 06:11
4

Really the best rule of thumb, when it comes to passing a struct as argument to a function by reference vs by value, is to avoid passing it by value. The risks almost always outweigh the benefits.

For the sake of completeness I'll point out that when passing/returning a struct by value a few things happen:

  1. all the structure's members are copied on the stack
  2. if returning a struct by value, again, all members are copied from the function's stack memory to a new memory location.
  3. the operation is error prone - if the structure's members are pointers a common error is to assume you are safe to pass the parameter by value, since you are operating on pointers - this can cause very difficult to spot bugs.
  4. if your function modifies the value of the input parameters and your inputs are struct variables, passed by value, you have to remember to ALWAYS return a struct variable by value (I've seen this one quite a few times). Which means double the time copying the structure members.

Now getting to what small enough means in terms of size of the struct - so that it's 'worth' passing it by value, that would depend on a few things:

  1. the calling convention: what does the compiler automatically save on the stack when calling that function (usually it's the content of a few registers). If your structure members can be copied on the stack taking advantage of this mechanism then there is no penalty.
  2. the structure member's data type: if the registers of your machine are 16 bits and your structure's members data type is 64 bit, it obviously won't fit in one register so multiple operations will have to be performed just for one copy.
  3. the number of registers your machine actually has: assuming you have a structure with only one member, a char (8bit). That should cause the same overhead when passing the parameter by value or by reference (in theory). But there is potentially one other danger. If your architecture has separate data and address registers, the parameter passed by value will take up one data register and the parameter passed by reference will take up one address register. Passing the parameter by value puts pressure on the data registers which are usually used more than the address registers. And this may cause spills on the stack.

Bottom line - it's very difficult to say when it's ok to pass a struct by value. It's safer to just not do it :)

Pandrei
3

Note: reasons to do so one way or the other overlap.

When to pass/return by value:

  1. The object is a fundamental type like int, double, pointer.
  2. A binary copy of the object must be made - and the object is not large.
  3. Speed is important and passing by value is faster.
  4. The object is conceptually a smallish numeric

    struct quaternion {
      long double i,j,k;
    };
    struct pixel {
      uint16_t r,g,b;
    };
    struct money {
      intmax_t significand;  /* member name missing in the original; assumed */
      int exponent;
    };
    

When to use a pointer to the object

  1. Unsure if value or a pointer to value is better - so this is the default choice.
  2. The object is large.
  3. Speed is important and passing by a pointer to the object is faster.
  4. Stack usage is critical. (Strictly this may favor by value in some cases)
  5. Modifications to the passed object are needed.
  6. Object needs memory management.

    struct mystring {
      char *s;
      size_t length;
      size_t size;
    };
    

Notes: Recall that in C, nothing is truly passed by reference. Even passing a pointer is passed by value, as the value of the pointer is copied and passed.

I prefer passing numbers, be they int or pixel, by value, as the code is conceptually easier to understand. Passing numerics by address is conceptually a bit more difficult. With larger numeric objects, it may be faster to pass by address.

Objects having their address passed may use restrict to inform the function the objects do not overlap.
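
A short sketch of restrict on pointer parameters, reusing the pixel struct from above. The function pixel_add is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel {
    uint16_t r, g, b;
};

/* restrict is a promise by the caller that the object written through dst
   is not also reached through a or b, so the compiler need not reload
   a->g or b->g after the store to dst->r. */
void pixel_add(struct pixel *restrict dst,
               const struct pixel *restrict a,
               const struct pixel *restrict b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    dst->r = (uint16_t)(a->r + b->r);
    dst->g = (uint16_t)(a->g + b->g);
    dst->b = (uint16_t)(a->b + b->b);
}
```

The promise is not checked by the compiler: calling pixel_add(&p, &p, &q) would break it and the behavior would be undefined, which is one more thing a by-value interface does not have to worry about.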

chux - Reinstate Monica
2

On a typical PC, performance should not be an issue even for fairly large structures (many dozens of bytes). Consequently other criteria are important, especially semantics: Do you indeed want to work on a copy? Or on the same object, e.g. when manipulating linked lists? The guideline should be to express the desired semantics with the most appropriate language construct in order to make the code readable and maintainable.

That said, if there is any performance impact it may not be as clear as one would think.

  • Memcpy is fast, and memory locality (which is good for the stack) may be more important than data size: The copying may all happen in the cache, if you pass and return a struct by value on the stack. Also, return value optimization should avoid redundant copying of local variables to be returned (which naive compilers did 20 or 30 years ago).

  • Passing pointers around introduces aliases to memory locations which then cannot be cached as efficiently any longer. Modern languages are often more value-oriented because all data is isolated from side effects which improves the compiler's ability to optimize.
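
The aliasing point can be made concrete with a small sketch. The type vec2 and both functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct vec2 { int x, y; };   /* hypothetical type */

/* By pointer: v and by may refer to the same object, so after the first
   store to v->x the compiler must assume by->x may have changed and
   read it again. */
void add_twice_ptr(struct vec2 *v, const struct vec2 *by)
{
    assert(v != NULL && by != NULL);
    v->x += by->x;
    v->x += by->x;   /* sees the updated value when v == by */
}

/* By value: by is a private copy that no store through v can touch,
   so by.x can stay in a register for the whole function. */
void add_twice_val(struct vec2 *v, struct vec2 by)
{
    assert(v != NULL);
    v->x += by.x;
    v->x += by.x;
}
```

With a.x == 1, add_twice_ptr(&a, &a) leaves a.x at 4, whereas add_twice_val(&a, a) leaves it at 3. The by-value version behaves the same whether or not the arguments overlap, which is what gives the optimizer more freedom.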

The bottom line is yes, unless you run into problems feel free to pass by value if it is more convenient or appropriate. It may even be faster.

Peter - Reinstate Monica
0

We do not pass structs by value, nor do we use naked pointers (gasp!) all the time and everywhere. Example:

ERR_HANDLE mx_multiply ( MX_HANDLE result, MX_HANDLE left, MX_HANDLE right ) ;
  • result, left, and right are instances of the same (struct) type for a 2D matrix
  • multiply is some other error (struct) type
  • 'handle' is the address of the struct on the memory 'slab' pre-allocated for the instances of the same types

Is this safe? Very. Is this slow? A bit slower than naked pointers.

Chef Gladiator
-2

In an abstract way, a set of data values passed to a function is a structure passed by value, albeit undeclared as such. You can declare a function as a structure, in some cases requiring a type definition. When you do this, everything is on the stack, and that is the problem. By putting your data values on the stack, they become vulnerable to overwriting if a function or sub is called with parameters before you use or copy the data elsewhere. It is best to use pointers and classes.

SkipBerne
  • Your first idea (a set of parameters for a function is logically a structure) is valid, but misses the point that the passing mechanism may be different (single values can be passed in registers). Your overwriting concern is unfounded though. If a rogue pointer overwrites your stack you are screwed in any case; if you mean that you used copying when you wanted references you should indeed simply use pointers; but only then. – Peter - Reinstate Monica Aug 26 '17 at 10:43