Effectively passing a struct by reference even when the function declaration indicates pass-by-value is a common optimization: it's just that it usually happens indirectly via inlining, so it's not obvious from the generated code.
However, for this to happen, the compiler needs to know that the callee doesn't modify the passed object while it is compiling the caller. Otherwise, it is restricted by the platform/language ABI, which dictates exactly how values are passed to functions.
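For instance (a minimal sketch using a small hypothetical struct P, not taken from the question): if the callee writes to its parameter and the caller still reads its own copy afterwards, the as-if rule prevents the compiler from quietly reusing the caller's object:

struct P { int x, y; };      // hypothetical struct, just for illustration

int callee(P p) {
    p.x = 0;                 // writes only to callee's own copy
    return p.y;
}

int caller(P p) {
    int r = callee(p);       // must behave as if callee got its own copy...
    return r + p.x;          // ...because p.x here must still be the caller's value
}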
It can happen even without inlining!
Still, some compilers do implement this optimization even in the absence of inlining, although the circumstances are relatively limited, at least on platforms using the SysV ABI (Linux, OSX, etc) due to the constraints of stack layout. Consider the following simple example, based directly on your code:
__attribute__((noinline))
int foo(S s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

int bar(S s) {
    return foo(s);
}
Here, at the language level bar calls foo with pass-by-value semantics as required by C++. If we examine the assembly generated by gcc, however, it looks like this:
foo(S):
mov eax, DWORD PTR [rsp+12]
add eax, DWORD PTR [rsp+8]
add eax, DWORD PTR [rsp+16]
add eax, DWORD PTR [rsp+20]
add eax, DWORD PTR [rsp+24]
add eax, DWORD PTR [rsp+28]
add eax, DWORD PTR [rsp+32]
add eax, DWORD PTR [rsp+36]
ret
bar(S):
jmp foo(S)
Note that bar just directly calls foo, without making a copy: foo will use the same copy of s that was passed to bar (on the stack). In particular, bar doesn't make the copy implied by the language semantics (ignoring as-if). So gcc has performed exactly the optimization you requested. Clang doesn't do it, though: it makes a copy on the stack which it passes to foo().
Unfortunately, the cases where this can work are fairly limited: SysV requires that these large structures be passed on the stack in a specific position, so such re-use is only possible if the callee expects the object in exactly the same place.
That's possible in the foo/bar example since bar takes its S as the first parameter in the same way as foo, and bar does a tail call to foo, which avoids the need for the implicit return-address push that would otherwise ruin the ability to re-use the stack argument.
For example, if we simply add a + 1 to the call to foo:
int bar(S s) {
    return foo(s) + 1;
}
The trick is ruined, since now the position of bar::s is different from the location where foo will expect its s argument, and we need a copy:
bar(S):
push QWORD PTR [rsp+32]
push QWORD PTR [rsp+32]
push QWORD PTR [rsp+32]
push QWORD PTR [rsp+32]
call foo(S)
add rsp, 32
add eax, 1
ret
This doesn't mean that the caller bar() has to be totally trivial, though. For example, it could modify its copy of s prior to passing it along:
int bar(S s) {
    s.i += 1;
    return foo(s);
}
... and the optimization would be preserved:
bar(S):
add DWORD PTR [rsp+8], 1
jmp foo(S)
In principle, the possibility for this kind of optimization is much greater in the Win64 calling convention, which uses a hidden pointer to pass large structures. This gives a lot more flexibility in reusing existing structures on the stack or elsewhere in order to implement pass-by-reference under the covers.
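To sketch the idea (a hand-written illustration of the transformation, not actual compiler output; foo_lowered and bar_lowered are hypothetical names): when the callee receives a hidden pointer and the compiler can prove the callee never writes through it, the caller can pass the address of its existing object instead of the address of a fresh copy.

// Hypothetical lowering under a hidden-pointer convention such as Win64:
// foo(S) effectively receives a pointer to an S, so a caller that can prove
// foo never modifies the pointee may pass its own object's address directly.
int foo_lowered(const S* s);       // hypothetical "lowered" form of foo(S)

int bar_lowered(S s) {
    return foo_lowered(&s);        // no copy: reuse bar's existing s
}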
Inlining
All that aside, however, the main way this optimization happens is via inlining.
For example, at -O2 none of clang, gcc and MSVC makes any copy of the S object1. Both clang and gcc don't really create the object at all, but just calculate the result more or less directly without even referring to the unused fields. MSVC does allocate stack space for a copy but never uses it: it fills out only a single copy of S and reads from that, just like pass-by-reference (MSVC generates much worse code than the other two compilers for this case).
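A caller along the following lines (a sketch of the kind of example described in footnote 1, not necessarily the exact code, and assuming foo is not marked noinline for this test) shows the effect:

int main(int argc, char**) {
    S s{};
    s.i = argc;                // a couple of fields depend on argc, so the
    s.j = argc * 2;            // compiler can't fold the whole call to a constant
    return foo(s);             // at -O2 the call is inlined and no copy of s is made
}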
Note that even though foo is inlined into main, the compilers also generate a separate standalone copy of the foo() function, since it has external linkage and so could be called from another translation unit. In this, the compiler is restricted by the application binary interface: the SysV ABI (for Linux) or Win64 ABI (for Windows) defines exactly how values must be passed, depending on the type and size of the value. Large structures are passed on the stack (SysV) or by hidden pointer (Win64), and the compiler has to respect that when compiling foo. It also has to respect the ABI when compiling a caller of foo whose definition it cannot see, since it has no idea what foo will do.
So there is only a small window for the compiler to make an effective optimization which transforms pass-by-value to pass-by-reference, because:
1) If it can see both the caller and callee (main and foo in your example), it is likely that the callee will be inlined into the caller if it is small enough; and as the function becomes large and non-inlinable, the effect of fixed costs like calling-convention overhead becomes relatively smaller.
2) If the compiler cannot see both the caller and callee at the same time2, it generally has to compile each according to the platform ABI (see the sketch below). There is no scope for optimization of the call at the call site since the compiler doesn't know what the callee will do, and there is no scope for optimization within the callee because the compiler has to make conservative assumptions about what the caller did.
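For example (a sketch, assuming foo's definition lives in a different translation unit and link-time optimization is not used; call_foo is a hypothetical caller):

// other_file.cpp (sketch): foo's definition is not visible here, so the call
// below must follow the platform ABI exactly and a full copy of s is passed.
int foo(S s);             // declaration only; the definition is elsewhere

int call_foo(S s) {
    s.i += 1;
    return foo(s) + 1;    // not a tail call either, so the stack slot can't be reused
}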
1 My example is slightly more complicated than your original one, to avoid the compiler just optimizing everything away entirely (in particular, your code accesses uninitialized memory, so your program doesn't even have defined behavior): I populate a few of the fields of s with argc, which is a value the compiler can't predict.
2 That a compiler can see both "at the same time" generally means they are either in the same translation unit or that link-time optimization is being used.