C++ on x86-64: when are structs/classes passed and returned in registers?

Question

Assuming the x86-64 ABI on Linux, under what conditions in C++ are structs passed to functions in registers vs. on the stack? Under what conditions are they returned in registers? And does the answer change for classes?

If it helps simplify the answer, you can assume a single argument/return value and no floating point values.

I fear the only response can be "it depends". It depends on the compiler, on the optimization level, on the struct/class size, on the compiler mood, ... — YSC, Feb 23 '17 at 09:38
I believe that's not true at all. Since code compiled separately must be able to interoperate, the ABI specifies exactly how function calls should happen for a given signature. — jacobsa, Feb 23 '17 at 09:39
You're right, I was only thinking about functions only visible in a single translation unit. — YSC, Feb 23 '17 at 09:41
Yeah, to be clear I mean a function that's not inlined, is exported, etc. — jacobsa, Feb 23 '17 at 09:42
It doesn't depend on the compiler mood: it's all defined [here](https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf). The algorithm is a bit involved, though not complex (besides that step four is described poorly IMO). A complete answer would involve a lot of examples. — Margaret Bloom, Feb 23 '17 at 09:43
Actually it is [this version](https://refspecs.linuxbase.org/elf/x86_64-abi-0.21.pdf) that GCC and clang are implementing. — Margaret Bloom, Feb 23 '17 at 09:51
@MargaretBloom: those links are to old versions. The current revision is 0.99.8 (git revision r252), April 2016. See https://stackoverflow.com/questions/18133812/where-is-the-x86-64-abi-documented/40348010#40348010 — Peter Cordes, Sep 03 '17 at 02:04

Margaret Bloom · Accepted Answer · 2017-02-24T10:29:36.773

The ABI specification is defined here.
A newer version is available here.

I assume the reader is accustomed to the terminology of the document and that they can classify the primitive types.

If the object size is larger than two eight-bytes, it is passed in memory:

struct foo
{
    unsigned long long a;
    unsigned long long b;
    unsigned long long c;               //Commenting this gives mov rax, rdi
};

unsigned long long foo(struct foo f)
{ 
  return f.a;                           //mov     rax, QWORD PTR [rsp+8]
}
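The size rule can be checked at compile time. Below is a minimal sketch, assuming 8-byte eightbytes as on x86-64; `exceeds_two_eightbytes` and the two struct names are hypothetical and not part of the ABI or of any library.

```cpp
#include <cstddef>

// Hypothetical helper: true when an object is larger than two eightbytes
// (16 bytes), which under the rule above means it is passed in memory.
template <typename T>
constexpr bool exceeds_two_eightbytes() {
    return sizeof(T) > 2 * 8;
}

struct two_words   { unsigned long long a, b; };     // 16 bytes
struct three_words { unsigned long long a, b, c; };  // 24 bytes

static_assert(!exceeds_two_eightbytes<two_words>(), "fits in two eightbytes");
static_assert(exceeds_two_eightbytes<three_words>(), "larger than two eightbytes");
```

This only covers the size test; a type that passes it still has to go through the remaining checks.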

If it is non-POD, i.e. it has a non-trivial copy constructor or destructor, it is passed in memory (the callee receives a pointer to it):

struct foo
{
    unsigned long long a;
    foo(const struct foo& rhs){}            //Commenting this gives mov rax, rdi
};

unsigned long long foo(struct foo f)
{
  return f.a;                               //mov     rax, QWORD PTR [rdi]
}

(Copy elision is at work here.)
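Whether a type falls under this rule can be probed with the standard type traits. This is a minimal sketch, assuming C++11 or later; `trivially_passable` and the struct names are hypothetical, and the check only approximates the ABI condition (the Itanium C++ ABI also takes move constructors into account).

```cpp
#include <type_traits>

struct trivial_copy {
    unsigned long long a;
};

struct user_copy {
    unsigned long long a;
    user_copy(const user_copy&) {}   // user-provided, hence non-trivial
};

// Hypothetical helper: both the copy constructor and the destructor
// must be trivial for the type to stay eligible for register passing.
template <typename T>
constexpr bool trivially_passable() {
    return std::is_trivially_copy_constructible<T>::value &&
           std::is_trivially_destructible<T>::value;
}

static_assert(trivially_passable<trivial_copy>(), "eligible for registers");
static_assert(!trivially_passable<user_copy>(), "passed by hidden reference");
```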

If it contains unaligned fields, it is passed in memory:

struct __attribute__((packed)) foo         //Removing packed gives mov rax, rsi
{
    char b;
    unsigned long long a;
};

unsigned long long foo(struct foo f)
{
  return f.a;                             //mov     rax, QWORD PTR [rsp+9]
}
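The effect of `packed` on the layout can be made visible with `offsetof`. This is a sketch assuming GCC or Clang on x86-64 (the attribute is a compiler extension, and the expected offsets depend on that target); `is_aligned` and the struct names are hypothetical.

```cpp
#include <cstddef>

struct __attribute__((packed)) packed_foo {
    char b;
    unsigned long long a;   // starts at offset 1, so it is unaligned
};

struct padded_foo {
    char b;
    unsigned long long a;   // starts at offset 8, after 7 bytes of padding
};

// Hypothetical helper: does a field at this offset respect its alignment?
constexpr bool is_aligned(std::size_t offset, std::size_t alignment) {
    return offset % alignment == 0;
}

static_assert(!is_aligned(offsetof(packed_foo, a), alignof(unsigned long long)),
              "packed: unaligned field");
static_assert(is_aligned(offsetof(padded_foo, a), alignof(unsigned long long)),
              "padded: aligned field");
static_assert(sizeof(packed_foo) == 9 && sizeof(padded_foo) == 16, "sizes");
```

The offset of 1 matches the `[rsp+9]` in the listing above: the object starts at `rsp+8`, and `a` sits one byte into it.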

If none of the above is true, the fields of the object are considered.
If one of the fields is itself a struct/class, the procedure is applied recursively.
The goal is to classify each of the two eight-bytes (8B) in the object.

The classes of the fields in each 8B are considered.
Note that each 8B always holds a whole number of fields, thanks to the alignment requirement above.

Let C be the class of the 8B and D be the class of the field under consideration.
Let new_class be pseudo-defined as:

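cls new_class(cls D, cls C)
{
   if (D == C)                       //Equal classes: nothing to merge
      return C;

   if (D == NO_CLASS)                //NO_CLASS yields the other class
      return C;

   if (C == NO_CLASS)
      return D;

   if (D == MEMORY || C == MEMORY)
      return MEMORY;

   if (D == INTEGER || C == INTEGER)
      return INTEGER;

   if (D == X87 || C == X87 || D == X87UP || C == X87UP)
      return MEMORY;

   return SSE;
}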

then the class of the 8B is computed as follows:

C = NO_CLASS;

for (field f : fields)
{
    D = get_field_class(f);        //Note this may recursively call this proc
    C = new_class(D, C);
}
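The merge and the loop can be turned into something that compiles. This is a sketch assuming C++14 (for the relaxed `constexpr` rules); the enum and `new_class` mirror the pseudo-code, while `classify_eightbyte` is a made-up name and takes the field classes directly instead of computing them from real fields. It returns early when both classes are equal or when either one is NO_CLASS, as the merge rules in the ABI document do.

```cpp
#include <initializer_list>

enum class cls { NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, MEMORY };

// Merge the class of one field (d) into the running class of the 8B (c).
constexpr cls new_class(cls d, cls c) {
    if (d == c) return c;
    if (d == cls::NO_CLASS) return c;
    if (c == cls::NO_CLASS) return d;
    if (d == cls::MEMORY || c == cls::MEMORY) return cls::MEMORY;
    if (d == cls::INTEGER || c == cls::INTEGER) return cls::INTEGER;
    if (d == cls::X87 || c == cls::X87 || d == cls::X87UP || c == cls::X87UP)
        return cls::MEMORY;
    return cls::SSE;
}

// Fold the classes of all fields that share one 8B.
constexpr cls classify_eightbyte(std::initializer_list<cls> field_classes) {
    cls c = cls::NO_CLASS;
    for (cls d : field_classes)
        c = new_class(d, c);
    return c;
}

// An 8B holding an int and a float: INTEGER wins over SSE.
static_assert(classify_eightbyte({cls::INTEGER, cls::SSE}) == cls::INTEGER, "");
// An 8B holding two floats stays SSE.
static_assert(classify_eightbyte({cls::SSE, cls::SSE}) == cls::SSE, "");
```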

Once we have the class of each 8B, say C1 and C2, then:

if (C1 == MEMORY || C2 == MEMORY)
    C1 = C2 = MEMORY;

if (C2 == SSEUP && C1 != SSE)
   C2 = SSE;

Note: this is my interpretation of the algorithm given in the ABI document.
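The clean-up step for C1 and C2 can be written as a small function. This is a sketch assuming C++14; `eightbyte_pair` and `post_merge` are hypothetical names, and the function encodes only the two rules listed above, not the full post-merger step of the ABI document.

```cpp
enum class cls { NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, MEMORY };

struct eightbyte_pair {
    cls c1;
    cls c2;
};

// Apply the two clean-up rules to the classes of the two 8Bs.
constexpr eightbyte_pair post_merge(cls c1, cls c2) {
    // One MEMORY 8B sends the whole object to memory.
    if (c1 == cls::MEMORY || c2 == cls::MEMORY)
        return {cls::MEMORY, cls::MEMORY};
    // An SSEUP that does not follow an SSE becomes a plain SSE.
    if (c2 == cls::SSEUP && c1 != cls::SSE)
        return {c1, cls::SSE};
    return {c1, c2};
}

static_assert(post_merge(cls::INTEGER, cls::MEMORY).c1 == cls::MEMORY, "");
static_assert(post_merge(cls::INTEGER, cls::SSEUP).c2 == cls::SSE, "");
```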

Example

struct foo
{
    unsigned long long a;
    long double b;
};

unsigned long long foo(struct foo f)
{
  return f.a;
}

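The 8Bs and their fields

First 8B: a
Remaining 8Bs: padding, then b

a is INTEGER, so the first 8B is INTEGER. b is a long double, which is X87 and X87UP. On x86-64 Linux a long double occupies 16 bytes and is 16-byte aligned, so the whole object is 32 bytes, which is larger than two 8Bs. The final class is MEMORY.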

Example

struct foo
{
    double a;
    long long b;
};

long long foo(struct foo f)
{
  return f.b;                     //mov rax, rdi
}

The 8Bs and their fields

First 8B: a
Second 8B: b

a is SSE, so the first 8B is SSE.
b is INTEGER so the second 8B is INTEGER.

The final classes are the ones calculated.
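Which 8B a field belongs to follows directly from its offset. This is a minimal sketch, assuming the x86-64 layout of `double` and `long long` (8 bytes each); `eightbyte_index` is a hypothetical helper.

```cpp
#include <cstddef>

struct foo {
    double a;
    long long b;
};

// Hypothetical helper: index of the 8B that contains a given byte offset.
constexpr std::size_t eightbyte_index(std::size_t offset) {
    return offset / 8;
}

static_assert(eightbyte_index(offsetof(foo, a)) == 0, "a lives in the first 8B");
static_assert(eightbyte_index(offsetof(foo, b)) == 1, "b lives in the second 8B");
static_assert(sizeof(foo) == 16, "exactly two 8Bs");
```

With a alone in the first 8B (SSE) and b alone in the second (INTEGER), b is the first INTEGER-class item and so lands in rdi, which matches the `mov rax, rdi` in the listing.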

Return values

The values are returned according to their classes:

MEMORY
The caller passes a hidden first argument to the function for it to store the result into.
In C++ this often involves copy elision/return value optimisation. This address must be returned in rax, thereby returning MEMORY classes "by reference" to a hidden, caller-allocated buffer.

The ABI document puts it this way:

If the type has class MEMORY, then the caller provides space for the return value and passes the address of this storage in %rdi as if it were the first argument to the function. In effect, this address becomes a “hidden” first argument. On return %rax will contain the address that has been passed in by the caller in %rdi.

INTEGER and POINTER
The registers rax and rdx, as needed.

SSE and SSEUP
The registers xmm0 and xmm1, as needed.

X87 and X87UP
The register st0.
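As an illustration, here are two functions whose return types fall on either side of the two-8B limit. This is a sketch; the names are hypothetical, and the register assignments in the comments are what the rules above predict on x86-64 Linux, not something the code verifies.

```cpp
struct small_result {          // 16 bytes: two INTEGER 8Bs
    long long quotient;
    long long remainder;
};

struct large_result {          // 24 bytes: larger than two 8Bs, so MEMORY
    long long values[3];
};

// Predicted: quotient comes back in rax, remainder in rdx.
small_result divide(long long n, long long d) {
    return {n / d, n % d};
}

// Predicted: the caller passes a buffer address in rdi; the function
// fills the buffer and returns that same address in rax.
large_result multiples(long long x) {
    return {{x, 2 * x, 3 * x}};
}

static_assert(sizeof(small_result) == 16, "two 8Bs");
static_assert(sizeof(large_result) == 24, "three 8Bs");
```

Compiling this with optimisations and reading the assembly (for example on Godbolt) is the way to confirm the predictions.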

PODs

The technical definition is here.

The definition from the ABI is reported below.

A de/constructor is trivial if it is an implicitly-declared default de/constructor and if:

   • its class has no virtual functions and no virtual base classes, and
   • all the direct base classes of its class have trivial de/constructors, and
   • for all the nonstatic data members of its class that are of class type (or array thereof), each such class has a trivial de/constructor.
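The three conditions can be observed with the standard type traits. This is a sketch assuming C++11 or later; all struct names are hypothetical. Note that in C++ a virtual function makes the copy constructor non-trivial, while the destructor stays trivial unless it is itself virtual or user-provided, so the first check uses the copy-constructor trait.

```cpp
#include <type_traits>

struct plain { int x; };

struct with_virtual {              // virtual function
    virtual void f() {}
    int x;
};

struct has_dtor { ~has_dtor() {} };            // user-provided destructor
struct derived : has_dtor { int x; };          // non-trivial base
struct holder { has_dtor member; int x; };     // non-trivial member
struct holder_array { has_dtor members[2]; };  // array of non-trivial members

static_assert(std::is_trivially_copy_constructible<plain>::value, "");
static_assert(std::is_trivially_destructible<plain>::value, "");
static_assert(!std::is_trivially_copy_constructible<with_virtual>::value, "");
static_assert(!std::is_trivially_destructible<derived>::value, "");
static_assert(!std::is_trivially_destructible<holder>::value, "");
static_assert(!std::is_trivially_destructible<holder_array>::value, "");
```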

Note that each 8B is classified independently so that each one can be passed accordingly.
In particular, they may end up on the stack if there are no more parameter registers left.

Thanks for the great answer, Margaret. Can I ask you to mention return values and non-trivial copy constructors/destructors? I'll then be happy to mark it as accepted and it can stand as a very good reference that doesn't require downloading a PDF. :-) — jacobsa, Feb 23 '17 at 19:36
@jacobsa Updated. I haven't included the definitions of the various classes the ABI uses to classify the types. That would be too broad IMO. — Margaret Bloom, Feb 24 '17 at 10:30
@BeeOnRope I thought they either fell into the pointer or struct case (though, this would be Object Slicing). I should brush up on the ABI to give you a definitive answer. A quick and dirty experiment with [Godbolt](https://godbolt.org/g/x992Rb) doesn't show anything peculiar. Unfortunately, I'm a bit busy these days but I'd be glad to help, do you have any specific question? :) — Margaret Bloom, Oct 24 '17 at 08:55
Well that was my specific question :). For example is a structure just flattened out as if all the members in all the bases were just declared in one class? Based on my tests it seems so. — BeeOnRope, Oct 24 '17 at 09:04
@BeeOnRope. They are. There is some corner case I guess but that's it. Anyway, gcc/clang use the [Itanium ABI](http://refspecs.linuxbase.org/cxxabi-1.83.html) as stated [here](https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html) (it's an industry standard nowadays). — Margaret Bloom, Oct 24 '17 at 10:27
Thanks @MargaretBloom! My curiosity was triggered by [this question](https://stackoverflow.com/q/46901697/149138) where `std::pair` and `std::tuple` were being treated differently by the ABI, and I was wondering if it was because the members of `tuple` are spread out across base classes (one member per base) while in `pair` there are just two fields in the same class (as a consequence, `pair` is standard layout while `tuple` is not). It turns out it was not that: it was related to move constructors, also covered in the Itanium ABI (but not the SysV document). — BeeOnRope, Oct 24 '17 at 19:07

Martin Bonner supports Monica · Answer 2 · 2017-02-23T10:02:25.777

The x86-64 ABI is documented here with version 252 (the latest ABI as of my answer) downloadable here.

If I have read page 21 et seq. correctly, it says that if sizeof(struct) is 8 bytes or less, then it will be passed in a normal register. The rules get complicated after that, but I think if it is 9-16 bytes, it may get passed in SSE registers.

As to classes, remember the only difference between a class and a struct is default access. However the rules do clearly say that if there is a non-trivial copy constructor or non-trivial destructor, the struct will be passed as a hidden reference.
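The difference between a trivial and a non-trivial destructor can be small in source code. This is a sketch assuming C++11 or later; the struct names are hypothetical, and the size check on `std::unique_ptr<int>` reflects the common standard library implementations rather than a guarantee of the standard.

```cpp
#include <memory>
#include <type_traits>

struct defaulted_dtor {
    int* p;
    ~defaulted_dtor() = default;   // still trivial
};

struct empty_dtor {
    int* p;
    ~empty_dtor() {}               // user-provided, hence non-trivial
};

static_assert(std::is_trivially_destructible<defaulted_dtor>::value,
              "can still be passed in a register");
static_assert(!std::is_trivially_destructible<empty_dtor>::value,
              "passed as a hidden reference");

// A unique_ptr is as small as a raw pointer, yet its destructor is
// non-trivial, so it does not travel in a register like a raw pointer.
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*), "");
static_assert(!std::is_trivially_destructible<std::unique_ptr<int>>::value, "");
```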

edited Feb 23 '17 at 10:02

answered Feb 23 '17 at 09:49

Every class has a copy constructor. The key point is a "non-trivial copy constructor or destructor". Which is why `= default;` can be important, and why `unique_ptr` is not a zero-cost abstraction. – Kerrek SB Feb 23 '17 at 09:58
@KerrekSB Good catch. – Martin Bonner supports Monica Feb 23 '17 at 10:01
@KerrekSB: I was about to come here to tell you you're wrong about `unique_ptr`, but no: [you're totally right](https://godbolt.org/g/HCC1fL). You have blown my mind today. – jacobsa Feb 23 '17 at 19:34
@jacobsa: I am confused about that Godbolt link. The unique_ptr version doesn't seem to be deleting the memory which the unique ptr owns. – Martin Bonner supports Monica Feb 23 '17 at 21:28
