In the SysV 64-bit ABI used by Linux and most other 64-bit x86 operating systems outside of Windows, a struct or class return value is either returned in registers (rax and, if needed, rdx for integer members) or via a hidden pointer passed as the first argument.
The choice between the two depends mostly on the size of the returned structure: structures larger than 16 bytes generally use the hidden pointer approach. There are other factors as well (for instance, a C++ class with a non-trivial copy constructor or destructor is returned through the hidden pointer regardless of its size), and I recommend this answer for a more comprehensive treatment.
When the hidden pointer approach is used, we need a way to pass this pointer to the function. The pointer behaves as if it were the first argument (passed in rdi), which shifts the other arguments into later positions [2].
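To make the argument shift concrete, here is a rough sketch of what the hidden pointer transformation amounts to, written as ordinary C++. The names makeFiveFrom and makeFiveFrom_lowered are hypothetical and only illustrate the idea; the compiler does not emit a second function like this, and the struct layout assumes a 4-byte int.

```cpp
// A 20-byte struct (assuming 4-byte int), too large for the rax:rdx pair.
struct five {
    int x1, x2, x3, x4, x5;
};

// What the programmer writes: one visible argument.
// (Hypothetical example function, not from the original answer.)
five makeFiveFrom(int first) {
    return {first, 52, 62, 72, 82};
}

// Roughly what the ABI makes of it: the caller reserves space for the
// result and passes its address as a hidden first argument (rdi). The
// visible argument `first` moves from the first slot to the second (esi).
// The pointer is also handed back as the return value (rax).
five* makeFiveFrom_lowered(five* result, int first) {
    *result = five{first, 52, 62, 72, 82};
    return result;
}
```

Calling makeFiveFrom_lowered(&storage, 42) fills storage with the same values that makeFiveFrom(42) returns, which is the point: only the calling convention differs, not the result.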
We can see this clearly by examining the code generated for functions returning struct objects of 1 through 5 int values (hence 4 through 20 bytes on this platform). The C++ code:
struct one {
    int x;
};

struct two {
    int x1, x2;
};

struct three {
    int x1, x2, x3;
};

struct four {
    int x1, x2, x3, x4;
};

struct five {
    int x1, x2, x3, x4, x5;
};

one makeOne() {
    return {42};
}

two makeTwo() {
    return {42, 52};
}

three makeThree() {
    return {42, 52, 62};
}

four makeFour() {
    return {42, 52, 62, 72};
}

five makeFive() {
    return {42, 52, 62, 72, 82};
}
This results in the following assembly with clang 6.0 (other compilers behave similarly):
makeOne():                              # @makeOne()
        mov     eax, 42
        ret
makeTwo():                              # @makeTwo()
        movabs  rax, 223338299434
        ret
makeThree():                            # @makeThree()
        movabs  rax, 223338299434
        mov     edx, 62
        ret
makeFour():                             # @makeFour()
        movabs  rax, 223338299434
        movabs  rdx, 309237645374
        ret
.LCPI4_0:
        .long   42                      # 0x2a
        .long   52                      # 0x34
        .long   62                      # 0x3e
        .long   72                      # 0x48
makeFive():                             # @makeFive()
        movaps  xmm0, xmmword ptr [rip + .LCPI4_0] # xmm0 = [42,52,62,72]
        movups  xmmword ptr [rdi], xmm0
        mov     dword ptr [rdi + 16], 82
        mov     rax, rdi
        ret
The basic pattern is that up to and including 8 bytes, the struct is returned entirely in rax (including packing multiple smaller values into the 64-bit register), and for objects up to 16 bytes both rax and rdx are used [1].
After that, the strategy changes completely: we see memory writes to the location pointed to by rdi, which is the hidden pointer approach mentioned above. The function also hands the same pointer back in rax (the final mov rax, rdi).
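The size thresholds described above can be written down as a small compile-time check. This is a deliberately simplified sketch: the enum ReturnVia and the function classifyBySize are hypothetical names, and the rule only covers trivially copyable structs made of integer members like the ones in this answer. The real ABI classification has more cases (floating-point members, alignment, non-trivial C++ classes).

```cpp
#include <cstddef>

// Hypothetical labels for the three outcomes seen in the assembly above.
enum class ReturnVia { Rax, RaxAndRdx, HiddenPointer };

// Simplified size-only rule; valid only for trivially copyable structs
// of integer members. Not the full ABI classification algorithm.
constexpr ReturnVia classifyBySize(std::size_t bytes) {
    return bytes <= 8    ? ReturnVia::Rax
           : bytes <= 16 ? ReturnVia::RaxAndRdx
                         : ReturnVia::HiddenPointer;
}

struct one   { int x; };
struct two   { int x1, x2; };
struct three { int x1, x2, x3; };
struct four  { int x1, x2, x3, x4; };
struct five  { int x1, x2, x3, x4, x5; };

// These checks assume a 4-byte int, as on x86-64 Linux.
static_assert(classifyBySize(sizeof(one))   == ReturnVia::Rax,           "4 bytes fit in rax");
static_assert(classifyBySize(sizeof(two))   == ReturnVia::Rax,           "8 bytes fit in rax");
static_assert(classifyBySize(sizeof(three)) == ReturnVia::RaxAndRdx,     "12 bytes need rax and rdx");
static_assert(classifyBySize(sizeof(four))  == ReturnVia::RaxAndRdx,     "16 bytes need rax and rdx");
static_assert(classifyBySize(sizeof(five))  == ReturnVia::HiddenPointer, "20 bytes go through memory");
```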
Finally, to wrap it all up, we note that sizeof(vector<int>) is usually 24 bytes on 64-bit platforms, and is definitely 24 bytes on the major C++ compilers on Linux, so the hidden pointer approach applies for vector.
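If you want to check this on your own toolchain, a few compile-time constants are enough. The constant names below are made up for this sketch. The trivially-copyable check is included only as an approximation of the "non-trivial class" factor; the ABI's actual criterion is defined in terms of copy/move constructors and the destructor.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Size of the vector object itself (not its heap buffer). Typically 24 on
// 64-bit Linux: three pointers in the common standard library implementations.
constexpr std::size_t kVectorIntSize = sizeof(std::vector<int>);

// By size alone, anything over 16 bytes cannot be returned in rax:rdx.
constexpr bool kVectorIntExceedsTwoRegisters = kVectorIntSize > 16;

// std::vector owns memory, so it is not trivially copyable. This is used
// here as a rough stand-in for the ABI's "non-trivial for calls" rule.
constexpr bool kVectorIntTriviallyCopyable =
    std::is_trivially_copyable<std::vector<int>>::value;
```

On a typical 64-bit Linux build, kVectorIntExceedsTwoRegisters should be true and kVectorIntTriviallyCopyable false, and either condition on its own is enough to send the return value through the hidden pointer.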
Credit to Jester who already answered this, in a briefer form, in the comments.
[1] The weird constants like 223338299434 that are stored into the 64-bit registers are just an optimization: the compiler combines two 32-bit values into a single 64-bit constant, as in 52ul << 32 | 42ul, which evaluates to 223338299434.
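The arithmetic behind both constants in the assembly can be verified directly. In this sketch, packPair and bitsOf are hypothetical helper names. bitsOf assumes a 4-byte int and a little-endian machine (true for x86-64), so that the first member lands in the low half of the register.

```cpp
#include <cstdint>
#include <cstring>

struct two {
    int x1, x2;
};

// Combines two 32-bit values into one 64-bit value: `low` occupies bits
// 0-31 and `high` occupies bits 32-63.
constexpr std::uint64_t packPair(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// The two constants that appear in the clang output above.
static_assert(packPair(42, 52) == 223338299434ULL, "rax for makeTwo/makeThree/makeFour");
static_assert(packPair(62, 72) == 309237645374ULL, "rdx for makeFour");

// Copies the raw bytes of a `two` into a 64-bit integer, mirroring what
// rax holds when makeTwo returns. Assumes little-endian byte order.
inline std::uint64_t bitsOf(const two& value) {
    static_assert(sizeof(two) == sizeof(std::uint64_t), "assumes 4-byte int");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```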
[2] This is the same approach used to pass this for member functions. When a member function also returns a value via the hidden pointer approach, the hidden pointer comes first (in rdi), then the this pointer (in rsi), and finally the first user-provided argument (usually in rdx, but this depends on the type). Here's an example.
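As a rough illustration of that ordering, the following sketch spells out the three arguments as an ordinary free function. Maker, make, and Maker_make_lowered are hypothetical names chosen for this example, and the lowered form is a conceptual model rather than something the compiler literally generates.

```cpp
// A 20-byte struct (assuming 4-byte int), returned via the hidden pointer.
struct five {
    int x1, x2, x3, x4, x5;
};

// Hypothetical class with a member function returning a large struct.
struct Maker {
    int base;

    five make(int step) const {
        return {base, base + step, base + 2 * step, base + 3 * step, base + 4 * step};
    }
};

// Conceptual view of the argument order at the ABI level:
//   result -> rdi (hidden return pointer)
//   self   -> rsi (the `this` pointer)
//   step   -> edx (first user-provided argument, an int here)
five* Maker_make_lowered(five* result, const Maker* self, int step) {
    *result = five{self->base, self->base + step, self->base + 2 * step,
                   self->base + 3 * step, self->base + 4 * step};
    return result;  // the hidden pointer is also returned in rax
}
```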