Arrays (including strings) are passed by reference in most high-level languages. int foo(char*) just gets a pointer value as an arg, and a pointer is typically one machine word (i.e. it fits in a register). In good modern calling conventions, the first few integer/pointer args are typically passed in registers.
In C/C++, you can't pass a bare array by value. Given int arr[16]; func(arr); the function func only gets a pointer (to the first element).
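As a minimal sketch of that array-to-pointer decay (the function names here are made up for illustration, not from any library): the callee only ever sees a pointer, so sizeof gives the pointer size there, and writes through it land in the caller's array.

```c
#include <assert.h>
#include <stddef.h>

/* A parameter written as `int arr[16]` is adjusted to `int *arr`,
   so this is what the callee really receives. */
size_t size_seen_by_callee(int *arr) {
    return sizeof arr;            /* size of a pointer, not of the array */
}

size_t size_seen_by_caller(void) {
    int arr[16];
    return sizeof arr;            /* size of the whole array: 16 * sizeof(int) */
}

/* Writes through the pointer, so the caller's array is what changes. */
void set_first(int *arr, int value) {
    assert(arr != NULL);
    arr[0] = value;
}

int caller_sees_write(void) {
    int arr[16] = {0};
    set_first(arr, 42);           /* arr decays to &arr[0] here */
    return arr[0];
}
```

Calling caller_sees_write() returns the value stored by the callee, which is what "passed by reference" means in practice here.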
In some other higher-level languages, arrays might be more like C++ std::vector, so the callee might be able to grow/shrink the array and find out its length without a separate arg. That would typically mean there's a "control block".
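To make the "control block" idea concrete, here is a rough C sketch of what such a language runtime might pass around. The int_vec type and its functions are hypothetical names for illustration; real runtimes differ in layout and growth policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical control block: pointer to storage plus bookkeeping. */
typedef struct {
    int   *data;   /* heap storage, may move when the array grows */
    size_t len;    /* elements in use */
    size_t cap;    /* elements allocated */
} int_vec;

/* Append one element, growing the storage if needed.
   Returns 0 on success, -1 if allocation fails. */
int int_vec_push(int_vec *v, int value) {
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* The callee can ask for the length; no separate length arg needed. */
size_t int_vec_len(const int_vec *v) {
    return v->len;
}

void int_vec_free(int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

The function still only receives one pointer-sized arg (the address of the control block), which is why passing such arrays stays cheap.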
In C and C++ you can pass structs by value, and then it's up to the calling-convention rules to specify how to pass them.
x86-64 System V, for example, passes structs of 16 bytes or less packed into up to 2 integer registers. Larger structs are copied onto the stack, regardless of how large an array member they contain (see "What kind of C11 data type is an array according to the AMD64 ABI"). (So don't pass giant objects by value to non-inline functions!)
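A quick way to see which side of the 16-byte line a struct falls on is to check sizeof. The sketch below assumes 4-byte int and 8-byte long long, and sysv_passes_in_registers is a made-up, size-only stand-in: the real ABI classification has more rules than just size.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int arr[4]; } small_obj;                 /* 16 bytes */
typedef struct { int arr[4]; long long a, b; } large_obj; /* 32 bytes */

static_assert(sizeof(small_obj) == 16, "assumes 4-byte int");
static_assert(sizeof(large_obj) == 32, "assumes 4-byte int, 8-byte long long");

/* Hypothetical helper: size-only approximation of the rule described
   above (16 bytes or less goes in registers, larger goes on the stack).
   Not the full ABI classification algorithm. */
int sysv_passes_in_registers(size_t size) {
    return size > 0 && size <= 16;
}
```

With these definitions, small_obj lands on the register side and large_obj on the stack-copy side.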
The Windows x64 calling convention passes large structs by hidden reference.
Example:
typedef struct {
// too big makes the asm output cluttered with loops or memcpy
// int Big_McLargeHuge[1024*1024];
int arr[4];
long long a,b; //,c,d;
} bigobj;
// total 32 bytes with int=4, long long=8 bytes
int func(bigobj a);
int foo(bigobj a) {
a.arr[3]++;
return func(a);
}
source + asm output on the Godbolt compiler explorer.
You can try other architectures on Godbolt with their standard calling conventions, like ARM or AArch64. I picked x86-64 because I happened to know of an interesting difference in the two major calling conventions on that one platform for struct-passing.
x86-64 System V (gcc7.3 -O3): foo has a real by-value copy of its arg (done by its caller) that it can modify, so it does so and uses it as the arg for the tail-call. (If it couldn't tail-call, it would have to make yet another full copy. This example artificially makes System V look really good.)
foo(bigobj):
add DWORD PTR [rsp+20], 1 # increment the struct member in the arg on the stack
jmp func(bigobj) # tailcall func(a)
x86-64 Windows (MSVC CL19 /Ox): note that we address a.arr[3] via RCX, the first integer/pointer arg. So there is a hidden reference, but it's not a const-reference. This function was called by value, but it's modifying the data it got by reference. So the caller has to make a copy, or at least assume that a callee destroyed the arg it got a pointer to. (No copy is required if the object is dead after that, but that's only possible for local struct objects, not for passing a pointer to a global or something.)
$T1 = 32 ; offset of the tmp copy in this function's stack frame
foo PROC
sub rsp, 72 ; 00000048H ; 32B of shadow space + 32B bigobj + 8 to align
inc DWORD PTR [rcx+12]
movups xmm0, XMMWORD PTR [rcx] ; load modified `a`
movups xmm1, XMMWORD PTR [rcx+16] ; apparently alignment wasn't required
lea rcx, QWORD PTR $T1[rsp]
movaps XMMWORD PTR $T1[rsp], xmm0
movaps XMMWORD PTR $T1[rsp+16], xmm1 ; store a copy
call int __cdecl func(struct bigobj)
add rsp, 72 ; 00000048H
ret 0
foo ENDP
Making another copy of the object appears to be a missed optimization. I think this would be a valid implementation of foo for the same calling convention:
foo:
add DWORD PTR [rcx+12], 1 ; more efficient than INC because of the memory dst, on Intel CPUs
jmp func ; tailcall with pointer still in RCX
x86-64 clang for the SysV ABI also misses the optimization that gcc7.3 found, and does copy like MSVC.
So the ABI difference is less interesting than I thought; in both cases the callee "owns" the arg, even though for Windows it's not guaranteed to be on the stack. I guess this enables dynamic allocation for passing very large objects by value without a stack overflow, but that's kind of pointless. Just don't do it in the first place.
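Whichever ABI does the copying, the C-level semantics are the same: the callee owns its by-value arg and the caller's object is untouched. A small sketch of that, plus the explicit-pointer alternative for large objects (the bump_* and sum_* names are hypothetical, chosen for this example):

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int arr[4];
    long long a, b;
} bigobj;

/* By value: modifies the callee's own copy only. */
int bump_by_value(bigobj a) {
    a.arr[3]++;
    return a.arr[3];
}

/* By pointer: modifies the caller's object, no copy of the struct. */
int bump_by_pointer(bigobj *a) {
    assert(a != NULL);
    a->arr[3]++;
    return a->arr[3];
}

/* Read-only access through a const pointer: no copy, no modification. */
long long sum_by_const_pointer(const bigobj *a) {
    assert(a != NULL);
    long long total = a->a + a->b;
    for (size_t i = 0; i < 4; i++)
        total += a->arr[i];
    return total;
}
```

For anything large, the pointer versions avoid the copy that either calling convention would otherwise have to make somewhere.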
Small objects:
x86-64 System V passes small objects packed into registers. Clang finds a neat optimization if you comment out the long long members so you just have:
typedef struct {
int arr[4];
// long long a,b; //,c,d;
} bigobj;
# clang6.0 -O3
foo(bigobj): # @foo(bigobj)
movabs rax, 4294967296 # 0x100000000 = 1ULL << 32
add rsi, rax
jmp func(bigobj) # TAILCALL
(arr[0..1] is packed into RDI, and arr[2..3] is packed into RSI, the first 2 integer/pointer arg-passing registers in the x86-64 SysV ABI.)
gcc unpacks arr[3] into a register by itself where it can increment it.
But clang, instead of unpacking and repacking, increments the high 32 bits of RSI by adding 1ULL<<32.
MSVC still passes by hidden reference, and still copies the whole object.
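The add-to-the-high-half trick can be checked in plain C. This is only a model of what clang's asm does, with hypothetical helper names; it packs by shifting, mirroring how arr[2] and arr[3] end up in the low and high halves of RSI on little-endian x86-64.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(int32_t),
              "two 32-bit ints fill one 64-bit register");

/* lo goes in bits 0..31, hi in bits 32..63. */
uint64_t pack_pair(int32_t lo, int32_t hi) {
    return (uint64_t)(uint32_t)lo | ((uint64_t)(uint32_t)hi << 32);
}

int32_t low_half(uint64_t packed) {
    return (int32_t)(uint32_t)packed;
}

int32_t high_half(uint64_t packed) {
    return (int32_t)(uint32_t)(packed >> 32);
}

/* Adding 1ULL << 32 increments only the high 32-bit element.
   Any carry out of bit 63 is discarded, so the low half is untouched
   and the high half wraps like 32-bit unsigned arithmetic. */
uint64_t increment_high_half(uint64_t packed) {
    return packed + (UINT64_C(1) << 32);
}
```

Because the addend has zeros in its low 32 bits, no carry can ever propagate into or out of the low element, which is why the single 64-bit add is safe.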