struct Big {
    int a[8];
};
void foo(Big a);
Big getStuff();
void test1() {
    foo(getStuff());
}
This compiles (using clang 6.0.0 for x86_64 on Linux, so the System V ABI applies; flags: -O3 -march=broadwell) to:
test1(): # @test1()
    sub     rsp, 72
    lea     rdi, [rsp + 40]
    call    getStuff()
    vmovups ymm0, ymmword ptr [rsp + 40]
    vmovups ymmword ptr [rsp], ymm0
    vzeroupper
    call    foo(Big)
    add     rsp, 72
    ret
If I am reading this correctly, this is what is happening:
- getStuff is passed a pointer into test1's stack frame (rsp + 40) to use for its return value, so after getStuff returns, rsp + 40 through rsp + 71 contains the result of getStuff.
- This result is then immediately copied to a lower stack address, rsp through rsp + 31.
- foo is then called, which will read its argument from rsp.
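The layout described in the list above can be sanity-checked with a little arithmetic. This is only a minimal sketch, assuming a 4-byte int (so sizeof(Big) is 32); Range, overlaps, kReturnSlot and kArgSlot are hypothetical names introduced for illustration, and the offsets are the ones read off the clang listing.

```cpp
#include <cassert>
#include <cstddef>

struct Big {
    int a[8];
};

// Assumption: int is 4 bytes, as on x86-64 Linux.
static_assert(sizeof(Big) == 32, "Big is expected to occupy 32 bytes");

// Hypothetical helper: a half-open byte range [begin, end), relative to rsp.
struct Range {
    std::size_t begin;
    std::size_t end;
};

constexpr bool overlaps(Range x, Range y) {
    return x.begin < y.end && y.begin < x.end;
}

// The return slot handed to getStuff starts at rsp + 40;
// the outgoing argument for foo starts at rsp.
constexpr Range kReturnSlot{40, 40 + sizeof(Big)};  // rsp + 40 .. rsp + 71
constexpr Range kArgSlot{0, 0 + sizeof(Big)};       // rsp + 0  .. rsp + 31

static_assert(kReturnSlot.end == 72, "return slot ends at the top of the 72-byte frame");
static_assert(!overlaps(kReturnSlot, kArgSlot), "the two 32-byte slots are disjoint");
```

The two slots are disjoint, with 8 unused bytes (rsp + 32 through rsp + 39) between them, which matches the reading that the 32-byte vmovups pair is a plain copy from one slot to the other.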
Why is the following code not totally equivalent (and why doesn't the compiler generate it instead)?
test1(): # @test1()
    sub     rsp, 32
    mov     rdi, rsp
    call    getStuff()
    call    foo(Big)
    add     rsp, 32
    ret
The idea is: have getStuff write directly to the place in the stack that foo will read from.
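One detail worth checking in the hand-written version is stack alignment. The System V x86-64 ABI requires rsp to be 16-byte aligned at each call instruction, and on entry to test1 the return address has already been pushed, so rsp is 8 bytes past a 16-byte boundary. A minimal sketch of that arithmetic follows; aligned_at_call is a hypothetical helper, and it assumes the function adjusts rsp only through the single sub instruction.

```cpp
#include <cassert>
#include <cstddef>

// Models rsp relative to a 16-byte boundary: 8 bytes for the pushed return
// address, plus the bytes subtracted from rsp by the function itself.
// Assumption: no other pushes happen before the call instructions.
constexpr bool aligned_at_call(std::size_t frame_bytes) {
    return (8 + frame_bytes) % 16 == 0;
}

static_assert(aligned_at_call(72), "the frame clang actually emits is aligned");
static_assert(!aligned_at_call(32), "a 32-byte frame leaves rsp misaligned at the calls");
static_assert(aligned_at_call(40), "padding the frame to 40 bytes restores alignment");
```

Under these assumptions the proposed sequence would need sub rsp, 40 and add rsp, 40 rather than 32. That is a side issue, though: it does not change the question of why the copy is emitted at all.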
Also, here is the result for the same code (with 12 ints instead of 8) compiled by VC++ on Windows for x64. It seems even worse, because the Windows x64 ABI passes and returns such structs by reference, so the copy looks completely unnecessary!
_TEXT SEGMENT
$T3 = 32
$T1 = 32
?bar@@YAHXZ PROC ; bar, COMDAT
$LN4:
    sub     rsp, 88                         ; 00000058H
    lea     rcx, QWORD PTR $T1[rsp]
    call    ?getStuff@@YA?AUBig@@XZ         ; getStuff
    lea     rcx, QWORD PTR $T3[rsp]
    movups  xmm0, XMMWORD PTR [rax]
    movaps  XMMWORD PTR $T3[rsp], xmm0
    movups  xmm1, XMMWORD PTR [rax+16]
    movaps  XMMWORD PTR $T3[rsp+16], xmm1
    movups  xmm0, XMMWORD PTR [rax+32]
    movaps  XMMWORD PTR $T3[rsp+32], xmm0
    call    ?foo@@YAHUBig@@@Z               ; foo
    add     rsp, 88                         ; 00000058H
    ret     0
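To separate what the language requires from what the code generator does, one can instrument a variant of the struct and count constructor calls. This is a rough sketch only: CountedBig, getStuffCounted, fooCounted and runCounted are hypothetical names, and giving the type user-provided copy and move constructors makes it non-trivially copyable, which changes how the ABI passes it. It therefore probes the C++ rules, not the machine code in the listings above.

```cpp
#include <cassert>

// Hypothetical instrumented stand-in for Big: counts copy and move constructions.
struct CountedBig {
    int a[8];
    static int copies;
    static int moves;

    CountedBig() : a{} {}
    CountedBig(const CountedBig& other) {
        for (int i = 0; i < 8; ++i) a[i] = other.a[i];
        ++copies;
    }
    CountedBig(CountedBig&& other) {
        for (int i = 0; i < 8; ++i) a[i] = other.a[i];
        ++moves;
    }
};

int CountedBig::copies = 0;
int CountedBig::moves = 0;

// Returns a prvalue, like getStuff() in the question.
CountedBig getStuffCounted() { return CountedBig(); }

// Takes its parameter by value, like foo(Big) in the question.
int fooCounted(CountedBig b) { return b.a[0]; }

// Runs the same call pattern as test1 and reports how many
// copy or move constructions were observed.
int runCounted() {
    CountedBig::copies = 0;
    CountedBig::moves = 0;
    fooCounted(getStuffCounted());
    return CountedBig::copies + CountedBig::moves;
}
```

In C++17 the parameter of fooCounted is initialized directly from the prvalue, so runCounted() should report zero constructions; earlier standards permit, but do not require, the same elision. That suggests the memory copy in the listings is not something the source-level semantics ask for, at least for this instrumented type.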