x64 mov instead of push sequence

Question

So after reversing an x64 binary I found this sequence at the beginning of a function:

mov     [rsp+8], rbx
mov     [rsp+0x10], rbp
mov     [rsp+0x18], rsi
push    rdi

I've never really done this in assembly (I'm only experienced in x86). To me, that would just be local variable initialization.

Any idea why a function prologue would contain code like this?

I guess this function returns an object by value and the compiler handles this by creating the object on the caller's stack, which the callee then initializes. – nevilad, Apr 25 '21 at 09:34
@nevilad oh, that seems interesting. Could you elaborate on your idea? – Alex, Apr 25 '21 at 09:36
@nevilad: This is *Windows* x64, so the callee owns the 32 bytes above the return address, the "shadow space" ([What is the 'shadow space' in x64 assembly?](https://stackoverflow.com/q/30190132)). So no, it's not that, and that wouldn't make sense anyway because this function doesn't know the values in the registers it's storing. — Peter Cordes, Apr 25 '21 at 09:41
This is called **copy elision** in C++; it has been permitted since C++98 and is guaranteed in some cases since C++17. The idea is to omit copy and move constructor calls, giving zero-copy pass-by-value semantics. But I think @PeterCordes is right: the accessed stack is in the shadow space, so it's not copy elision. – nevilad, Apr 25 '21 at 09:50
@nevilad: Copy elision doesn't modify the asm calling convention; what changes is just the logical invocation of constructors (or not). So the asm still looks like what you'd get in C for returning a struct by value. In normal x86-64 calling conventions, that means the caller passes a pointer to the return-value object (as a hidden first arg), so storing the first 3 args into it would look like `mov [rcx], rdx` / `mov [rcx+8], r8` / `mov [rcx+16], r9`. (edit: example of GCC making that asm: https://godbolt.org/z/K8djM18hj) – Peter Cordes, Apr 25 '21 at 09:58

Peter Cordes · Accepted Answer · 2021-04-25T09:41:50.080


Seems reasonable to use the shadow space (32 bytes above the return address) for saving some of the call-preserved registers, instead of using more stack space to push them all. Without that, you'd just push any call-preserved registers you wanted to use (so you could restore them later). Here, I guess they're restored by reloading them with mov right before ret, instead of pop.

(In Windows x64, RDI and RSI are call-preserved registers, unlike x86-64 System V where they're call-clobbered arg-passing registers.)

This is especially attractive if it makes stack alignment work out nicely by allowing an odd number of total pushes when there's no `sub rsp, n` to reserve more stack space. (That's presumably why it pushes RDI instead of saving it to `[rsp + 0x20]`.)
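To make the save/restore idea concrete, here is a toy Python model of the prologue from the question running on a simulated stack. It is only a sketch: the function name `run_function`, the entry RSP value, and the register contents are made up, and the epilogue is the one guessed at above (reload with `mov`, then `pop rdi`), not something taken from the actual binary.

```python
def run_function(regs, entry_rsp):
    """Simulate the prologue, a register-clobbering body, and an assumed epilogue."""
    mem = {}                         # address -> 8-byte value
    rsp = entry_rsp
    mem[rsp] = "return address"      # [rsp] holds the return address at entry

    # Prologue from the question: three stores into the shadow space, one push.
    mem[rsp + 0x08] = regs["rbx"]    # mov [rsp+8], rbx
    mem[rsp + 0x10] = regs["rbp"]    # mov [rsp+0x10], rbp
    mem[rsp + 0x18] = regs["rsi"]    # mov [rsp+0x18], rsi
    rsp -= 8                         # push rdi
    mem[rsp] = regs["rdi"]

    # Body: free to overwrite the call-preserved registers it saved.
    for name in ("rbx", "rbp", "rsi", "rdi"):
        regs[name] = 0

    # Assumed epilogue: the push moved RSP down by 8, so each offset grows by 8.
    regs["rbx"] = mem[rsp + 0x10]    # mov rbx, [rsp+0x10]
    regs["rbp"] = mem[rsp + 0x18]    # mov rbp, [rsp+0x18]
    regs["rsi"] = mem[rsp + 0x20]    # mov rsi, [rsp+0x20]
    regs["rdi"] = mem[rsp]           # pop rdi
    rsp += 8
    return regs, rsp, mem


if __name__ == "__main__":
    before = {"rbx": 0x11, "rbp": 0x22, "rsi": 0x33, "rdi": 0x44}
    after, rsp, mem = run_function(dict(before), entry_rsp=0x1008)
    print(after == before, hex(rsp), sorted(hex(a) for a in mem))
```

In this model the three `mov` stores land in the 32 bytes just above the return address, so only the single `push` consumes new stack space.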

edited Apr 25 '21 at 09:41

answered Apr 25 '21 at 09:34


Judging by the function epilogue, I think that is the case here. However, I don't quite understand what you're explaining in the last paragraph... – Alex Apr 25 '21 at 09:41

@Alex: Windows x64 guarantees / requires 16-byte alignment of RSP *before* a `call`, so a function has to adjust RSP by `n*16 + 8` before making another `call` to maintain that alignment, i.e. an odd number of pushes. See [Is the Microsoft Stack always aligned to 16-bytes?](https://stackoverflow.com/q/52615235). The design reasons are the same as in [Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?](https://stackoverflow.com/q/49391001) – Peter Cordes Apr 25 '21 at 09:42
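The alignment arithmetic in that comment can be checked with a few lines of Python. This is a sketch with made-up names (`rsp_before_next_call`, `is_aligned`) and an arbitrary example address; it assumes 8-byte pushes and a caller whose RSP was 16-byte aligned just before its `call`.

```python
def rsp_before_next_call(entry_rsp, pushes, sub_bytes=0):
    """RSP after `pushes` 8-byte pushes and an optional `sub rsp, sub_bytes`."""
    return entry_rsp - 8 * pushes - sub_bytes


def is_aligned(rsp):
    """True if rsp is a multiple of 16."""
    return rsp % 16 == 0


# Hypothetical caller RSP, 16-byte aligned just before its `call`.
caller_rsp = 0x1000
# `call` pushes the 8-byte return address, so the callee starts misaligned by 8.
entry_rsp = caller_rsp - 8

if __name__ == "__main__":
    for pushes in range(5):
        rsp = rsp_before_next_call(entry_rsp, pushes)
        print(pushes, hex(rsp), is_aligned(rsp))
```

With these assumptions, an odd number of pushes leaves RSP aligned again, and adding a `sub rsp, n` with `n` a multiple of 16 does not change that.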
