This is basic C code for a simple function call:

```c
int multiply(int num, int k) {
    return num * k;
}

int main(int argc, char** argv) {
    int k = multiply(5,2);
}
```

When I tried disassembling this code using the tool available at godbolt.org, picking the option for x86-64 clang 10.0.1, I got the following assembly instructions:

```asm
multiply:                               # @multiply
        push    rbp
        mov     rbp, rsp
        mov     dword ptr [rbp - 4], edi
        mov     dword ptr [rbp - 8], esi
        mov     eax, dword ptr [rbp - 4]
        imul    eax, dword ptr [rbp - 8]
        pop     rbp
        ret

main:                                   # @main
        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        mov     dword ptr [rbp - 4], edi
        mov     qword ptr [rbp - 16], rsi
        mov     edi, 5
        mov     esi, 2
        call    multiply
        xor     ecx, ecx
        mov     dword ptr [rbp - 20], eax
        mov     eax, ecx
        add     rsp, 32
        pop     rbp
        ret
```

However, the stack pointer doesn't seem to change when (or even after) the callee stores the parameters from the registers edi and esi onto the stack; it keeps pointing to the location containing the old value of the base pointer register. Why does that happen?

Peter Cordes
Mehdi Charife
  • Terminology nitpick: Godbolt isn't *dis*assembling (unless you use binary mode in the output dropdown). It's asking the compiler not to assemble in the first place, so you can see the compiler-generated asm with symbolic names. – Peter Cordes Nov 19 '22 at 14:46

1 Answer

The compiler plans all space needed for the function and adjusts the stack pointer once at the start of the function and once at the end instead of each time something is needed on the stack inside the function.
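
As a rough illustration of that up-front planning, the sketch below models how a frame size such as the `sub rsp, 32` in `main` could be derived from a function's locals. The `Slot` type, the `plan_frame` function, and the simple bump-and-align strategy are assumptions made for illustration, not clang's actual algorithm; the slot sizes are inferred from the offsets in the question's assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one stack slot (not a clang data structure). */
typedef struct {
    const char *name;
    size_t size;    /* size in bytes */
    size_t align;   /* required alignment in bytes, a power of two */
    long offset;    /* filled in by plan_frame: offset relative to RBP */
} Slot;

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Give every slot a negative RBP-relative offset, in declaration order,
   and return the number of bytes to subtract from RSP once in the
   prologue. The total is padded to a multiple of 16 so that RSP stays
   16-byte aligned at any call the function makes. */
static size_t plan_frame(Slot *slots, size_t count) {
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        used = round_up(used + slots[i].size, slots[i].align);
        slots[i].offset = -(long)used;
    }
    return round_up(used, 16);
}
```

With the three slots from `main` (`argc` at 4 bytes, `argv` at 8 bytes, `k` at 4 bytes), this model gives offsets -4, -16, and -20 and a frame size of 32, which matches `[rbp - 4]`, `[rbp - 16]`, `[rbp - 20]`, and `sub rsp, 32` in the question.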

Eric Postpischil
  • Or it doesn't move the stack pointer at all, in leaf functions that need less than 128 bytes of locals. (Unless you compile with `-mno-red-zone`.) This is what the OP describes in their `multiply` function, with RSP still pointing at where it saved the caller's RBP. But yes, what you describe is what's happening in `main`. – Peter Cordes Nov 19 '22 at 14:49
  • @PeterCordes: Zero is an adjustment amount. – Eric Postpischil Nov 19 '22 at 18:19
  • @Peter Why does the stack pointer need to be adjusted when the function in question is not a leaf function? – Mehdi Charife Nov 19 '22 at 23:54
  • @MehdiCharife: So `call` and the actions of the callee don't overwrite the space you're using for local vars. – Peter Cordes Nov 20 '22 at 00:07
  • @Peter Can't the compiler guarantee that that won't happen by reading all the functions involved in advance? – Mehdi Charife Nov 20 '22 at 00:19
  • @MehdiCharife: With optimization enabled, yes the compiler would just inline `multiply` into its caller so it doesn't actually have to use a `call`. https://godbolt.org/z/1KnsbGY6d. Otherwise no, a function call can use an arbitrary amount of space below RSP, and `call` itself pushes an 8-byte return address. (Sometimes a compiler will look inside another function for inter-procedural analysis / optimization, even if it chooses not to inline, but GCC/clang don't try to do things like using space 64 bytes down into the red zone so they can call a function that only uses 56 bytes of stack) – Peter Cordes Nov 20 '22 at 00:25
  • @Peter *"a function call can use an arbitrary amount of space below RSP"* How so? If the number of values the callee will store can be determined by looking at the function's code and parameters, can't the compiler produce assembly code in which no function overwrites the space used by another, without needing to adjust the value of RSP? – Mehdi Charife Nov 20 '22 at 00:32
  • @MehdiCharife: Like I said, in theory a compiler *could* do that for cases where the callee is visible and is itself a leaf function (or its callees can all be seen and stack size calculated). But if the callee is small, it's usually better just to inline it, so nobody's bothered to make GCC or clang do the optimization you're suggesting. In general, with a set of functions that call each other in various ways, doing this would require looking at all of them at once when doing stack frame layout, making compilation more expensive. – Peter Cordes Nov 20 '22 at 00:39
  • You could just give up if any callees are not leaf functions or if any cycles are found in the call graph, but instead compilers just keep it simple and spend 2 instructions per function to adjust RSP. Most of the time, inlining is a much better solution to the problem of per-function overhead with multiple small functions, avoiding the somewhat more expensive call/ret as well. (`call` is 2 uops on modern Intel CPUs; https://agner.org/optimize/ https://uops.info/). Using the red-zone only in leaf functions without alloca/VLAs is a good simple heuristic that doesn't cost much compile time. – Peter Cordes Nov 20 '22 at 00:42
  • @Peter What do you mean by *visible*? Can't the compiler see all the functions involved? – Mehdi Charife Nov 20 '22 at 00:43
  • @MehdiCharife: No, it might only have a prototype. Without link-time optimization enabled, functions in other `.c` files are fully opaque. And functions in libraries are always opaque to the optimizer. See [Does calling library functions still make them non-leaf? how are library functions handled by x86 assembly?](https://stackoverflow.com/q/71836448) for some relevant discussion and example compiler output with optimization enabled. Also of course there can be function pointers (or C++ virtual functions), not known at compile time where they point. – Peter Cordes Nov 20 '22 at 00:48
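
The heuristic discussed in the comments above (use the red zone only in leaf functions without alloca/VLAs, otherwise adjust RSP once) can be written as a small predicate. This is only a sketch: `FuncInfo`, its fields, and `needs_rsp_adjustment` are hypothetical names, and real compilers weigh more factors than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of the x86-64 System V red zone: the area below RSP that signal
   and interrupt handlers are guaranteed not to modify. */
enum { RED_ZONE_BYTES = 128 };

/* Hypothetical per-function summary (illustrative, not a compiler API). */
typedef struct {
    size_t local_bytes;    /* fixed stack space needed for locals */
    bool makes_calls;      /* false for a leaf function */
    bool uses_alloca_vla;  /* uses alloca or variable-length arrays */
} FuncInfo;

/* Return true if the prologue has to move RSP to reserve the fixed
   locals. Pass red_zone_enabled = false to model -mno-red-zone. */
static bool needs_rsp_adjustment(const FuncInfo *f, bool red_zone_enabled) {
    if (f->local_bytes == 0) return false;  /* nothing to reserve */
    if (f->makes_calls) return true;        /* call and callee write below RSP */
    if (f->uses_alloca_vla) return true;    /* RSP moves at run time anyway */
    if (!red_zone_enabled) return true;     /* no safe area below RSP */
    return f->local_bytes > RED_ZONE_BYTES; /* locals must fit in the red zone */
}
```

Under this model, `multiply` (a leaf with 8 bytes of locals) needs no adjustment when the red zone is enabled, while `main` does because it calls `multiply`.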