
I was playing with my toy program to better understand the assembly that GCC generates. I can't understand why the emitted assembly allocates an extra 8 bytes of space on the stack.

Here's the C++ code:

```cpp
int func(int r, int r1, int r2, int r3, int r4, int r5, int r6, int r7, int r8)
{
    int k = 23;  // unused, but at -O0 it still gets a slot ([rbp-4])
    int dd = r + r1 + r2 + r3 + r4 + r5 + r6;
    dd += r7 + r8;
    return dd;
}

int main() {
    int s = func(1, 2, 3, 4, 5, 6, 7, 8, 9);
    return s;
}
```

And here's the assembly output for func:

```asm
func(int, int, int, int, int, int, int, int, int):
  push rbp
  mov rbp, rsp
  mov DWORD PTR [rbp-20], edi
  mov DWORD PTR [rbp-24], esi
  mov DWORD PTR [rbp-28], edx
  mov DWORD PTR [rbp-32], ecx
  mov DWORD PTR [rbp-36], r8d
  mov DWORD PTR [rbp-40], r9d
  mov DWORD PTR [rbp-4], 23
  mov edx, DWORD PTR [rbp-20]
  mov eax, DWORD PTR [rbp-24]
  add edx, eax
  mov eax, DWORD PTR [rbp-28]
  add edx, eax
  mov eax, DWORD PTR [rbp-32]
  add edx, eax
  mov eax, DWORD PTR [rbp-36]
  add edx, eax
  mov eax, DWORD PTR [rbp-40]
  add edx, eax
  mov eax, DWORD PTR [rbp+16]
  add eax, edx
  mov DWORD PTR [rbp-8], eax
  mov edx, DWORD PTR [rbp+24]
  mov eax, DWORD PTR [rbp+32]
  add eax, edx
  add DWORD PTR [rbp-8], eax
  mov eax, DWORD PTR [rbp-8]
  pop rbp
  ret
```
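The listing is consistent with the System V x86-64 calling convention (assumed here, since it is what Linux GCC on Compiler Explorer targets): the first six integer arguments arrive in registers (for `int`, their 32-bit names `edi`, `esi`, `edx`, `ecx`, `r8d`, `r9d`), and the remaining ones are passed on the stack, each in its own 8-byte slot. After the prologue `push rbp` / `mov rbp, rsp`, `[rbp]` holds the saved rbp and `[rbp+8]` the return address, so the seventh argument is at `[rbp+16]`. The helper `arg_location` below is a hypothetical sketch that just encodes that rule for func's nine parameters.

```cpp
#include <string>

// Where the index-th int parameter of func (0-based, 0 <= index <= 8) is found
// on entry, ASSUMING the System V x86-64 calling convention and the prologue
// `push rbp` / `mov rbp, rsp` from the listing.
std::string arg_location(int index) {
    // The first six integer arguments arrive in registers (32-bit names for int).
    static const char* const kRegs[] = {"edi", "esi", "edx", "ecx", "r8d", "r9d"};
    if (index < 6) {
        return kRegs[index];
    }
    // [rbp] = saved rbp, [rbp+8] = return address; each stack argument then
    // occupies its own 8-byte slot, starting at [rbp+16].
    const int offset = 16 + 8 * (index - 6);
    return "[rbp+" + std::to_string(offset) + "]";
}
```

For example, `arg_location(6)` yields `"[rbp+16]"` and `arg_location(8)` yields `"[rbp+32]"`, matching the loads of r6 and r8 in the listing.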

And here's the link for those who like it more interactive: https://godbolt.org/g/zM8BMN

If you look at the assembly, all the arguments that are passed in registers rather than on the stack (the first six, `edi` through `r9d`) are spilled to slots starting at `rbp-20`, even though they could easily have started at `rbp-12`. As far as I can see, the 8 bytes from `[rbp-16]` to `[rbp-9]` (the DWORD slots `[rbp-16]` and `[rbp-12]`) are never used. So why didn't the compiler use them?
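To make the gap concrete, the sketch below takes the rbp-relative offsets of every negative-offset DWORD access in the listing (copied by hand) and lists the 4-byte slots between the lowest one and rbp that are never touched. `kUsedOffsets` and `unused_dword_slots` are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <vector>

// rbp-relative offsets of the DWORD slots func touches below rbp:
// locals k and dd, then the six spilled register arguments.
const std::vector<int> kUsedOffsets = {-4, -8, -20, -24, -28, -32, -36, -40};

// Returns the 4-byte slots between the lowest used offset and rbp that are
// never accessed, ordered from rbp downwards.
std::vector<int> unused_dword_slots(const std::vector<int>& used) {
    const int lowest = *std::min_element(used.begin(), used.end());
    std::vector<int> gaps;
    for (int off = -4; off >= lowest; off -= 4) {
        if (std::find(used.begin(), used.end(), off) == used.end()) {
            gaps.push_back(off);
        }
    }
    return gaps;
}
```

`unused_dword_slots(kUsedOffsets)` returns `{-12, -16}`, i.e. two DWORD slots, or the 8 bytes the question asks about.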

pranavk
  • Your compiler is just trying to align to the stack boundary. If I remember correctly, by default it tries to maintain a 16-byte alignment. – Yashas Dec 10 '17 at 17:41
  • What command did you use to compile ? – Tony Tannous Dec 10 '17 at 17:42
  • @Yashas There is already a 16-byte empty gap that it's not using. I don't understand how not using that space is alignment. I mean, what difference does it make to the compiler, from an alignment point of view, whether it starts from rbp-20 or rbp-12? – pranavk Dec 10 '17 at 17:44
  • @TonyTannous You can check the godbolt link I pasted at the end of the question. There is no optimization turned on. – pranavk Dec 10 '17 at 17:44
    On a side note: build your code with optimisations on (`-O3`) to get a better idea of what the code would look like without all the extra loads and stores. So that the compiler doesn't inline your function (and you can see it generate an actual call from `main`), you can add `__attribute((noinline))` to the `int func` definition (i.e. `int __attribute((noinline)) func`). Often the how and why of where the compiler places things on the stack (beyond the alignment requirements) is an internal implementation detail. – Michael Petch Dec 10 '17 at 17:48
    @pranavk: un-optimized code is ... not optimized. For performance or for stack consumption. – Peter Cordes Dec 10 '17 at 17:53
  • It makes sense. I can't understand where you got the 8 bytes from. `push 9 ; rsp - 4`, `push 8 ; rsp - 8`, `push 7 ; rsp - 12`, and `rsp - 16` for the return address. Inside the function, the compiler continues from `rbp - 20`, which makes perfect sense. – Yashas Dec 10 '17 at 17:57
  • @Yashas: The question is about packing the red-zone more tightly inside `func`, not about how main passes stack and register args. gcc uses `dword [rbp-4]` and `dword [rbp-8]` for locals, but spills the register args to `[rbp-20]` .. `[rbp-40]`, leaving 12 and 16 unused. – Peter Cordes Dec 10 '17 at 18:03
  • If I had to guess, maybe internally gcc was planning to `sub rsp, something` to reach a 16-byte alignment boundary, and put locals at the top, spilled args at the bottom. But then after realizing it was a leaf function so it could use the red-zone, it didn't go back and re-pack its locals. – Peter Cordes Dec 10 '17 at 18:06
  • @Yashas Sorry, I meant there's an 8-byte empty space there, not 16. But I still don't understand your alignment logic. – pranavk Dec 10 '17 at 18:09
  • My comments weren't relevant to your question anyway: I was thinking about the pushing of arguments and the function call, whereas your actual question was about something inside the called function. – Yashas Dec 10 '17 at 18:10
    @pranavk: With `-O3` (but forcing it to spill using `volatile`), we do get tight packing (use of every dword from `[rbp-4]` down to `[rbp-44]`): https://godbolt.org/g/UtoBwN. That holds with and without `-fno-omit-frame-pointer`. So it's purely an artefact of compiling with `-O0` and seeing the effects of gcc's internal representations, like [an answer](https://stackoverflow.com/a/37777520/224132) on the duplicate question says. – Peter Cordes Dec 10 '17 at 18:15
  • There are examples of GCC over-allocating stack space even with `-O3`, e.g. [Why does GCC allocate more space than necessary on the stack?](https://stackoverflow.com/q/63009070) – Peter Cordes Jul 21 '20 at 08:15
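Michael Petch's suggestion of compiling with optimization while keeping func out of line can be sketched as follows, assuming GCC and a command such as `g++ -O3 -S -masm=intel`. `call_func` is a hypothetical stand-in for `main`, kept separate only so the snippet has no entry point of its own. Note that `noinline` only prevents inlining; GCC may still apply other interprocedural optimizations (such as constant propagation) across the call, and GCC 8 and later provide `__attribute__((noipa))` to rule those out as well.

```cpp
// Build with optimization and inspect the asm, e.g.: g++ -O3 -S -masm=intel
// noinline stops GCC from inlining func into call_func; other interprocedural
// optimizations may still apply (GCC 8+ offers noipa to disable those too).
__attribute__((noinline))
int func(int r, int r1, int r2, int r3, int r4, int r5, int r6, int r7, int r8)
{
    int dd = r + r1 + r2 + r3 + r4 + r5 + r6;
    dd += r7 + r8;
    return dd;
}

// Hypothetical stand-in for main(): the same call as in the question.
int call_func()
{
    return func(1, 2, 3, 4, 5, 6, 7, 8, 9);
}
```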

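Peter Cordes' guess about a planned 16-byte boundary can be checked against the numbers with a toy model. This is a hypothetical sketch of that guess, not GCC's actual frame-layout algorithm: locals go directly below rbp, the locals area is padded up to `align` bytes, and the register arguments are spilled below the padding. `FrameModel` and `model_frame` are illustrative names only.

```cpp
#include <vector>

// Rounds n up to the next multiple of align (align > 0).
constexpr int round_up(int n, int align) { return (n + align - 1) / align * align; }

struct FrameModel {
    std::vector<int> local_offsets;  // rbp-relative offsets of the locals
    std::vector<int> spill_offsets;  // rbp-relative offsets of spilled register args
};

// HYPOTHETICAL model of the guess in the comments, not GCC's real algorithm:
// locals first, locals area padded to `align` bytes, spills below the padding.
FrameModel model_frame(int num_locals, int num_spills, int slot = 4, int align = 16) {
    FrameModel m;
    for (int i = 1; i <= num_locals; ++i) {
        m.local_offsets.push_back(-slot * i);
    }
    const int spill_base = round_up(num_locals * slot, align);  // 8 -> 16 for k, dd
    for (int i = 1; i <= num_spills; ++i) {
        m.spill_offsets.push_back(-(spill_base + slot * i));
    }
    return m;
}
```

`model_frame(2, 6)` reproduces GCC's offsets (locals at -4 and -8, spills from -20 down to -40), while `model_frame(2, 6, 4, 4)` gives the tightly packed -12 to -32 that the question expected. Matching the observed numbers is consistent with the guess but does not show that GCC works this way internally.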
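The code behind Peter Cordes' `-O3` link is not reproduced here. As a rough, hypothetical reconstruction of the idea, copying every value into a `volatile` local forces GCC to give each one a real stack slot even with optimization enabled, so the packing of those slots can be compared with the `-O0` output (for example with `g++ -O3 -S -masm=intel`, optionally adding `-fno-omit-frame-pointer`). The locals `a0` to `a8` are illustrative names, and the exact offsets GCC picks may vary by version.

```cpp
int func(int r, int r1, int r2, int r3, int r4, int r5, int r6, int r7, int r8)
{
    // volatile objects must really be stored to and loaded from memory,
    // so each copy needs its own stack slot even at -O3.
    volatile int a0 = r, a1 = r1, a2 = r2, a3 = r3, a4 = r4,
                 a5 = r5, a6 = r6, a7 = r7, a8 = r8;
    volatile int k = 23;  // unused, but volatile keeps the store
    volatile int dd = a0 + a1 + a2 + a3 + a4 + a5 + a6;
    dd = dd + a7 + a8;    // plain assignment: compound ops on volatile are deprecated in C++20
    return dd;
}
```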