Why does GCC allocate "too much" stack space for local C struct?

Question

I've written a code snippet that stores a C++ object inside a char array wrapped in a struct, with the intent of passing it to a function written in C:

#include <new>  // required for placement new

class cpp_type
{
public:  
  // ...

private:
  int _state;
};

struct c_type
{ char buf[4]; };

struct c_type make_c_type()
{
  c_type tmp;
  new (tmp.buf) cpp_type();
  return tmp;
}

Then I was curious how the compiler translates make_c_type to x86_64 assembly code. GCC (with optimizations) produces:

0000000000000000 <_Z11make_c_typev>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   48 8d 7c 24 0c          lea    0xc(%rsp),%rdi
   9:   e8 00 00 00 00          call   e <_Z11make_c_typev+0xe>
   e:   8b 44 24 0c             mov    0xc(%rsp),%eax
  12:   48 83 c4 18             add    $0x18,%rsp
  16:   c3                      ret

I'm confused as to why the stack pointer is decremented by 24 bytes and rdi is set to rsp + 12. The structure itself is only 4 bytes large, so why doesn't GCC output e.g. sub $0x4,%rsp followed by mov %rsp,%rdi? Furthermore, shouldn't the stack always remain 16-byte aligned anyway?

The ABI requires the stack to be 16-byte aligned before the call. In the prologue it is misaligned by 8 bytes because of the return address pushed by the caller, so at least 0x8 is required. I think GCC could have used 0x08, but the internal algorithm probably just allocates the space needed to align the stack (0x08) plus the space required by the locals rounded up to the next multiple of 16 (in this case 0x10), so 0x08 + 0x10 = 0x18. As for why 0xc: it may simply be 0x18 - 0xc = 0xc, where the subtracted 0xc is 0x8 (from alignment) + 0x4 (the actual size of the local). — Margaret Bloom, Jun 29 '21 at 11:45
Can't reproduce. Please provide a [mcve], preferably with a godbolt link. — rustyx, Jun 29 '21 at 11:58
@MargaretBloom: That makes a lot of sense; if you turn that into an answer, I'll accept it. But isn't the stack 16-byte aligned on entry, and doesn't it then need to be aligned on 16 + 8 before the function call? — Peter, Jun 29 '21 at 12:01
It must be aligned on 16B before the call, the ABI goes: "The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point." — Margaret Bloom, Jun 29 '21 at 12:16
@MarekR You are looking at `objdump` output; the code is not relocated yet. It's common for calls to have a zero immediate. — Margaret Bloom, Jun 29 '21 at 12:17
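
The arithmetic from the comments above can be checked at compile time. This is only a sketch of the numbers, not a description of GCC's actual frame-layout algorithm, which the comments themselves only guess at. The constant names, the helper rsp_after_prologue, and the sample entry value of rsp are all made up for illustration; any entry value that is 8 modulo 16 would work the same way.

```cpp
#include <cstdint>

// Values read off the disassembly in the question.
constexpr std::uint64_t kFrameSize   = 0x18; // sub $0x18,%rsp
constexpr std::uint64_t kLocalOffset = 0xc;  // lea 0xc(%rsp),%rdi
constexpr std::uint64_t kLocalSize   = 4;    // sizeof(c_type)

// Hypothetical helper: the value of rsp after `sub $frame,%rsp`.
constexpr std::uint64_t rsp_after_prologue(std::uint64_t entry_rsp,
                                           std::uint64_t frame)
{
  return entry_rsp - frame;
}

// Hypothetical entry value. The ABI text quoted in the comments says
// (rsp + 8) is a multiple of 16 at the function entry point.
constexpr std::uint64_t kEntryRsp = 0x7ffc00001008;

static_assert((kEntryRsp + 8) % 16 == 0,
              "on entry, rsp is 8 modulo 16");
static_assert(rsp_after_prologue(kEntryRsp, kFrameSize) % 16 == 0,
              "sub $0x18 leaves rsp 16-byte aligned for the call");
static_assert(rsp_after_prologue(kEntryRsp, 0x8) % 16 == 0,
              "sub $0x8 would also leave rsp 16-byte aligned");
static_assert(rsp_after_prologue(kEntryRsp, 0x4) % 16 != 0,
              "sub $0x4 would leave rsp misaligned at the call");
static_assert(kFrameSize == 0x8 + 0x10,
              "8 bytes of alignment plus locals rounded up to 16");
static_assert(kFrameSize - kLocalOffset == 0x8 + kLocalSize,
              "the local sits 0x8 + 0x4 bytes below the top of the frame");
static_assert(kLocalOffset + kLocalSize == 0x10,
              "the local occupies bytes 12..15 of the frame");
```

If these assertions compile, the numbers are at least self-consistent: sub $0x4,%rsp would break the 16-byte alignment at the call, while sub $0x8,%rsp would have been enough, which matches the remark that GCC "could have used 0x08".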
