Consider the following example:

struct vector {
    int  size() const;
    bool empty() const;
};

bool vector::empty() const
{
    return size() == 0;
}

The generated assembly code for vector::empty (by clang, with optimizations):

push    rax
call    vector::size() const
test    eax, eax
sete    al
pop     rcx
ret

Why does it allocate stack space? It is not used at all. The push and pop could be omitted. Optimized builds of MSVC and gcc also use stack space for this function (see on godbolt), so there must be a reason.

curiousguy
Dr. Gut
  • Did you account for the implicit `this` parameter? – dan04 Jan 07 '20 at 22:26
  • @dan04: I did. `this` is passed in `rcx` register in MSVC and in `rdi` in clang and gcc. See this [example](https://godbolt.org/z/9kjVMG). It does not need stack. – Dr. Gut Jan 07 '20 at 22:35
  • Have you tried with an actually defined `vector::size()` function? – Bob__ Jan 07 '20 at 22:40
  • @Bob__: No. Why should I? `vector::size()` is not defined in the example to simulate that it is not inlined. – Dr. Gut Jan 07 '20 at 22:42
  • So, how could a compiler optimize something it doesn't know? – Bob__ Jan 07 '20 at 22:44
  • @Bob__: I think that knowing the implementation of `vector::size()` is not relevant for allocating or not allocating a stack frame for `vector::empty()`. In `empty()` it is just called, whatever it is. – Dr. Gut Jan 07 '20 at 22:48
  • Well, you are calling a function that *returns* something, you need space for that (if you don't know any better). – Bob__ Jan 07 '20 at 23:05
  • @Bob__: Return value is in `eax`. No stack needed for that. – Fred Larson Jan 07 '20 at 23:15
  • The `push` and `pop` could not be omitted, because they don't use the same register. – Mark Ransom Jan 07 '20 at 23:20
  • @MarkRansom: So any idea why they are there? They don't seem to accomplish much, except to preserve the value of `rax` but move it to `rcx`. – Fred Larson Jan 07 '20 at 23:28
  • @FredLarson: these are just "dummy" stack alignment instructions. They are shorter than the equivalent `add rsp`-like instructions. Register doesn't matter here. For `pop`, `rcx` is used, because it can be trashed. – geza Jan 07 '20 at 23:30
  • https://stackoverflow.com/questions/2362097/why-is-the-size-of-an-empty-class-in-c-not-zero – parktomatomi Jan 08 '20 at 02:51

1 Answer

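It allocates stack space so that the stack is 16-byte aligned at the point where vector::size() is called. On entry to vector::empty(), the return address pushed by the caller's call instruction has already taken 8 bytes, so an additional 8 bytes are needed to bring the stack back to 16-byte alignment. The push and pop provide exactly those 8 bytes; the value pushed is never used.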
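The arithmetic can be sketched in C++. This is only a model of the rsp values, not generated code: the names is_aligned, rsp_on_entry, rsp_after_push and the address 0x7fff0000 are hypothetical, and it assumes the caller's rsp is 16-byte aligned just before its call instruction.

```cpp
#include <cassert>
#include <cstdint>

// Size of the return address that `call` pushes on x86-64.
constexpr std::uint64_t kReturnAddressSize = 8;
// Alignment required for rsp at a call instruction (assumed: 16 bytes).
constexpr std::uint64_t kStackAlignment = 16;

constexpr bool is_aligned(std::uint64_t rsp) {
    return rsp % kStackAlignment == 0;
}

// rsp as seen on entry to a function: the caller's rsp minus the return address.
constexpr std::uint64_t rsp_on_entry(std::uint64_t caller_rsp) {
    return caller_rsp - kReturnAddressSize;
}

// rsp after a dummy `push rax`, which moves the stack pointer down by 8 bytes.
constexpr std::uint64_t rsp_after_push(std::uint64_t rsp) {
    return rsp - 8;
}

// Hypothetical, 16-byte-aligned stack pointer in the caller of vector::empty().
constexpr std::uint64_t kCallerRsp = 0x7fff0000;

static_assert(is_aligned(kCallerRsp), "caller is aligned before its call");
static_assert(!is_aligned(rsp_on_entry(kCallerRsp)),
              "on entry the stack is off by the 8-byte return address");
static_assert(is_aligned(rsp_after_push(rsp_on_entry(kCallerRsp))),
              "one 8-byte push restores alignment before calling size()");

// Runtime version of the same check for any aligned caller rsp.
inline bool empty_prologue_realigns(std::uint64_t caller_rsp) {
    assert(is_aligned(caller_rsp));
    return is_aligned(rsp_after_push(rsp_on_entry(caller_rsp)));
}
```

The matching pop rcx before ret undoes the push, so ret finds the return address at the top of the stack again.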

The alignment of stack frames can be configured with command line arguments for some compilers.

  • MSVC: The documentation says that the stack is always 16-byte aligned. No command line argument can change this. The godbolt example shows that 40 bytes are subtracted from rsp at the beginning of the function. This is consistent with the 32 bytes of shadow space that the Windows x64 calling convention requires a caller to reserve for the callee, plus 8 bytes of padding for alignment.
  • clang: The -mstack-alignment option specifies the stack alignment. The default appears to be 16, although this is not documented. If you set it to 8, the stack allocation (push and pop) disappears from the generated assembly code.
  • gcc: The -mpreferred-stack-boundary option specifies the stack alignment. If the given value is N, it means 2^N bytes of alignment. The default value is 4, which means 16 bytes. If you set it to 3 (i.e. 8 bytes), the stack allocation (sub and add for rsp) disappears from the generated assembly code.
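The effect of these settings on the prologue can be modelled with a small helper. This is a sketch under stated assumptions, not what any compiler actually runs: frame_adjustment, boundary_to_bytes and check_examples are hypothetical names, and the 32 bytes used for the MSVC case assume the shadow space of the Windows x64 calling convention, which the answer itself does not mention.

```cpp
#include <cassert>
#include <cstdint>

// Bytes pushed by `call` for the return address on x86-64.
constexpr std::uint64_t kReturnAddressSize = 8;

// gcc's -mpreferred-stack-boundary=N means 2^N bytes of alignment.
constexpr std::uint64_t boundary_to_bytes(unsigned n) {
    return std::uint64_t{1} << n;
}

// Smallest adjustment >= needed_bytes such that
// (return address + adjustment) is a multiple of `alignment`.
constexpr std::uint64_t frame_adjustment(std::uint64_t alignment,
                                         std::uint64_t needed_bytes) {
    return (needed_bytes + kReturnAddressSize + alignment - 1)
               / alignment * alignment
           - kReturnAddressSize;
}

inline void check_examples() {
    // clang/gcc default (16 bytes), no locals: 8 bytes, i.e. one push.
    assert(frame_adjustment(16, 0) == 8);
    // -mstack-alignment=8 or -mpreferred-stack-boundary=3: nothing to allocate.
    assert(frame_adjustment(8, 0) == 0);
    assert(frame_adjustment(boundary_to_bytes(3), 0) == 0);
    // MSVC, assuming 32 bytes of shadow space for the callee: 40 bytes.
    assert(frame_adjustment(16, 32) == 40);
}
```

With the default boundary of 4, boundary_to_bytes(4) gives 16, which matches the 8-byte adjustment seen in the clang and gcc output.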

Check out on godbolt.

Dr. Gut
geza
  • That's why c++ gurus, experts have always been warning: put struct/class members in order of the longest/biggest size to smallest... only this way it'd be correctly efficient –  Jan 08 '20 at 00:00
  • @geza: Thank you. I did some research on the other two compilers and added it to your answer. Do you like it? – Dr. Gut Jan 08 '20 at 18:40
  • @Dr.Gut: thanks, you made the answer much better and complete. Note, that stack alignment is usually documented in the ABI for the system (for example, for some systems, here are the documents: https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI). – geza Jan 08 '20 at 19:46
  • @geza: Thank you. – Dr. Gut Jan 08 '20 at 21:39