Generated assembly for extended alignment of stack variables

Question

I was digging into the assembly generated for code that uses extended alignment for a stack-based variable. This is a reduced version of the code:

struct Something {
    Something();
};

void foo(Something*);

void bar() {
    alignas(128) Something something;
    foo(&something);
}

When compiled with clang 8.0, this generates the following code (https://godbolt.org/z/lf8WW-):

bar():                                # @bar()
        push    rbp
        mov     rbp, rsp
        and     rsp, -128
        sub     rsp, 128
        mov     rdi, rsp
        call    Something::Something() [complete object constructor]
        mov     rdi, rsp
        call    foo(Something*)
        mov     rsp, rbp
        pop     rbp
        ret

Starting with gcc 8.1, gcc produces the same code as clang. Earlier versions of gcc produce the following (https://godbolt.org/z/LLQ8gW):

bar():
        lea     r10, [rsp+8]
        and     rsp, -128
        push    QWORD PTR [r10-8]
        push    rbp
        mov     rbp, rsp
        push    r10
        sub     rsp, 232
        lea     rax, [rbp-240]
        mov     rdi, rax
        call    Something::Something() [complete object constructor]
        lea     rax, [rbp-240]
        mov     rdi, rax
        call    foo(Something*)
        nop
        add     rsp, 232
        pop     r10
        pop     rbp
        lea     rsp, [r10-8]
        ret

I'm not too familiar with x86 and just out of curiosity - what exactly is happening here in both pieces of code? Does the compiler pull tricks like std::align() and round up the current stack position to a multiple of 128 for the on-stack variable something?

Yes, the `and rsp, -128` aligns the stack pointer. It utilizes the fact that the stack grows down on x86. — Jester, Jul 31 '19 at 20:52
Oh that makes sense! -128 is 1...10000000 in binary, so the AND zeros out the low 7 bits, leaving a multiple of 128. But why don't earlier versions of gcc do the same? — Curious, Jul 31 '19 at 20:59
Both versions of the code you showed have that. The difference seems to be that the older code aligns the stack first then sets up the stack frame, while the newer does it in opposite order. — Jester, Jul 31 '19 at 21:01
@Jester It seems to be using some other constants like 232 and 240 too? Is the responsibility of setting up the stack frame on the callee in the later versions? Why? — Curious, Jul 31 '19 at 21:02
Because it did the alignment first and then placed stuff on the stack, the stack pointer got misaligned again, but by a known amount, which can be compensated for with constant offsets. — Jester, Jul 31 '19 at 21:03
Oh I see, it knows how much is being pushed on the stack, so it just subtracts that from the stack pointer after the pushes... That's quite simple, thanks! — Curious, Jul 31 '19 at 21:06
Yes, three 8-byte values were pushed after alignment (the copied return address, `rbp` and `r10`), that is 24 bytes in total. Allocating another 232 brings that up to 256, which is again 128-byte aligned. 240 comes from the 232 plus the 8 bytes for the `push r10`, which happened after `rbp` was set up. Obviously this code is more convoluted than the newer one; presumably the compiler developers noticed and fixed it :) — Jester, Jul 31 '19 at 21:07
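
The arithmetic in the comments above can be checked with a small model of the older gcc prologue. This is only a sketch: `Frame` and `old_gcc_prologue` are hypothetical names, the entry value of `rsp` is arbitrary, and it assumes a C++14 compiler for the `constexpr` function. It tracks only how the register values change, not the memory contents.

```cpp
#include <cassert>
#include <cstdint>

// Register values after the older gcc prologue (hypothetical helper type).
struct Frame {
    std::uint64_t rsp;     // stack pointer after `sub rsp, 232`
    std::uint64_t rbp;     // frame pointer set by `mov rbp, rsp`
    std::uint64_t object;  // address passed to the constructor: rbp - 240
};

// Replays the stack-pointer arithmetic of the older gcc listing.
constexpr Frame old_gcc_prologue(std::uint64_t rsp_at_entry) {
    std::uint64_t rsp = rsp_at_entry;
    rsp &= ~std::uint64_t{127};     // and  rsp, -128   (align down to 128)
    rsp -= 8;                       // push QWORD PTR [r10-8] (copy of return address)
    rsp -= 8;                       // push rbp
    const std::uint64_t rbp = rsp;  // mov  rbp, rsp
    rsp -= 8;                       // push r10
    rsp -= 232;                     // sub  rsp, 232    (24 + 232 = 256 below the aligned value)
    return Frame{rsp, rbp, rbp - 240};  // lea rax, [rbp-240]
}

// The object ends up at the final rsp, which is 256 bytes below the
// aligned value and therefore still a multiple of 128.
static_assert(old_gcc_prologue(0x7ffc1238).object % 128 == 0,
              "object address is 128-byte aligned");
static_assert(old_gcc_prologue(0x7ffc1238).object == old_gcc_prologue(0x7ffc1238).rsp,
              "rbp - 240 is the same address as the final rsp");
```

Because `rbp` is captured 16 bytes below the aligned value, `rbp - 240` and the final `rsp` (24 + 232 bytes below it) name the same address.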

score 2 · Answer 1 · answered Jul 31 '19 at 20:59

Nothing magical here. Line-by-line:

bar():                                # @bar()
        push    rbp ; preserve the caller's base pointer
        mov     rbp, rsp ; set up the base pointer
        and     rsp, -128 ; ANDing with -128 rounds rsp down to a 128-byte boundary
        sub     rsp, 128 ; the stack grows down, so decrementing rsp reserves space for the new object
        mov     rdi, rsp ; address of the new (future) object is passed as an argument to the constructor, in rdi
        call    Something::Something() [complete object constructor] ; call the constructor
        mov     rdi, rsp ; the callee might have changed rdi, so it needs to be reloaded
        call    foo(Something*) ; call foo, passing the address of the fully constructed object
        mov     rsp, rbp ; restore the stack pointer
        pop     rbp ; restore the base pointer
        ret
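
As a rough illustration of the `and rsp, -128` / `sub rsp, 128` pair, the same rounding can be written in C++. This is a sketch, not what the compiler emits: `align_down` and `clang_object_address` are hypothetical helper names, and the sample addresses are made up.

```cpp
#include <cassert>
#include <cstdint>

// Round addr down to a multiple of alignment (which must be a power of two).
// For alignment == 128 the mask ~(128 - 1) has the same bit pattern as -128,
// so this models `and rsp, -128`.
constexpr std::uint64_t align_down(std::uint64_t addr, std::uint64_t alignment) {
    return addr & ~(alignment - 1);
}

// Models `and rsp, -128` followed by `sub rsp, 128`: the result is the
// address handed to the constructor in rdi.
constexpr std::uint64_t clang_object_address(std::uint64_t rsp_after_push_rbp) {
    return align_down(rsp_after_push_rbp, 128) - 128;
}

static_assert(align_down(0x7ffc1234, 128) == 0x7ffc1200,
              "the low 7 bits are cleared");
static_assert(clang_object_address(0x7ffc1234) == 0x7ffc1180,
              "the object sits 128 bytes below the aligned value");
```

Rounding down can discard up to 127 bytes of stack, which is why the original `rsp` is kept in `rbp` and restored from it before returning.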

SergeyA


Got it, that part makes sense now after reading @Jester's comment. Why don't the earlier versions of gcc do the same? – Curious Jul 31 '19 at 21:00
@Curious I do not see much difference there. Different compilers can use different codegen to arrive at the same observable result. Looks like at some point the gcc folks decided that this codegen is better. – SergeyA Jul 31 '19 at 21:03
@Curious: Earlier gcc is copying the return address to make a full fake stack frame which I think is useful for some weird corner cases. Doing that always even when it's useless is a missed optimization. – Peter Cordes Jul 31 '19 at 21:39
@PeterCordes I don't think I follow sorry :( – Curious Jul 31 '19 at 21:40
See [@Ross's answer](https://stackoverflow.com/questions/38781118/why-is-gcc-generating-an-extra-return-address/56412251#56412251) on the linked duplicate – Peter Cordes Jul 31 '19 at 21:44
