I have the following code:

int main()
{
    volatile char a;

    return 0;
}

When I compile it to assembly with arm-linux-gnueabihf-gcc -o align.txt -O0 -S align.c, I get the following:

push    {r7}
sub sp, sp, #12
...

Now, I know the ARM EABI requires the stack to be 8-byte aligned, which would explain gcc allocating some more space than the single byte required by a. However, I'd have thought it'd allocate 4 bytes to make room for a and 3 padding bytes.

It looks like it's allocating 4 bytes to make the pushed r7 aligned, then another 8 (1 for a and 7 for padding). If I define a 9-member char array instead, it'll allocate 20 bytes.

Why does the pushed r7 need its own padding?

Martin
  • Stupid question, a stab in the dark: if you remove the `volatile` and make use of `a` so it doesn't get compiled away, is it the same? – Marcus Müller Dec 09 '15 at 18:36
  • Note that gcc does a lot of very naive stuff when you do not enable optimizations. – nos Dec 09 '15 at 18:37
  • Yes, you should use `-O3` and then change `return 0;` to `return a;` so that it's not optimized away. You will see 8 bytes allocated. – Jester Dec 09 '15 at 18:39
  • @MarcusMüller Yes. I also tried what Jester suggested, and it just compiles to `movs r0, #0`. gcc seems to get way smarter with optimizations on. – Martin Dec 09 '15 at 18:44
  • what happens with `-Os` ? – Marcus Müller Dec 09 '15 at 18:45
  • Another try: I removed the `volatile` and used `return a`. It'll allocate 8 bytes, but it won't push r7. – Martin Dec 09 '15 at 18:46
  • @MarcusMüller `-Os` does `movs r0, #0`. – Martin Dec 09 '15 at 18:47
  • The compiler is still allowed to remove this `volatile`, as it can guarantee it sees no global scope and you do not even access it. – too honest for this site Dec 09 '15 at 18:50
  • Why do you care about pushing `r7` anyway? If it's not used, it doesn't need to be pushed. You can force it using `-O3 -fno-omit-frame-pointer` and then the 12 bytes come back. Now that's interesting :) – Jester Dec 09 '15 at 19:01
  • @Jester I don't care. The compiler does. I just want to know why, if it chooses to push r7, it also adds 4 extra bytes of padding to it. – Martin Dec 09 '15 at 19:04
  • @Jester I tried -O3 -fno-omit-frame-pointer and it pushes r7, but it doesn't move SP at all. Since the ARM EABI requires the stack to be 8-byte aligned, does this violate the ABI? – Martin Dec 09 '15 at 19:08
  • @Martin no, as a leaf function it doesn't actually matter - the ABI only _requires_ 8-byte stack alignment across a function call. Consider, though, if you weren't optimising (so leaving the stack 8-byte aligned at all times because you don't know if you're going to make a call later), and were to perform the "preserve 4 bytes of callee-saved registers" and "make 1 byte of space for local variables" as completely separate steps... – Notlikethat Dec 09 '15 at 19:23
  • @Notlikethat yeah, that's what I thought as well. The only way it makes sense is if those are separate steps. Unfortunately I don't know much about gcc internals, so I can't check it by myself. – Martin Dec 09 '15 at 19:27
  • The take-away lesson is: never try to reason about _unoptimised_ GCC code, just have a good laugh at such marvels as spilling a register to the stack between statements, immediately reloading that value back into the _same_ register, then never using it ;) – Notlikethat Dec 09 '15 at 19:31
  • The most likely explanation is stack alignment requirements. See [here](http://stackoverflow.com/questions/4175281/what-does-it-mean-to-align-the-stack) – David Hoelzer Dec 09 '15 at 21:43

1 Answer

In C and C++, on a 32- or 64-bit machine, 8-bit and 16-bit parameters are passed as 32-bit values (possibly 64-bit on some 64-bit machines), and the called function then uses only the low 8 or 16 bits. Some (or most) 32- and 64-bit processors don't have or use a byte or halfword push for stack operations. With a fastcall-style, register-based calling convention, 32-bit or 64-bit registers are used for the first few parameters.

The example assembly code may be aligning the stack to a 16 byte boundary, or it may be doing a default allocation for local variables or debugger use of the stack.

rcgldr