7

Given the following minimal test case:

void exit(int);

int main() { 
    exit(0);
}

GCC 4.9 and later with 32-bit x86 target produces something like:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $4, %esp
        subl    $12, %esp
        pushl   $0
        call    exit

Note the convoluted stack-realignment code. With the function renamed to anything but main, however, it gives the (much more reasonable):

xmain:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        subl    $12, %esp
        pushl   $0
        call    exit

The differences are even more pronounced with -O. As main nothing changes; renamed, it yields:

xmain:
        subl    $24, %esp
        pushl   $0
        call    exit

The above was noticed in answering this question:

How do i get rid of call __x86.get_pc_thunk.ax

Is this behavior (and its motivation) documented anywhere, and is there any way to suppress it? GCC has x86 target-specific options to set the preferred/assumed incoming and outgoing stack alignment and enable/disable realignment for arbitrary functions, but they don't seem to be honored for main.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • Sure, you know that :) – Eugene Sh. Apr 30 '18 at 19:07
  • @EugeneSh.: Sorry, "better?" was poorly worded. I should have/wanted to ask: is it good now or are there additional changes that should still be made? – R.. GitHub STOP HELPING ICE Apr 30 '18 at 19:08
  • 1
    Forgive a noob like me, but is it impossible that main has some special code someplace that is executed only for it? The C standard is fairly explicit that main must set provide `(int argc, char **argv)` unless explicitly told not to by declaring it like `int main(void)`. – Roflcopter4 Apr 30 '18 at 19:29
  • @Roflcopter4: That doesn't have anything to do with the question. And in a *definition* (not just declaration), `()` is identical to `(void)`. – R.. GitHub STOP HELPING ICE Apr 30 '18 at 19:33
  • @R.. No need to be rude here. I thought it was relevant. Furthermore, I thought comments needn't be as rigorous as an answer on this website. And finally, you're wrong: in C `int foo()`, whether a prototype or definition, refers to a function with an indeterminate number of parameters of unspecified type. – Roflcopter4 Apr 30 '18 at 19:54
  • 2
    @Roflcopter4 Officially, `T foo()` as the head of a definition _does_ specify that the function takes no parameters, see [N1570 §6.7.6.3p4](http://port70.net/~nsz/c/c11/n1570.html#6.7.6.3p14). I haven't yet found a compiler that actually enforces that rule, though. – zwol Apr 30 '18 at 20:44

2 Answers2

6

This answer is based on source diving. I do not know what the developers' intentions or motivations were. All of the code involved seems to date to 2008ish, which is after my own time working on GCC, but long enough ago that people's memories have probably gotten fuzzy. (GCC 4.9 was released in 2014; did you go back any farther than that? If I'm right about when this code was introduced, the clumsy stack alignment for main should start happening in version 4.4.)

GCC's x86 back end appears to have been coded to make extra-conservative assumptions about the stack alignment on entry to main, regardless of command-line options. The function ix86_minimum_incoming_stack_boundary is called to compute the expected stack alignment on entry for each function, and the last thing it does ...

12523   /* Stack at entrance of main is aligned by runtime.  We use the
12524      smallest incoming stack boundary. */
12525   if (incoming_stack_boundary > MAIN_STACK_BOUNDARY
12526       && DECL_NAME (current_function_decl)
12527       && MAIN_NAME_P (DECL_NAME (current_function_decl))
12528       && DECL_FILE_SCOPE_P (current_function_decl))
12529     incoming_stack_boundary = MAIN_STACK_BOUNDARY;
12530 
12531   return incoming_stack_boundary;

... is override the expected stack alignment to a conservative constant, MAIN_STACK_BOUNDARY, if the function being compiled is main. MAIN_STACK_BOUNDARY is 128 (bits) when compiling 64-bit code and 32 when compiling 32-bit code. As far as I can tell, there is no command-line knob that will make it expect the stack to be more aligned than that on entry to main. I can persuade it to skip stack alignment for main by telling it that no additional alignment is needed, compiling your test program with -m32 -mpreferred-stack-boundary=2 gives me

main:
        pushl   $0
        call    exit

with GCC 7.3.


The write-only manipulations of %ecx appear to be a missed-optimization bug. They are coming from this part of ix86_expand_prologue:

13695       /* Grab the argument pointer.  */
13696       t = plus_constant (Pmode, stack_pointer_rtx, m->fs.sp_offset);
13697       insn = emit_insn (gen_rtx_SET (crtl->drap_reg, t));
13698       RTX_FRAME_RELATED_P (insn) = 1;
13699       m->fs.cfa_reg = crtl->drap_reg;
13700       m->fs.cfa_offset = 0;
13701
13702       /* Align the stack.  */
13703       insn = emit_insn (ix86_gen_andsp (stack_pointer_rtx,
13704                                         stack_pointer_rtx,
13705                                         GEN_INT (-align_bytes)));
13706       RTX_FRAME_RELATED_P (insn) = 1;
13707 

The intention is to save a pointer to the incoming argument area before realigning the stack, so that it is straightforward to access arguments. Either because this happens fairly late in the pipeline (after register allocation), or because the instructions are marked FRAME_RELATED, nothing manages to delete those instructions again when they turn out to be unnecessary.

I imagine the GCC devs would at least listen to a bug report about this, but they might reasonably consider it low priority, because these are instructions that are executed only once in the lifetime of the whole program, they're only actually dead when main doesn't use its arguments, and they only happen in the traditional 32-bit ABI, which I have the impression is considered a second-class target nowadays.

zwol
  • 135,547
  • 38
  • 252
  • 361
  • This doesn't seem to address why it's specific to `main` though. The same prologue generation function is called for all functions. – R.. GitHub STOP HELPING ICE Apr 30 '18 at 19:06
  • 1
    @R.. I'm looking into that now. – zwol Apr 30 '18 at 19:06
  • @R.. Edited with additional information about why `main` is different. – zwol Apr 30 '18 at 19:33
  • I guess going further with this requires pulling a GCC history and going a blame on the line where `MAIN_STACK_BOUNDARY` is defined... – R.. GitHub STOP HELPING ICE Apr 30 '18 at 19:36
  • 2
    @R.. I happen to have a GCC checkout on this computer ... the revision history isn't terribly informative in itself but with a little digging I found https://gcc.gnu.org/ml/gcc-patches/2008-04/msg00349.html ... it sounds like this was a fairly major change to x86 prologue generation. – zwol Apr 30 '18 at 19:50
  • *because these are instructions that are executed only once in the lifetime of the whole program* Unfortunately not true. **Any function that spills/reloads a `__m256` to the stack gets the full dose of this crap, even with `-m64`: https://godbolt.org/g/xSQ16n**. (Or with `alignof(32) int foo[16]` or w/e.) Some functions with `__m256` locals get leftover fragments of it, like just an `lea r10, [rsp+8]` or something, when it turns out all the 256-bit vectors fit in registers. (gcc7 may have mostly fixed that last bit, I forget if I've seen stray bits more recently than gcc5 or 6.) – Peter Cordes May 01 '18 at 03:28
  • And gcc isn't good at avoiding alignment by finishing with aligned locals before non-inline function calls, including when auto-vectorizing. See the above Godbolt link. – Peter Cordes May 01 '18 at 03:32
  • Found https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81274 where gcc8.0.0 (trunk 20170701) emitted an orphan `leal 8(%esp), %ecx` at `-O3` in a simple `__m256 foo(const float *x)` that just loads + uses the pointed-to data. (IIRC, I encouraged @CodyGray to file that). That stray LEA bug seems to be fixed in current trunk. – Peter Cordes May 01 '18 at 03:46
0
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $4, %esp

The above section replicates the invoking stack frame, which since you haven’t defined any arguments to main() consists of just the return address -4(%ecx) and frame pointer, into a $16 byte aligned stack; thus my WAG is that this is to accomodate runtimes (crt0.s) that do not align the stack properly.

The push %ebp was a bit of a giveaway -- it establishes a consistent looking backtrace through crt0.s despite this trampoline.

This is just a ‘normal’ call of exit, with the stack properly aligned...

subl    $12, %esp
pushl   $0
call    exit
mevets
  • 10,070
  • 1
  • 21
  • 33
  • This does not answer the question. – R.. GitHub STOP HELPING ICE Apr 30 '18 at 19:33
  • It's not a trampoline, it's just an extra copy of the return address right above the aligned frame pointer. (Done in a clumsy and inefficient way with `lea` instead of `mov %esp, %ecx`). But yes it's pretty clear that gcc is doing this for some kind of EBP / RBP-based backtrace reason. I forget what it does if there are stack args; I think it ties up another register to reference them if it doesn't copy them down into the stack frame (with optimization enabled). gcc uses the same sequence if it needs the stack 32-byte aligned or more in functions other than `main`, including for `-m64`. – Peter Cordes May 01 '18 at 02:47