19

Consider this code:

#include <stdio.h>
void a(int a, int b, int c)
{
    char buffer1[5];
    char buffer2[10];
}

int main()
{
    a(1,2,3); 
}

Then compile it with:

gcc -S a.c

This command produces the assembly for our source code.

Looking at main, we can see that the compiler never uses the "push" instruction to put the arguments of function a on the stack; it uses "movl" instead:

main:
 pushl %ebp
 movl %esp, %ebp
 andl $-16, %esp
 subl $16, %esp
 movl $3, 8(%esp)
 movl $2, 4(%esp)
 movl $1, (%esp)
 call a
 leave

Why does this happen? What is the difference between the two?

Peter Cordes
Pooya

4 Answers

22

Here is what the gcc manual has to say about it:

-mpush-args
-mno-push-args
    Use PUSH operations to store outgoing parameters. This method is shorter and usually
    equally fast as method using SUB/MOV operations and is enabled by default. 
    In some cases disabling it may improve performance because of improved scheduling
    and reduced dependencies.

 -maccumulate-outgoing-args
    If enabled, the maximum amount of space required for outgoing arguments will be
    computed in the function prologue. This is faster on most modern CPUs because of
    reduced dependencies, improved scheduling and reduced stack usage when preferred
    stack boundary is not equal to 2. The drawback is a notable increase in code size.
    This switch implies -mno-push-args. 

Apparently -maccumulate-outgoing-args is enabled by default, overriding -mpush-args. Explicitly compiling with -mno-accumulate-outgoing-args does revert to the PUSH method, here.


2019 update: modern CPUs have had efficient push/pop since about Pentium M.
-mno-accumulate-outgoing-args (and using push) eventually became the default for -mtune=generic in Jan 2014.

Marco Bonelli
Jester
  • 6
    A much better question would be why this bloat-generating option `-maccumulate-outgoing-args` is not automatically disabled by `-Os`. – R.. GitHub STOP HELPING ICE Dec 28 '10 at 04:32
  • @R.. So do you know why? – Tony Mar 25 '15 at 12:49
  • 1
    @Tony: obviously, because when deciding which of the many (~200) optimization flags to enable/disable for each specific -O option, sometimes things slip through the cracks. – ninjalj Jul 27 '15 at 21:06
  • 3
    Update: `-maccumulate-outgoing-args` was disabled for the default [`-mtune=generic`](https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00008.html) in January 2014, now that CPUs without [stack-engines](http://stackoverflow.com/questions/36631576/what-is-the-stack-engine-in-the-sandybridge-microarchitecture) are very uncommon. (It probably should have been done sooner). – Peter Cordes Aug 30 '16 at 23:44
8

That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.

"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
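To make the "one update vs. three" point concrete, here is a rough C model of the two approaches. It is only a sketch, not what gcc emits: the toy_stack type and every function name are made up for illustration, the stack is modelled as an array of 32-bit words, and the alignment padding seen in the question's listing is ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack. All names are hypothetical. */
enum { STACK_WORDS = 16 };

struct toy_stack {
    int32_t mem[STACK_WORDS];
    int sp;          /* index of the current top of stack (think %esp / 4) */
    int sp_updates;  /* how many times the stack pointer was modified */
};

void toy_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = STACK_WORDS;   /* empty stack: sp points just past the end */
    s->sp_updates = 0;
}

/* Models "pushl $v": move the stack pointer down, then store. */
void toy_push(struct toy_stack *s, int32_t v)
{
    assert(s->sp >= 1);
    s->sp -= 1;
    s->sp_updates += 1;
    s->mem[s->sp] = v;
}

/* Pass (a, b, c) with three pushes, last argument first. */
void args_by_push(struct toy_stack *s, int32_t a, int32_t b, int32_t c)
{
    toy_push(s, c);
    toy_push(s, b);
    toy_push(s, a);
}

/* Pass (a, b, c) with one "subl" that reserves the space, then three stores. */
void args_by_mov(struct toy_stack *s, int32_t a, int32_t b, int32_t c)
{
    assert(s->sp >= 3);
    s->sp -= 3;              /* the only stack-pointer update */
    s->sp_updates += 1;
    s->mem[s->sp + 2] = c;   /* like movl $3, 8(%esp) */
    s->mem[s->sp + 1] = b;   /* like movl $2, 4(%esp) */
    s->mem[s->sp + 0] = a;   /* like movl $1, (%esp)  */
}
```

Calling args_by_push and args_by_mov with (1, 2, 3) on two freshly initialized stacks leaves the same words at the same positions; only sp_updates differs (3 vs. 1).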

Ben Zotto
  • 1
    Why would you need to put the constant into a register first? x86 supports pushing of immediate constants – Necrolis Dec 26 '10 at 19:50
6

gcc does all sorts of optimizations, including selecting instructions based on their execution speed on the particular CPU being optimized for. You will notice that things like x *= n are often replaced by a mix of SHL, ADD and/or SUB, especially when n is a constant; MUL is only used when the average runtime (and cache/etc. footprint) of the SHL-ADD-SUB combination would exceed that of MUL, or when n is not a constant (so that a loop of shl-add-sub would be costlier).
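As a small illustration of that kind of rewrite, here is a C sketch that spells out by hand what a multiply by a constant looks like as shifts and adds. The function names are made up for this example, and whether gcc actually picks these exact sequences depends on the target and tuning options; the code only shows that the results agree.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: plain multiply by the constant 10. */
uint32_t times10_mul(uint32_t x)
{
    return x * 10u;
}

/* 10 = 8 + 2, so x * 10 == (x << 3) + (x << 1) in wrapping unsigned math. */
uint32_t times10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* 7 = 8 - 1, so x * 7 == (x << 3) - x in wrapping unsigned math. */
uint32_t times7_shift_sub(uint32_t x)
{
    return (x << 3) - x;
}

/* Compare the hand-written forms against real multiplies on a few samples. */
void check_strength_reduction(void)
{
    const uint32_t samples[] = { 0u, 1u, 7u, 12345u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t x = samples[i];
        assert(times10_shift_add(x) == times10_mul(x));
        assert(times7_shift_sub(x) == (uint32_t)(x * 7u));
    }
}
```

Unsigned types are used on purpose: their arithmetic wraps modulo 2^32, so the identities hold for every input, including values that overflow.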

In the case of function arguments: the MOVs are independent of one another, so the hardware can execute them in parallel, while PUSHes cannot run in parallel (at least on CPUs without a stack engine). The second PUSH has to wait for the first PUSH to finish, because each one updates the esp register.

user502515
2

Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.

I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607
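For reference, the alignment itself comes from the `andl $-16, %esp` line in the question's listing. A minimal C sketch of what that mask does to a 32-bit value (align_down_16 is a name made up for this example, and the sample addresses are arbitrary):

```c
#include <assert.h>
#include <stdint.h>

/* Round a 32-bit address down to a multiple of 16, like "andl $-16, %esp". */
uint32_t align_down_16(uint32_t sp)
{
    /* -16 as a 32-bit two's-complement value is 0xFFFFFFF0:
       the AND clears the low four bits and keeps the rest. */
    uint32_t aligned = sp & (uint32_t)-16;

    /* Sanity checks: aligned, never above sp, and at most 15 bytes below it. */
    assert(aligned % 16u == 0 && aligned <= sp && sp - aligned < 16u);
    return aligned;
}
```

For example, align_down_16(0xBFFFF7AC) gives 0xBFFFF7A0, and a value that is already a multiple of 16 comes back unchanged. Because the stack grows downward, rounding down only ever reserves a few extra bytes.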

Ville Krumlinde
  • 1
    Just to be clear, the OS X ABI only requires the stack pointer be 16-byte aligned at the point of external function calls. – Stephen Canon Dec 26 '10 at 22:19
  • I see, thanks for pointing that out. Reading the other answers I now understand the movl code generation is related to improved scheduling. The andl instruction does seem to only be there for stack alignment though. – Ville Krumlinde Dec 27 '10 at 09:58