5

I have a simple C program. Let's say, for example, I have an int and a char array of length 20. I need 24 bytes in total.

int main()
{
   char buffer[20];
   int x = 0;
   buffer[0] = 'a';
   buffer[19] = 'a';
}

The stack needs to be aligned to a 16-byte boundary, so I presume a compiler will reserve 32 bytes. But when I compile such a program with gcc for x86-64 and read the output assembly, the compiler reserves 64 bytes.

..\gcc -S -o main.s main.c

Gives me:

    .file   "main.c"
    .def    __main; .scl    2;  .type   32; .endef
    .text
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp                        # RBP is pushed, so no need to reserve more for it
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $64, %rsp                   # Reserving the 64 bytes
    .seh_stackalloc 64
    .seh_endprologue
    call    __main
    movl    $0, -4(%rbp)                # Using the first 4 bytes to store the int
    movb    $97, -32(%rbp)              # Using from RBP-32 
    movb    $97, -13(%rbp)              # to RBP-13 to store the char array
    movl    $0, %eax
    addq    $64, %rsp                   # Restoring the stack with the last 32 bytes unused
    popq    %rbp
    ret
    .seh_endproc
    .ident  "GCC: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 5.2.0"

Why is that? When I write assembly by hand, I always reserve only the minimum memory I need, without any problem. Is this a limitation of the compiler, which has trouble working out how much memory is needed, or is there a reason for it?

Here is the output of gcc -v:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=D:/Mingw64/bin/../libexec/gcc/x86_64-w64-mingw32/5.2.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../../../src/gcc-5.2.0/configure --host=x86_64-w64-mingw32 --build=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 --prefix=/mingw64 --with-sysroot=/c/mingw520/x86_64-520-posix-seh-rt_v4-rev0/mingw64 --with-gxx-include-dir=/mingw64/x86_64-w64-mingw32/include/c++ --enable-shared --enable-static --disable-multilib --enable-languages=c,c++,fortran,objc,obj-c++,lto --enable-libstdcxx-time=yes --enable-threads=posix --enable-libgomp --enable-libatomic --enable-lto --enable-graphite --enable-checking=release --enable-fully-dynamic-string --enable-version-specific-runtime-libs --disable-isl-version-check --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-bootstrap --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers --with-gnu-as --with-gnu-ld --with-arch=nocona --with-tune=core2 --with-libiconv --with-system-zlib --with-gmp=/c/mingw520/prerequisites/x86_64-w64-mingw32-static --with-mpfr=/c/mingw520/prerequisites/x86_64-w64-mingw32-static --with-mpc=/c/mingw520/prerequisites/x86_64-w64-mingw32-static --with-isl=/c/mingw520/prerequisites/x86_64-w64-mingw32-static --with-pkgversion='x86_64-posix-seh-rev0, Built by MinGW-W64 project' --with-bugurl=http://sourceforge.net/projects/mingw-w64 CFLAGS='-O2 -pipe -I/c/mingw520/x86_64-520-posix-seh-rt_v4-rev0/mingw64/opt/include -I/c/mingw520/prerequisites/x86_64-zlib-static/include -I/c/mingw520/prerequisites/x86_64-w64-mingw32-static/include' CXXFLAGS='-O2 -pipe -I/c/mingw520/x86_64-520-posix-seh-rt_v4-rev0/mingw64/opt/include -I/c/mingw520/prerequisites/x86_64-zlib-static/include -I/c/mingw520/prerequisites/x86_64-w64-mingw32-static/include' CPPFLAGS= LDFLAGS='-pipe -L/c/mingw520/x86_64-520-posix-seh-rt_v4-rev0/mingw64/opt/lib -L/c/mingw520/prerequisites/x86_64-zlib-static/lib -L/c/mingw520/prerequisites/x86_64-w64-mingw32-static/lib '
Thread model: posix
gcc version 5.2.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 
Peter Cordes
Henri Latreille
  • Indeed compiler uses extra memory and always there are overheads comparing the generated code with pure written assembly codes. But trust compilers, they manage memory better than every assembly programmer and those tiny overheads will not count and harm your memory. – masoud Jun 12 '16 at 04:48
  • It may need space for other things, like register spillage, temporary values from complex expressions, exception handling markers, and buffer overflow mitigation canaries. – Raymond Chen Jun 12 '16 at 04:51
  • @deepmax "Indeed compiler uses extra memory" Impossible to say without an [MCVE]. This simple [program](http://coliru.stacked-crooked.com/a/311384a9591212f5) shows `.comm arr,80,32` which looks like a 32 byte boundary to me... – uh oh somebody needs a pupper Jun 12 '16 at 04:52
  • @sleeptightpupper: I said that generally, the point was, using high-level languages (at least higher level than assembly), you will have some overheads in the code. Of course, there are many examples that shows C can produce assembly code optimized and efficient. – masoud Jun 12 '16 at 04:56
  • @deepmax The overhead shouldn't be present in the actual executable code, though. That would undoubtably violate the x86_64 ABI somehow. – uh oh somebody needs a pupper Jun 12 '16 at 04:57
  • @sleeptightpupper: and your comment doesn't violate what I've said. – masoud Jun 12 '16 at 05:01
  • Please include what compiler, settings, and version that you are using. [This sample code that I used with GCC 5.3 x86-64 give me a version with 24-byte alignment.](https://godbolt.org/g/jDWgU7) – CinchBlue Jun 12 '16 at 05:01
  • I must also mention that the amount of bytes reserved is also dependent on the size of the returned object from your function. [The amount of bytes reserved for the stack is also dependent on the size of the returned object, according to this example.](https://godbolt.org/g/9pPh6t) – CinchBlue Jun 12 '16 at 05:10
  • This question has been asked many times at SO ... did you search before posting? – Jim Balter Jun 12 '16 at 07:00
  • @deepmax "*always* there are overheads comparing the generated code with pure written assembly codes" -- this is not true. "at least higher level than assembly" -- At a higher level than C, perhaps. – Jim Balter Jun 12 '16 at 07:02
  • "when I ... read the output assembly, the compiler reserves 64 bytes" -- did you read the code to see what it used the stack space for? – Jim Balter Jun 12 '16 at 07:08
  • @sleeptightpupper The amount of stack space allocated has no bearing on the ABI, and v.v. – Jim Balter Jun 12 '16 at 07:11
  • Without seeing the assembly code, we'd only be guessing. – user3344003 Jun 12 '16 at 13:13
  • I started programming in 6502 assembly on a 32K machine. These extra bytes then matter - but in a modern age a few extra bytes do not – Ed Heal Jun 12 '16 at 13:39
  • Did you try an optimized build? – Daniel Jun 12 '16 at 13:46
  • @Dani Yes. Then it allocates 40 bytes. – Henri Latreille Jun 12 '16 at 14:50
  • @EdHeal No doubt. I was just curious. – Henri Latreille Jun 12 '16 at 14:51
  • What is the `call __main`? – Daniel Jun 12 '16 at 14:58
  • @Dani Good question. It seems to be the initialisation of collect2 utility which allows linking of C and C++ code together. https://gcc.gnu.org/onlinedocs/gccint/Collect2.html#Collect2 – Henri Latreille Jun 12 '16 at 15:43

3 Answers

6

Compilers may indeed reserve additional memory for themselves.

GCC has a flag, -mpreferred-stack-boundary, to set the alignment it will maintain. According to the documentation, the default is 4, which should produce 16-byte alignment (2^4 bytes), which is needed for SSE instructions.

As VermillionAzure noted in a comment, you should provide your gcc version and compile-time options (use gcc -v to show these).

Community
torek
  • This is what actually worked for me when I disassembled different versions of compiled c code. Changing the -mpreferred-stack-boundary was the option that changed the space allocated for local variables. – Filipe Rodrigues Oct 05 '17 at 18:51
  • Note that setting `-mpreferred-stack-boundary` less than 4 will break ABIs such as i386 System V on Linux, or x86-64 System V everywhere, that *require* 16-byte stack alignment before a `call`, and will generate alignment-required SIMD load/store to stack memory that will fault if this ABI guarantee is violated. e.g. [glibc scanf Segmentation faults when called from a function that doesn't align RSP](https://stackoverflow.com/q/51070716) – Peter Cordes Jul 20 '20 at 07:13
  • Also, GCC *does* have a missed-optimization bug that sometimes results in allocating an extra 16 bytes beyond what's needed for alignment + locals, even at `-O3`: [Why does GCC allocate more space than necessary on the stack?](https://stackoverflow.com/q/63009070) demonstrates it. – Peter Cordes Jul 21 '20 at 08:20
4

Because you haven't enabled optimization.

Without optimization, the compiler makes no attempt to minimize the amount of space or time it needs for anything in the generated code -- it just generates code in the most straightforward way possible.

Add -O2 (or even just -O1) or -Os if you want the compiler to produce decent code.

Chris Dodd
-3

I need 24 bytes in total.

The compiler needs space for a return address and a base pointer. As you are in 64-bit mode, that's another 16 bytes. Total 40. Round that up to a 32-byte boundary and you get 64.

user207421
  • The base pointer and return address are allocated on the stack separately (by the `pushq %rbp` and `call` instructions respectively), so are not included in the `subq $64, %rsp`. – interjay Jun 12 '16 at 13:42