140

So I'm trying to learn a little bit of assembly, because I need it for Computer Architecture class. I wrote a few programs, like printing the Fibonacci sequence.

I noticed that whenever I write a function I use these 3 lines (as I learned from comparing the assembly code generated by gcc to its C equivalent):

pushq   %rbp
movq    %rsp, %rbp
subq    $16, %rsp

I have 2 questions about it:

  1. First of all, why do I need to use %rbp? Isn't it simpler to use %rsp, as its contents are moved to %rbp on the 2nd line?
  2. Why do I have to subtract anything from %rsp? It's not always 16: when I was printing 7 or 8 variables with printf, I would subtract 24 or 28.

I use Manjaro 64 bit on a Virtual Machine (4 GB RAM), Intel 64 bit processor

ecm
  • 1
    You forgot to enable optimization. As for the amount to subtract that depends on alignment requirements and whether you can use the red zone. – Jester Jan 28 '17 at 17:44
  • @Jester Enabling optimization doesn't necessarily mean that frame pointer omission will be enabled too – Govind Parmar Jan 28 '17 at 17:49
  • 8
    Possible duplicate of [What is exactly the base pointer and stack pointer? To what do they point?](http://stackoverflow.com/questions/1395591/what-is-exactly-the-base-pointer-and-stack-pointer-to-what-do-they-point). IOW it's the same as in x86_32 code. – Jongware Jan 28 '17 at 17:50
  • 1
    @GovindParmar depends on compiler, but you yourself guessed gcc, where it does. Also, subtracting from rsp for no reason (which is hinted by OP) also says no optimization. – Jester Jan 28 '17 at 17:53
  • 1
    Possible duplicate of [What is the purpose of the EBP frame pointer register?](https://stackoverflow.com/questions/579262/what-is-the-purpose-of-the-ebp-frame-pointer-register) – phuclv Jun 11 '18 at 09:39
  • [Phoronix tested](https://www.phoronix.com/scan.php?page=article&item=fedora-frame-pointer&num=1) the performance downside of `-O2 -fno-omit-frame-pointer` with x86-64 GCC12.1 on a Zen3 laptop CPU for multiple open-source programs, as proposed for Fedora 37. Most of them had performance regressions, a few of them very serious, although the biggest ones are probably some kind of fluke or other interaction. **Geometric mean 14% faster without frame pointers.** (influenced by some really big slowdowns in a couple programs.) – Peter Cordes Jul 02 '22 at 07:14

2 Answers

148

rbp is the frame pointer on x86_64. In your generated code, it gets a snapshot of the stack pointer (rsp) so that when adjustments are made to rsp (e.g. reserving space for local variables or pushing values onto the stack), local variables and function parameters are still accessible at a constant offset from rbp.

A lot of compilers offer frame pointer omission as an optimization option; this will make the generated assembly code access variables relative to rsp instead and free up rbp as another general purpose register for use in functions.

In the case of GCC, which I'm guessing you're using from the AT&T assembler syntax, that switch is -fomit-frame-pointer. Try compiling your code with that switch and see what assembly code you get. You will probably notice that when accessing values relative to rsp instead of rbp, the offset from the pointer varies throughout the function.
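To try this yourself, a small function with a couple of locals is enough. The sketch below is a hypothetical example (the names `helper` and `scaled_sum` are made up here, not taken from the question). Compiling it both ways should show `%rbp`-relative addressing in one listing and `%rsp`-relative addressing in the other, although the exact output depends on your GCC version.

```c
/* Hypothetical test functions, not from the question.
 * Compile to assembly and compare the two listings:
 *   gcc -S -O0 frame.c -o with_fp.s
 *   gcc -S -O0 -fomit-frame-pointer frame.c -o without_fp.s
 */
int helper(int x)
{
    return x + 1;
}

int scaled_sum(int a, int b)
{
    int sum = a + b;          /* local variable, kept on the stack at -O0 */
    int scaled = helper(sum); /* the call makes this a non-leaf function */
    return scaled * 2;
}
```

The call to `helper` is there on purpose: a function that calls another one has to adjust `%rsp`, so the prologue is more likely to resemble the one in the question.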

Govind Parmar
  • 1
    Exactly, it uses `%rsp`! But, for example, when I `movl` the first constant, e.g. `movl $10`, it moves it to `4(%rsp)`. Why not `-4`? And by the way, why does it still subtract some value from `%rsp`? I didn't understand that from the comments –  Jan 28 '17 at 18:43
  • 2
    @FrynioS The compiler allocates space for local values on function entry. That's why it subtracts a value from %rsp on entry. This doesn't depend on whether %rbp is used as a frame pointer. After that, this space is accessed with positive offsets from %rsp. Also, if this function calls another one, %rsp must be aligned on a 16-byte boundary at each call, so in that case the compiler must subtract 8 from %rsp on entry. – Netch Jan 29 '17 at 05:52
  • 1
    @FrynioS notice also that there is a 128-byte space (the "red zone") below %rsp. It does not keep its contents across function calls, but it is left untouched by the OS when interrupts or signals are handled. So very temporary values (used between function calls) can be accessed with negative offsets from %rsp. Not all compilers use this. – Netch Jan 29 '17 at 05:55
112

Linux uses the System V ABI for x86-64 (AMD64) architecture; see System V ABI at OSDev Wiki for details.

This means the stack grows down; smaller addresses are "higher up" in the stack. Typical C functions are compiled to

        pushq   %rbp        # Save address of previous stack frame
        movq    %rsp, %rbp  # Address of current stack frame
        subq    $16, %rsp   # Reserve 16 bytes for local variables

        # ... function ...

        movq    %rbp, %rsp  # \ equivalent to the
        popq    %rbp        # / 'leave' instruction
        ret

The amount of memory reserved for the local variables is always a multiple of 16 bytes, to keep the stack aligned to 16 bytes. If no stack space is needed for local variables, there is no subq $16, %rsp or similar instruction.

(Note that the return address and the previous %rbp pushed to the stack are both 8 bytes in size, 16 bytes in total.)
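The rounding can be written out as a small worked example. This is only a sketch: `round_up_16` is a hypothetical helper name, and a real compiler may reserve more or less space for other reasons (spilled registers, or locals kept in the red zone).

```c
#include <stddef.h>

/* Hypothetical helper: round a byte count up to the next multiple of 16,
 * so that the amount subtracted from %rsp keeps the stack 16-byte aligned. */
size_t round_up_16(size_t bytes)
{
    /* Adding 15 and clearing the low four bits rounds up to a multiple of 16. */
    return (bytes + 15) & ~(size_t)15;
}
```

For instance, seven 4-byte ints need 28 bytes, which this rounding turns into a 32-byte reservation.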

While %rbp points to the current stack frame, %rsp points to the top of the stack. Because the compiler knows the difference between %rbp and %rsp at any point within the function, it is free to use either one as the base for the local variables.

A stack frame is just the local function's playground: the region of stack the current function uses.

Current versions of GCC omit the frame pointer whenever optimizations are enabled. This makes sense, because for programs written in C, frame pointers are most useful for debugging, but not much else. (You can use e.g. -O2 -fno-omit-frame-pointer to keep frame pointers while otherwise enabling optimizations.)

The same ABI applies to all binaries, no matter what language they are written in. Some languages need to "unwind" the stack, for example to "throw exceptions" to an ancestor caller of the current function: one or more functions are aborted and control is passed to some ancestor function, without leaving unneeded stuff on the stack. Walking the chain of saved frame pointers is one way to do that, but on x86-64 System V, exception unwinding normally relies on the unwind metadata in the .eh_frame section instead, so it does not require frame pointers.

When the frame pointer is omitted (-fomit-frame-pointer for GCC), the function implementation changes essentially to

        subq    $8, %rsp    # Re-align stack frame, and
                            # reserve memory for local variables

        # ... function ...

        addq    $8, %rsp
        ret

Because there is no stack frame (%rbp is used for other purposes, and its value is never pushed to stack), each function call pushes only the return address to the stack, which is an 8-byte quantity, so we need to subtract 8 from %rsp to keep it a multiple of 16. (In general, the value subtracted from and added to %rsp is an odd multiple of 8.)
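The "odd multiple of 8" rule can be checked with a little arithmetic. The sketch below assumes a function that pushes nothing else and that `%rsp` was 16-byte aligned just before the `call` instruction; both helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: smallest adjustment that covers `bytes` of locals
 * and is an odd multiple of 8. The 8-byte return address is added before
 * rounding up to 16 and taken off again afterwards. */
size_t adjust_without_frame_pointer(size_t bytes)
{
    return ((bytes + 8 + 15) & ~(size_t)15) - 8;
}

/* Returns 1 if %rsp is 16-byte aligned after the subq, given that it was
 * 16-byte aligned just before the call pushed the return address. */
int aligned_after_adjust(size_t adjust)
{
    size_t offset = 8 + adjust; /* return address plus the subq amount */
    return offset % 16 == 0;
}
```

With no locals at all the adjustment is 8, as in the listing above; 9 to 24 bytes of locals give 24, and so on.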

Function parameters are typically passed in registers. See the ABI link at the beginning of this answer for details, but in short, integral types and pointers are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9, with floating-point arguments in the %xmm0 to %xmm7 registers.
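As a rough illustration, here is how the arguments of one function would be assigned under that convention. The function `combine` is hypothetical; the register names in the comments follow the order given above.

```c
/* Hypothetical function; the comments show where each argument is passed
 * under the System V x86-64 calling convention. */
long combine(long a,   /* %rdi  */
             long b,   /* %rsi  */
             long c,   /* %rdx  */
             long d,   /* %rcx  */
             long e,   /* %r8   */
             long f,   /* %r9   */
             double x, /* %xmm0 */
             double y) /* %xmm1 */
{
    /* The integer result is returned in %rax. */
    return a + b + c + d + e + f + (long)(x + y);
}
```

Integer and floating-point arguments are counted separately, which is why `x` lands in `%xmm0` even though it is the seventh parameter.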

In some cases you'll see rep ret instead of ret. Don't be confused: rep ret means exactly the same thing as ret; the rep prefix, although normally used with string instructions (repeated instructions), does nothing when applied to the ret instruction. It's just that certain AMD processors' branch predictors don't like jumping to a ret instruction, and the recommended workaround is to use rep ret there instead.

Finally, I've omitted the red zone above the top of the stack (the 128 bytes at addresses less than %rsp). It mainly matters for leaf functions (functions that call nothing else), which may keep their locals there without adjusting %rsp at all; GCC does this even without optimization. For typical functions that call other functions it is not really useful: in the omit-frame-pointer case, stack alignment requirements already mean we need to subtract 8 from %rsp, so including the memory needed by the local variables in that subtraction costs nothing.
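A leaf function is the easiest place to look for the red zone in compiler output. The function below is a hypothetical example; with GCC on x86-64 Linux the first command would typically produce a prologue without any `subq ..., %rsp`, while `-mno-red-zone` disables the red zone for comparison. Exact output depends on the compiler version.

```c
/* Hypothetical leaf function: it calls nothing, so its locals may be
 * placed in the red zone below %rsp.
 *   gcc -S -O0 leaf.c                 (red zone allowed)
 *   gcc -S -O0 -mno-red-zone leaf.c   (red zone disabled, for comparison)
 */
int leaf_square_plus_one(int n)
{
    int squared = n * n; /* local variable, kept in memory at -O0 */
    return squared + 1;
}
```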

VelocityRa
Nominal Animal
  • @NominalAnimal please fix typo: correct option is `-fomit-frame-pointer`. – Netch Jan 29 '17 at 05:58
  • @Netch: D'Oh! Good catch; thanks for the heads up. Fixed now. – Nominal Animal Jan 29 '17 at 13:12
  • @FrynioS you can change the accepted answer if you feel another one is better. – Jester Jan 29 '17 at 13:34
  • The red-zone is not an obstacle for debugging, and gcc doesn't avoid it in leaf functions even with `-O0 -g` (https://godbolt.org/g/kCpCnJ). Notice that it makes a stack frame, but never does a `sub rsp` so the store / reload to `[rbp-4]` is to memory only reserved by the red-zone, not by being above RSP. Modern debuggers are non-intrusive and don't clobber space below `%rsp` in the target process being debugged. (This may have been different under DOS or other non-multi-tasking OSes.) – Peter Cordes Jan 02 '18 at 21:46
  • Also, the SysV ABI *does* actually require unwind info in the `.eh_frame` section, even for C code. (Throwing exceptions through a call chain that includes some C code is not safe, although the ABI designers seem to want to support it). Stack unwinding doesn't use frame pointers; it uses separate metadata that maps RIP values to what has / hasn't been pushed/popped (and where the return address is). So anyway, `-fomit-frame-pointer` is on by default even for `g++` with exceptions enabled. This answer is otherwise pretty good, though. – Peter Cordes Jan 02 '18 at 21:50
  • @PeterCordes: True, local data does not need to be in the stack frame to be debuggable, and debuggers handle the red zone just fine. I was more referring to humans reading the disassembly; weird stack offsets always bother me. DWARF unwinding (using `.eh_frame`) does not use frame pointers, true, but a program (or a library) can still use the frame pointers to examine its stack frames; so, I'm not sure if *"stack unwinding doesn't use frame pointers"* is precisely true. If you can edit the answer to fix the wording, please do! – Nominal Animal Jan 03 '18 at 01:19
  • The C++ ABI says stack unwinding for exceptions uses `.eh_frame`, so doing anything else (e.g. legacy RBP as linked-list of frame-pointers) is a non-standard thing that depends on everything being compiled with specific options (like `-fno-omit-frame-pointer`). GDB doesn't even have good support for being told to unwind the stack that way if there's any `.eh_frame` info for any function. You have to manually write a GDB function: https://stackoverflow.com/questions/42739893/force-gdb-to-use-frame-pointer-based-unwinding. I think gcc's `__builtin_return_address` with level>=1 uses DWARF, too – Peter Cordes Jan 03 '18 at 01:31
  • I may get around to editing this answer, but I was hoping not to put in the effort >.< – Peter Cordes Jan 03 '18 at 01:32
  • @PeterCordes: Do note that this question is not about C++ at all; it is tagged [tag:c] but the question only talks about assembly. C++ ABI is irrelevant wrt. this question. Also note that if we only accept DWARF stack unwinding as the "only relevant one", then there is no reason to use the `rbp` register in x86 and x86-64 assembly (as `rsp`-relative addressing suffices). DWARF is just the most (sensible and) common option. Anyway, as you know, I'm not a language lawyer, and often me fail English: I find it very difficult to find concise, precise correct wording :(. Any help *is* appreciated. – Nominal Animal Jan 03 '18 at 12:39
  • 1
    While it is true that `eh_frame` makes frame pointers _unnecessary_ it isn't true to say that `eh_frame` doesn't use frame pointers: if the compiler uses a frame pointer for a function, both `eh_frame` and `debug_frame` will definitely use it. In fact, it is common advice if you want to reduce binary size to add `-fno-omit-frame-pointer`: at some cost in code size/performance you can get dramatically smaller `eh_frame` sections, since a frame pointer is about the simplest way that a stack frame can be found and so the DWARF representation is very succinct. – BeeOnRope Jan 03 '18 at 19:01
  • 'This means the stack grows down; smaller addresses are "higher up" in the stack.' .. Heh, this one sentence can be interpreted a multitude of ways. "smaller vs lower" is an interesting comparison. – RichieHH Jun 21 '21 at 05:32