34

Is there any substantial optimization when omitting the frame pointer? If I have understood correctly by reading this page, -fomit-frame-pointer is used when we want to avoid saving, setting up and restoring frame pointers.

Is this done only once per function call, and if so, is it really worth avoiding a few instructions for every function? Isn't that trivial as an optimization? What are the actual implications of using this option, apart from the debugging limitations?

I compiled the following C code with and without this option:

int myf(int a, int b);  /* prototype: myf is called before it is defined */

int main(void)
{
        int i;

        i = myf(1, 2);
}

int myf(int a, int b)
{
        return a + b;
}

# gcc -S -fomit-frame-pointer code.c -o withoutfp.s
# gcc -S code.c -o withfp.s

Running diff -u on the two files revealed the following differences in the generated assembly:


--- withfp.s    2009-12-22 00:03:59.000000000 +0000
+++ withoutfp.s 2009-12-22 00:04:17.000000000 +0000
@@ -7,17 +7,14 @@
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
-       pushl   %ebp
-       movl    %esp, %ebp
        pushl   %ecx
-       subl    $36, %esp
+       subl    $24, %esp
        movl    $2, 4(%esp)
        movl    $1, (%esp)
        call    myf
-       movl    %eax, -8(%ebp)
-       addl    $36, %esp
+       movl    %eax, 20(%esp)
+       addl    $24, %esp
        popl    %ecx
-       popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
@@ -25,11 +22,8 @@
 .globl myf
        .type   myf, @function
 myf:
-       pushl   %ebp
-       movl    %esp, %ebp
-       movl    12(%ebp), %eax
-       addl    8(%ebp), %eax
-       popl    %ebp
+       movl    8(%esp), %eax
+       addl    4(%esp), %eax
        ret
        .size   myf, .-myf
        .ident  "GCC: (GNU) 4.2.1 20070719 

Could someone please shed light on the key points in the above code where -fomit-frame-pointer actually made a difference?

Edit: objdump's output replaced with gcc -S's

PetrosB

4 Answers

35

-fomit-frame-pointer allows one extra register to be available for general-purpose use. I would assume this is really only a big deal on 32-bit x86, which is a bit starved for registers.*

One would expect to see EBP no longer saved and adjusted on every function call, and probably some additional use of EBP in normal code, and fewer stack operations on occasions where EBP gets used as a general-purpose register.

Your code is far too simple to see any benefit from this sort of optimization: you're not using enough registers. Also, you haven't turned on the optimizer, which might be necessary to see some of these effects.

* ISA registers, not micro-architecture registers.
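To make the register-pressure point concrete, here is a minimal sketch of a function that keeps more values live across a loop than 32-bit x86 has general-purpose registers. The name mix8 and its arithmetic are made up for illustration and are not from the question. Compiling it with something like gcc -m32 -O2 -S, once with -fomit-frame-pointer and once with -fno-omit-frame-pointer, and diffing the two outputs should be more telling than the original example, although the exact result depends on the GCC version and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: eight accumulators plus the pointer and the loop
   counter are all live inside the loop, which is more than the 6 or 7
   general-purpose registers available on 32-bit x86. */
uint32_t mix8(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint32_t a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7, h = 8;

    for (size_t i = 0; i < n; i++) {
        uint32_t x = data[i];
        /* Each accumulator depends on the previous one, so none of them
           can be dropped or merged by the optimizer. */
        a += x;
        b ^= a;
        c += b;
        d ^= c;
        e += d;
        f ^= e;
        g += f;
        h ^= g;
    }
    return a ^ b ^ c ^ d ^ e ^ f ^ g ^ h;
}
```

If the frame pointer is omitted, the compiler may be able to keep one more of these values in a register instead of spilling it to the stack; whether it actually does is something to check in the generated assembly rather than assume.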

Eric Seppanen
  • If I have to set explicitly other optimization options, what's the meaning of this option being separate? Your point that my code is simple seems valid though! – PetrosB Dec 21 '09 at 22:01
  • This option is separate because it has significant downsides for debugging. – Anon. Dec 21 '09 at 22:04
  • It's separate because it has functional implications for other things, like running your code in a debugger, or linking with other code. I assume you'd see a reduction in register spills even with the optimizer turned off, but since I don't know for sure I'm hedging my bets. – Eric Seppanen Dec 21 '09 at 22:05
  • @EricSeppanen can you elaborate on the difference between ISA and micro-architecture registers? Do you perhaps mean x87/MMX/SSE/etc. with the latter? Or "internal" registers (like eip)? Thanks! – andreee Dec 15 '15 at 17:24
  • I realized that micro-architecture registers most probably refer to registers not seen by the programmer, but rather accessed by register renaming etc. Can anyone confirm that's true? Thanks! – andreee Dec 15 '15 at 23:27
  • 1
    @andreee: Yes, this answer is pretty clearly meaning architectural vs. physical registers (that architectural are renamed onto). Register renaming avoids false dependencies when you reuse the same architectural register for a different value (when you're done with the old value you had there), but it doesn't help you if you need 9 different variables "live" in the same loop. – Peter Cordes Jan 20 '21 at 01:05
11

The only downside of omitting it is that debugging is much more difficult.

The major upside is that there is one extra general-purpose register, which can make a big difference to performance. Obviously this extra register is used only when needed (probably in your very simple function it isn't); in some functions it makes more difference than in others.

Andreas Bonini
  • 1
    Not only does it make debugging much more difficult. The GNU docs online say that it makes debugging impossible – PetrosB Dec 21 '09 at 21:57
  • 18
    They are wrong. `printf()` debugging (which **IS** still debugging) is very possible, for example. – Andreas Bonini Dec 21 '09 at 21:58
  • 11
    You can still debug at the instruction (assembly language) level regardless of any compiler options used. Not as easy as source level debugging to be sure, but "impossible" is definitely the wrong word. – Ben Voigt Dec 21 '09 at 22:10
  • This answer is obsolete (probably even when it was posted). Modern debug info formats like DWARF make it possible for a debugger to find variable values even when they're in registers in optimized code (if they exist at all). For un-optimized code, `gcc -O0 -fomit-frame-pointer -g` should have a negligible effect on debugging on GNU/Linux. Not that you *should* bother to use `-fomit-frame-pointer` in debug builds, but that proves it doesn't make debugging much if any harder. – Peter Cordes Jan 20 '21 at 01:01
  • There are still situations where frame pointers are necessary for a good debugging experience. The Linux kernel doesn't have a DWARF parser (and is unlikely to gain one), so tools that stack walk inside the kernel (e.g. bpftrace's `ustack()`) don't work properly without frame pointers. – nemetroid Nov 07 '22 at 23:58
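For context on why frame-pointer-based stack walkers break: with the usual prologue (pushl %ebp; movl %esp, %ebp), each frame pointer points at the caller's saved frame pointer, with the return address right after it, so a stack walk is just a linked-list traversal. The sketch below models that on plain C structs instead of reading a real stack; struct frame and walk_frames are hypothetical names, not any tool's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Model of what sits at the frame pointer after
   `pushl %ebp; movl %esp, %ebp`: the caller's saved frame pointer,
   followed by the return address. */
struct frame {
    const struct frame *caller;   /* saved frame pointer; NULL ends this model's chain */
    const void *return_address;
};

/* Hypothetical walker: stores up to `max` return addresses, innermost
   frame first, and returns how many were stored. */
size_t walk_frames(const struct frame *fp, const void **out, size_t max)
{
    assert(out != NULL || max == 0);

    size_t depth = 0;
    while (fp != NULL && depth < max) {
        out[depth++] = fp->return_address;
        fp = fp->caller;   /* follow the chain to the caller's frame */
    }
    return depth;
}
```

When a function is compiled without a frame pointer, the register holds an arbitrary value instead of a link in this chain, so a walker of this kind has nothing valid to follow and needs unwind information such as DWARF instead.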
7

You can often get more meaningful assembly code from GCC by using the -S argument to output the assembly:

$ gcc code.c -S -o withfp.s
$ gcc code.c -S -o withoutfp.s -fomit-frame-pointer
$ diff -u withfp.s withoutfp.s

The -S output contains no addresses, so we can compare the generated instructions directly. For your leaf function, this gives:

 myf:
-       pushl   %ebp
-       movl    %esp, %ebp
-       movl    12(%ebp), %eax
-       addl    8(%ebp), %eax
-       popl    %ebp
+       movl    8(%esp), %eax
+       addl    4(%esp), %eax
        ret

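With -fomit-frame-pointer, GCC doesn't generate the code to push the frame pointer onto the stack. The arguments are then addressed relative to %esp instead of %ebp, and because the saved %ebp no longer sits between the stack pointer and the return address, each offset is 4 bytes smaller.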
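The offsets in the diff above can be reproduced with a little arithmetic. This is only a sketch under stated assumptions: 32-bit x86, the cdecl convention with 4-byte stack slots, and a leaf function that pushes nothing else. The function names are made up for illustration.

```c
#include <assert.h>

enum { WORD = 4 };   /* assumption: 4-byte stack slots on 32-bit x86 */

/* Offset from %ebp of the i-th (0-based) argument after the prologue
   `pushl %ebp; movl %esp, %ebp`: skip the saved %ebp and the return
   address. */
int arg_offset_from_ebp(int i)
{
    assert(i >= 0);
    return 2 * WORD + i * WORD;
}

/* Offset from %esp of the i-th argument at function entry when no frame
   pointer is set up and nothing else has been pushed: skip only the
   return address. */
int arg_offset_from_esp(int i)
{
    assert(i >= 0);
    return WORD + i * WORD;
}
```

For myf(a, b), this gives 8 and 12 relative to %ebp and 4 and 8 relative to %esp, matching the operands in the two versions of the function.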

Commodore Jaeger
5

Profile your program to see if there is a significant difference.

Next, profile your development process. Is debugging easier or more difficult? Do you spend more time developing or less?

Optimizations without profiling are a waste of time and money.
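As a starting point before reaching for a real profiler, a crude timing harness might look like the sketch below. It reuses myf from the question; time_calls is a hypothetical helper, and __attribute__((noinline)) is a GCC extension used here so that the call, and with it the prologue and epilogue, is not optimized away.

```c
#include <assert.h>
#include <time.h>

/* The question's function, kept out of line so each iteration really
   performs a call. */
__attribute__((noinline)) int myf(int a, int b)
{
    return a + b;
}

/* Hypothetical helper: CPU seconds spent making `calls` calls to myf. */
double time_calls(long calls)
{
    assert(calls >= 0);

    volatile int sink = 0;   /* volatile so the results are not discarded */
    clock_t start = clock();
    for (long i = 0; i < calls; i++)
        sink = myf((int)(i & 0xff), sink & 0xff);
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building this twice, with and without -fomit-frame-pointer, and comparing time_calls for a large call count gives a rough first impression only; clock() has coarse resolution, and a function this small is dominated by call overhead and measurement noise.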

Thomas Matthews