
I wanted to take a look at the assembly output for a tiny Rust function:

pub fn double(n: u8) -> u8 {
    n + n
}

I used the Godbolt Compiler Explorer to generate and view the assembly (with the -O flag, of course). It shows this output:

example::double:
    push    rbp
    mov     rbp, rsp
    add     dil, dil
    mov     eax, edi
    pop     rbp
    ret

Now I'm a bit confused, as there are a few instructions that don't seem to do anything useful: push rbp; mov rbp, rsp; and pop rbp. From what I understand, executing these three instructions on their own has no side effects. So why doesn't the Rust optimizer remove those useless instructions?


For comparison, I also tested a C++ version:

unsigned char doubleN(unsigned char n) {
    return n + n;
}

Assembly output (with -O flag):

doubleN(unsigned char): # @doubleN(unsigned char)
    add dil, dil
    mov eax, edi
    ret

And in fact, here those "useless" instructions from above are missing, as I would expect from an optimized output.

Lukas Kalbertodt
  • A question asking something similar was [recently asked](https://stackoverflow.com/questions/45562164/why-does-this-code-generate-much-more-assembly-than-equivalent-c-clang/45562380), but it was closed for good reasons. So this question attempts to be focused and serve as a canonical question for that topic, as suggested by others. – Lukas Kalbertodt Aug 09 '17 at 09:06
  • Try compiling with `-fomit-frame-pointer` – fuz Aug 09 '17 at 09:38
  • Too bad LLVM is silly here. `mov eax, edi` / `add al,al` would avoid a partial-register stall on Intel Core2/Nehalem (and it emits that even with `-C ar=core2`, which apparently is the rust equivalent of [`clang -march=core2`](https://mail.mozilla.org/pipermail/rust-dev/2014-March/009148.html)). And if Rust uses the same x86-64 System V ABI as C, narrow return values are allowed to leave garbage in the upper bytes, so `lea eax, [rdi+rdi]` would work. – Peter Cordes Aug 09 '17 at 11:58
  • Just tried the C++ equivalent, and gcc does what I was suggesting, but clang is still being silly: https://godbolt.org/g/jdGSBp. (clang+llvm unsurprisingly makes the exact same asm as rust+llvm) – Peter Cordes Aug 09 '17 at 12:06

1 Answer


The short answer: Godbolt adds a -C debuginfo=1 flag which forces the optimizer to keep all instructions managing the frame pointer. Rust removes those instructions too when compiling with optimization and without debug information.


What are those instructions doing?

These three instructions are part of the function prologue and epilogue. In particular, here they manage the so-called frame pointer or base pointer (rbp on x86_64). Note: don't confuse the base pointer with the stack pointer (rsp on x86_64)! The base pointer always points inside the current stack frame:

                          ┌──────────────────────┐                         
                          │  function arguments  │                      
                          │         ...          │   
                          ├──────────────────────┤   
                          │    return address    │   
                          ├──────────────────────┤   
              [rbp] ──>   │       last rbp       │   
                          ├──────────────────────┤   
                          │   local variables    │   
                          │         ...          │   
                          └──────────────────────┘    

The interesting thing about the base pointer is that it points to a piece of memory in the stack which stores the last value of the rbp. This means that we can easily find out the base pointer of the previous stack frame (the one from the function that called "us").

Even better: all base pointers form something similar to a linked list! We can easily follow all last rbps to walk up the stack. This means that at each point during program execution, we know exactly what functions called what other functions such that we end up "here".
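
To make the "linked list" idea concrete, here is a minimal sketch in Rust that walks such a chain in a toy model of the stack. Everything in it is an assumption made for illustration: the names `walk_frames` and `example_walk`, the use of slot indices instead of real byte addresses, and the convention that a saved rbp of 0 marks the outermost frame. It does not read a real stack.

```rust
/// Toy model of a stack: `memory[i]` is one 8-byte slot.
/// For every frame, `memory[rbp]` holds the caller's saved rbp and
/// `memory[rbp + 1]` holds the return address (the stack grows downward,
/// so the return address sits at the higher index).
pub fn walk_frames(memory: &[u64], mut rbp: usize) -> Vec<u64> {
    let mut return_addresses = Vec::new();
    // Assumption of this toy model: a saved rbp of 0 ends the chain.
    while rbp != 0 && rbp + 1 < memory.len() {
        // The return address tells us which function called this frame.
        return_addresses.push(memory[rbp + 1]);
        // Follow the "next" pointer of the linked list: the saved rbp.
        rbp = memory[rbp] as usize;
    }
    return_addresses
}

/// Builds three nested frames and walks them from the innermost one.
pub fn example_walk() -> Vec<u64> {
    // slot 2: saved rbp = 4, slot 3: return address 0x300 (innermost frame)
    // slot 4: saved rbp = 6, slot 5: return address 0x200
    // slot 6: saved rbp = 0, slot 7: return address 0x100 (outermost frame)
    let memory = [0, 0, 4, 0x300, 6, 0x200, 0, 0x100];
    walk_frames(&memory, 2)
}
```

Calling `example_walk()` yields the return addresses from the innermost frame outward, which is essentially what a frame-pointer-based stack trace does.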

Let's review the instructions again:

; We store the "old" rbp on the stack
push    rbp

; We update rbp to hold the new value
mov     rbp, rsp

; We undo what we've done: we remove the old rbp
; from the stack and store it in the rbp register
pop     rbp
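
To check the claim that these three instructions cancel each other out, here is a small simulation in Rust. It is only a sketch: the `Cpu` struct, its methods and the starting values are hypothetical, and rsp counts 8-byte slots rather than bytes to keep the arithmetic simple.

```rust
/// Hypothetical toy CPU with just the two registers we care about.
#[derive(Debug, Clone, PartialEq)]
pub struct Cpu {
    pub rsp: usize,      // stack pointer, counted in slots (not bytes)
    pub rbp: u64,        // base pointer
    pub stack: Vec<u64>, // stack memory, grows toward index 0
}

impl Cpu {
    /// push rbp: move rsp down one slot and store rbp there.
    pub fn push_rbp(&mut self) {
        self.rsp -= 1;
        self.stack[self.rsp] = self.rbp;
    }

    /// mov rbp, rsp: rbp now points at the slot holding the old rbp.
    pub fn mov_rbp_rsp(&mut self) {
        self.rbp = self.rsp as u64;
    }

    /// pop rbp: load the old rbp back and move rsp up one slot.
    pub fn pop_rbp(&mut self) {
        self.rbp = self.stack[self.rsp];
        self.rsp += 1;
    }
}

/// Runs prologue and epilogue and returns the state before and after.
pub fn run_prologue_epilogue() -> (Cpu, Cpu) {
    let before = Cpu { rsp: 8, rbp: 42, stack: vec![0; 8] };
    let mut after = before.clone();
    after.push_rbp();
    after.mov_rbp_rsp();
    // The function body (add dil, dil / mov eax, edi) would run here.
    after.pop_rbp();
    (before, after)
}
```

In this model both registers end up with their original values. The only trace left behind is the old rbp written to the slot just below the stack pointer, which nobody is supposed to read afterwards.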

What are those instructions good for?

The base pointer and its "linked list" property are hugely important for debugging and analyzing program behavior in general (e.g. profiling). Without the base pointer, it's way more difficult to generate a stack trace and to locate the function that is currently executed.
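
For a feeling of what such a stack trace looks like from inside a Rust program, here is a minimal sketch using `std::backtrace` from the standard library (stable in recent Rust versions). The function names are made up for this example. Note that, as far as I know, the standard library usually obtains the trace from the platform's unwind information rather than by following rbp, so this only illustrates the result, not the frame pointer mechanism itself.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

// `inline(never)` keeps these hypothetical functions as separate frames.
#[inline(never)]
fn innermost() -> Backtrace {
    // force_capture ignores RUST_BACKTRACE and always tries to capture.
    Backtrace::force_capture()
}

#[inline(never)]
fn middle() -> Backtrace {
    innermost()
}

/// Returns the capture status and the rendered trace as text.
pub fn capture_demo() -> (BacktraceStatus, String) {
    let trace = middle();
    (trace.status(), trace.to_string())
}
```

Printing the returned string on a platform where capturing is supported should list the call chain that led to `innermost`; how readable the frame names are depends on the debug info that is available.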

Additionally, managing the frame pointer usually doesn't slow things down by a lot.

Why aren't they removed by the optimizer and how can I enforce it?

They usually would be, if Godbolt didn't pass -C debuginfo=1 to the compiler. This flag makes the compiler keep everything related to frame pointer handling. Note that frame pointers are not inherently required for debugging; other kinds of debug info usually suffice. Frame pointers are kept whenever any kind of debug info is emitted because there are still a few minor issues related to removing frame pointers in Rust programs. This is being discussed in [this GitHub tracking issue](https://github.com/rust-lang/rust/issues/11906).

You can "undo" it by just adding the flag -C debuginfo=0 yourself. This results in exactly the same output as the C++ version:

example::double:
    add     dil, dil
    mov     eax, edi
    ret

You can also test it locally by executing:

$ rustc -O --crate-type=lib --emit asm -C "llvm-args=-x86-asm-syntax=intel" example.rs

Compiling with optimizations (-O) automatically removes the rbp handling if you don't explicitly turn on debug information.

Lukas Kalbertodt
  • Does rust have its own debug-info format or something? `clang` and `clang++` enable `-fomit-frame-pointer` by default, and can do so even with `-g` (debugging) enabled. You can still back-trace in the debugger because the ABI requires stack-unwind info in a separate section of the ELF executable. (Exceptions use it too, so even stripped binaries have unwind info in the `.eh_frame` section) – Peter Cordes Aug 09 '17 at 12:01
  • @PeterCordes: Rust uses the platform's debug format, DWARF on unices, PDB on Windows, etc... – Matthieu M. Aug 09 '17 at 12:04
  • @PeterCordes Yes, it shouldn't be necessary to keep frame pointers. [This](https://github.com/rust-lang/rust/issues/11906) is the most up-to-date tracking issue regarding this. But I guess the priority is fairly low, as it is a performance problem and it only occurs in debug mode (which is already super slow most of the time). – Lukas Kalbertodt Aug 09 '17 at 12:43
  • Ok, that makes sense. Just a not-yet-implemented compiler feature, not a design limitation. – Peter Cordes Aug 09 '17 at 12:46
  • @LukasKalbertodt: It may be interesting to link the PR in the answer, just in case comments get cleaned-up. – Matthieu M. Aug 09 '17 at 13:52
  • @MatthieuM. Good idea. I added it! – Lukas Kalbertodt Aug 09 '17 at 14:11
  • While this was accurate for earlier Rust versions, omission of the frame pointer in newer versions of `rustc` no longer seems to be dependent on `debuginfo` since 1.27: https://godbolt.org/z/j3qcvK9q4 – athre0z Sep 04 '21 at 08:28
  • Related issue: https://github.com/rust-lang/rust/issues/48785 – athre0z Sep 04 '21 at 08:33