
If I compile an empty C function

void nothing(void)
{
}

using gcc -O2 -S (and clang) on MacOS, it generates:

_nothing:
    pushq   %rbp
    movq    %rsp, %rbp
    popq    %rbp
    ret

Why doesn't gcc remove everything but the ret? It seems like an easy optimisation to make unless the sequence really does something (it doesn't seem to, as far as I can tell). This pattern (push/mov at the beginning, pop at the end) also appears in non-empty functions where rbp is otherwise unused.

On Linux using a more recent gcc (4.4.5) I see just

nothing:
    rep
    ret

Why the rep? The rep is absent in non-empty functions.

William Morris
  • I don't have 4.5.5 but I'll look at installing it. But clang gives the same redundant code, so I thought there might be a reason... – William Morris Aug 05 '13 at 00:03
    _"Why the `rep`?"_. The reason [is explained here](http://repzret.org/p/repzret/). – Michael Aug 05 '13 at 10:14
  • What if you explicitly specify `-fomit-frame-pointer` when compiling on OSX? – Michael Aug 05 '13 at 10:28
  • @Michael you are right, thanks. It is gone with `-fomit-frame-pointer`. And the `rep ret` link is good too. If you make these into an answer I can accept them :-) – William Morris Aug 05 '13 at 12:07
  • The better question is "Why does the compiler bother to generate any code for the function at all?". – Brian Knoblauch Aug 05 '13 at 14:09
  • @BrianKnoblauch: What if it didn't and someone called it? – CB Bailey Aug 05 '13 at 14:34
  • If there aren't any references and it's non-library code, nobody can call it without your knowing. Even in that case, you can optimize out the call. That does require knowing the whole program, though, so it's fine when compiling straight to a binary, but not so much with builds that require a separate linker step. – Brian Knoblauch Aug 05 '13 at 14:37
  • Related: [What does \`rep ret\` mean?](https://stackoverflow.com/q/20526361) / [repz ret: why all the hassle?](https://stackoverflow.com/q/39863255) – Peter Cordes May 19 '23 at 09:12
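Whether the compiler may drop the function entirely comes down to linkage, which is what the exchange between Brian Knoblauch and CB Bailey above turns on. Here is a minimal sketch, assuming gcc or clang at -O2; the file name `empty.c` and the names `nothing_internal`, `call_both`, `get_nothing` and `action_fn` are hypothetical. A function with external linkage has to be emitted because another translation unit might call it, while a `static` function that has been inlined at every call site can be discarded.

```c
/* empty.c: a sketch; compile with `gcc -O2 -S empty.c` and read empty.s. */

/* Hypothetical typedef for a pointer to a void(void) function. */
typedef void (*action_fn)(void);

/* External linkage: code in another translation unit might call this,
   so the compiler must still emit a definition (at least a bare ret). */
void nothing(void)
{
}

/* Internal linkage: only this file can reach it. Once the call below
   is inlined, nothing refers to it and the compiler may drop it. */
static void nothing_internal(void)
{
}

/* Direct calls to an empty function whose body is visible usually
   compile to no instructions at all at -O2. */
int call_both(int x)
{
    nothing_internal();
    nothing();
    return x;
}

/* Taking the address is one more reason the external definition
   has to exist at a real address. */
action_fn get_nothing(void)
{
    return nothing;
}
```

Calling `get_nothing()()` still executes the lone `ret` (or `rep ret`) of `nothing`, which is exactly the situation CB Bailey's comment describes. Dropping the external definition as well would need the whole-program view Brian Knoblauch mentions, which link-time optimization (`-flto`) can provide in some builds.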

4 Answers


Why the rep?

The reasons are explained in this blog post (http://repzret.org/p/repzret/). In short, jumping directly to a single-byte ret instruction would mess up the branch prediction on some AMD processors. And rather than adding a nop before the ret, a meaningless prefix byte was added to save instruction decoding bandwidth.

The rep is absent in non-empty functions.

To quote from the blog post I linked to: "[rep ret] is preferred to the simple ret either when it is the target of any kind of branch, conditional (jne/je/...) or unconditional (jmp/call/...)".
In the case of an empty function, the ret would have been the direct target of a call. In a non-empty function, it wouldn't be.
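The "target of a branch" case can also arise inside a non-empty function. Here is a sketch with a hypothetical function `maybe_store`: the early exit when `p` is null may be compiled as a conditional jump straight to the function's final return instruction, and that jump target is where GCC of that era (with generic x86-64 tuning) would place `rep ret`. The exact layout depends on the compiler version and flags, and newer GCC releases no longer emit `rep ret` for generic tuning.

```c
/* branch_ret.c: a sketch; compile with `gcc -O2 -S branch_ret.c`. */
#include <stddef.h>

/* Hypothetical example: store v through p unless p is null. */
void maybe_store(int *p, int v)
{
    if (p == NULL)
        return;   /* may become a conditional jump to the final return */
    *p = v;
}
```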

Why does gcc not remove everything but the ret?

It's possible that some compilers won't omit frame pointer code even if you've specified -O2. At least with gcc, you can explicitly tell the compiler to omit them by using the -fomit-frame-pointer option.
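To compare the two outputs side by side, a test file like the following can be compiled twice, once with `-fno-omit-frame-pointer` and once with `-fomit-frame-pointer` (both are real gcc/clang options; which one is the default depends on the compiler and target). The names `frame.c`, `add` and `current_frame` are hypothetical. The last function hints at why a frame pointer is kept at all: debuggers and profilers can walk the chain of saved `%rbp` values to reconstruct the call stack.

```c
/* frame.c: a sketch; compare
 *   gcc -O2 -S -fno-omit-frame-pointer frame.c
 *   gcc -O2 -S -fomit-frame-pointer    frame.c
 * With the frame pointer kept, each function may get the
 * push %rbp / mov %rsp,%rbp / pop %rbp sequence from the question. */

void nothing(void)
{
}

/* A non-empty leaf function that never needs %rbp for its own data. */
int add(int a, int b)
{
    return a + b;
}

/* __builtin_frame_address(0) is a GCC/Clang builtin returning the
   frame address of the current function; with frame pointers on,
   each saved %rbp links to the caller's frame. */
void *current_frame(void)
{
    return __builtin_frame_address(0);
}
```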

Michael

As explained here: http://support.amd.com/us/Processor_TechDocs/25112.PDF, a two-byte near-return instruction (i.e. rep ret) is used because a single-byte return can be mispredicted on some amd64 processors in some situations, such as this one.

If you fiddle around with the processor targeted by gcc you may find that you can get it to generate a single-byte ret. -mtune=nocona worked for me.
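The difference between the two forms is easiest to see at the byte level. This sketch spells out the standard x86 encodings (C3 for a near `ret`, F3 for the REP prefix); the array and function names are hypothetical, and the `-mtune` values in the closing comment are real GCC options, though which return form each produces depends on the GCC version.

```c
/* ret_bytes.c: a sketch of the two return encodings on x86/x86-64. */
#include <stddef.h>

/* Plain near return: opcode C3, one byte. */
static const unsigned char ret_bytes[] = { 0xC3 };

/* "rep ret": the F3 (REP) prefix in front of C3. The prefix does
   nothing useful for ret in practice; it only makes it two bytes. */
static const unsigned char rep_ret_bytes[] = { 0xF3, 0xC3 };

size_t ret_length(void)     { return sizeof ret_bytes; }
size_t rep_ret_length(void) { return sizeof rep_ret_bytes; }

/* The last byte of "rep ret" is the ordinary ret opcode. */
int rep_ret_ends_with_ret(void)
{
    return rep_ret_bytes[sizeof rep_ret_bytes - 1] == ret_bytes[0];
}

/* To see which form gcc picks for the empty function, try e.g.
 *   gcc -O2 -S -mtune=generic nothing.c
 *   gcc -O2 -S -mtune=native  nothing.c */
```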

CB Bailey
  • You probably don't actually want to tune for Pentium 4 (`nocona`), even in 2013! Appropriate settings at the time included `-mtune=haswell` or `-mtune=intel`. Or `-mtune=native` for whatever CPU you have. Fortunately not an issue anymore; CPUs that benefited from `rep ret` are so old now that GCC's `tune=generic` hasn't cared about them for several years. – Peter Cordes May 19 '23 at 09:10

I suspect your last code is a bug, as johnfound says. The first code is there because every C compiler must always follow the _cdecl calling convention, which for a function means the following (in Intel syntax; sorry, I don't know the AT&T syntax):

Function Definition

_functionA:
push   rbp
mov    rbp, rsp
;Some function
pop    rbp
ret

In the caller:

call _functionA
add esp, 0 ; the caller removes its pushed arguments; with zero arguments, a compiler can strip this

GCC always follows the _cdecl calling convention, even where skipping it would be harmless, because the compiler isn't smarter than an advanced assembly programmer. So it follows _cdecl at all costs.


That is because even so-called "optimizing compilers" are too dumb to always generate good machine code.

They can't generate better code than their creators made them generate.

Since an empty function is nonsense, they probably simply didn't bother to optimize it, or even to detect this very special case.

However, the lone "rep" prefix is probably a bug. It does nothing when used without a string instruction, and on some newer CPU it could theoretically cause an exception (and IMHO it should).

johnfound