Taking an empty program

//demo.c

int main(void)
{

}

Compiling the program at default optimization.

gcc -S  demo.c -o dasm.asm 

I get the assembly output as

//Removed labels and directive which are not relevant

main:

pushl   %ebp                  // prologue of main
movl    %esp, %ebp            // prologue of main
popl    %ebp                  // epilogue of main
ret

Now compiling the program at -O2 optimization.

gcc -O2 -S  demo.c -o dasm.asm 

I get the optimized assembly

main:

rep
ret

In my initial search, I found that the optimization flag -fomit-frame-pointer was responsible for removing the prologue and epilogue.

I found more information about the flag in the GCC manual, but I could not understand the reason it gives for removing the prologue and epilogue:

Don't keep the frame pointer in a register for functions that don't need one.

Is there another way of putting the above reason?

What is the reason for the "rep" instruction appearing at -O2 optimization?

Why does the main function not require stack frame initialization?

If the frame pointer is not set up from within the main function, then who does this job?

Is it done by the OS, or is it a function of the hardware?

Barath Ravikumar
  • `rep ret` is a `ret` with a prefix that doesn't alter the semantics; it keeps some AMD processors happy (some of them have a penalty for jumping directly to a `ret`). – harold Mar 22 '13 at 10:05
  • Possible duplicate of [Avoiding gcc function prologue overhead?](http://stackoverflow.com/questions/5477673/avoiding-gcc-function-prologue-overhead) – mlt Mar 11 '16 at 21:08

1 Answer

Compilers are getting smart: GCC knew you didn't need a frame pointer stored in a register, because nothing you put into your main() function uses the stack (it has no locals and makes no calls).
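To see the difference for yourself, here is a minimal sketch. The file name frames.c and both function names are made up for illustration. It contrasts a leaf function, which has nothing to address through a frame pointer, with a function that uses a variable-length array, where the frame size is only known at run time.

```c
/* frames.c: hypothetical file name, used only for this illustration. */
#include <stddef.h>

/* Leaf function: no addressable locals and no calls, so the compiler
   has no reason to set up a frame pointer for it. */
int add_pair(int a, int b)
{
    return a + b;
}

/* Variable-length array: the amount of stack needed is decided at run
   time, so the compiler typically keeps a frame pointer in order to
   find its way back. volatile stops the optimizer from deleting the
   array altogether. */
int sum_squares(size_t n)
{
    volatile int buf[n > 0 ? n : 1];
    int total = 0;

    for (size_t i = 0; i < n; i++)
        buf[i] = (int)(i * i);
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Compiling with `gcc -O2 -S frames.c -o frames.s` and comparing the two functions should show the contrast, although the exact instructions depend on the GCC version and the target, so treat this as an experiment rather than a guaranteed result.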

As for rep ret:

Here's the principle. The processor tries to fetch the next few instructions to be executed, so that it can start the process of decoding and executing them. It even does this with jump and return instructions, guessing where the program will head next.

What AMD says here is that, if a ret instruction immediately follows a conditional jump instruction, their predictor cannot figure out where the ret instruction is going. The pre-fetching has to stop until the ret actually executes, and only then will it be able to start looking ahead again.

The "rep ret" trick apparently works around the problem, and lets the predictor do its job. The "rep" has no effect on the instruction.

Source: Some forum, google a sentence to find it.

One thing to note: just because there is no prologue doesn't mean there is no stack. You can still push and pop with ease; it's just that complex stack manipulation will be difficult.
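As a rough illustration of that point, the sketch below uses a fixed-size local buffer. The function name reverse_copy is hypothetical. The buffer lives on the stack, yet its size is known at compile time, so the compiler can address it relative to the stack pointer and does not need a frame pointer for it.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst in reverse order, going
   through a fixed-size buffer on the stack. Returns the number of
   characters written, not counting the terminating '\0'. */
size_t reverse_copy(char *dst, const char *src)
{
    char tmp[32];                 /* fixed-size local on the stack */
    size_t len = strlen(src);

    if (len >= sizeof tmp)        /* truncate input that does not fit */
        len = sizeof tmp - 1;

    memcpy(tmp, src, len);
    for (size_t i = 0; i < len; i++)
        dst[i] = tmp[len - 1 - i];
    dst[len] = '\0';
    return len;
}
```

Whether a given build actually omits the frame pointer here depends on the compiler, its version, and the flags in use, so it is worth checking the `-S` output with and without `-fomit-frame-pointer`.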

Functions that don't have a prologue/epilogue are usually dubbed naked. Hackers like to use them a lot because they don't contaminate the stack when you jmp to them; I must confess I know of no other use for them outside optimization. In Visual Studio it's done via:

__declspec(naked)
  • Now I tried declaring a variable inside main, wrote some statements that manipulate it, and used printf. Now I get a prologue, but no epilogue (no pop instruction). – Barath Ravikumar Mar 22 '13 at 05:49
  • I would hazard a guess that the main() function simply doesn't need one because it's the end of the road. Try creating a function and do the printf() there. –  Mar 22 '13 at 05:52
  • Did that, and I'm still unable to find a single popl instruction in the asm dump. – Barath Ravikumar Mar 22 '13 at 05:57
  • That's odd, because without it the stack will be screwed; could be that compilers are even smarter than I thought. Try adding an additional printf() and a variable (if it's not pure C) after calling the function from main(). That should force the function to clean up the stack because further use of it is required. –  Mar 22 '13 at 05:59
  • I don't get a popl, but I get a leave instruction, which I found out is nothing but an epilogue. +1 for your answer. – Barath Ravikumar Mar 22 '13 at 06:07
  • Indeed, I forgot that compilers prefer to use 'higher' asm; too used to disassemblers. Look here for what leave is: http://stackoverflow.com/questions/5474355/about-leave-in-x86-assembly –  Mar 22 '13 at 06:10
  • Just because you are using the stack doesn't mean you need a stack frame; these days you'll only need a frame if you are using VLAs or `_alloca`. Also, it should be noted that the proper mnemonic in this case is `PAUSE`, `REP` is technically a prefix... – Necrolis Mar 22 '13 at 06:47
  • `pause` is `rep nop`, `rep ret` is something different. – harold Mar 22 '13 at 10:06
  • @harold That's right, thanks for pointing that out. Fixed now. –  Mar 22 '13 at 14:15
  • @BarathBushan The stack adjustment is done by adding/subtracting from the stack pointer, not by moving data to/from memory (that is _expensive_, and to pop data just to discard it makes no sense). – vonbrand Mar 25 '13 at 02:23
  • @vonbrand But the pop instruction is part of the epilogue, because the prologue pushed a register to save it, the epilogue must pop it to restore it. Unless of course nothing follows that function in which case even increasing the esp would not be required. –  Mar 25 '13 at 03:09
  • @dingrite The effect can also be gotten by moving data explicitly, no pop instruction. Or it is part of the job of the return instruction. – vonbrand Mar 25 '13 at 03:12