18

I know that data for nested function calls goes on the stack. The stack follows a step-by-step method for storing and retrieving data as functions are called and return. These steps are commonly known as the prologue and epilogue.

I tried to find material on this topic, with no success. Do you know of any resource (site, video, article) that explains how function prologues and epilogues generally work in C? Or, if you can explain it, that would be even better.

P.S.: I just want a general overview, nothing too detailed.

Peter Cordes
user1843665

5 Answers

35

There are lots of resources out there that explain this:

to name a few.

Basically, as you somewhat described, "the stack" serves several purposes in the execution of a program:

  1. Keeping track of where to return to, when calling a function
  2. Storage of local variables in the context of a function call
  3. Passing arguments from calling function to callee.
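To make the three purposes concrete from the C side, here is a minimal sketch. The names `depth_sum`, `local_addr` and `locals_were_distinct` are made up for illustration, and whether a given compiler really keeps `local` on the hardware stack is an implementation detail; the sketch only shows the behaviour the stack has to provide: each active call gets its own locals, receives its own arguments, and returns to the right place in its caller.

```c
#include <stdint.h>

enum { MAX_DEPTH = 8 };

/* Hypothetical bookkeeping: where each active call kept its copy of `local`. */
static uintptr_t local_addr[MAX_DEPTH];

/* Sums depth*10 over the nested calls depth, depth+1, ..., limit-1.
   Assumes 0 <= depth < limit <= MAX_DEPTH. */
int depth_sum(int depth, int limit)
{
    int local = depth * 10;                  /* purpose 2: per-call local storage */
    local_addr[depth] = (uintptr_t)&local;   /* recorded while this call is active */

    int below = 0;
    if (depth + 1 < limit)
        below = depth_sum(depth + 1, limit); /* purpose 3: arguments go to the callee;
                                                purpose 1: execution resumes here after */

    return local + below;                    /* `local` is untouched by the nested calls */
}

/* Returns 1 if the calls made by the last depth_sum(0, limit)
   each used separate storage for `local`, 0 otherwise. */
int locals_were_distinct(int limit)
{
    for (int i = 0; i < limit; i++)
        for (int j = i + 1; j < limit; j++)
            if (local_addr[i] == local_addr[j])
                return 0;
    return 1;
}
```

Calling `depth_sum(0, 4)` from `main` adds 0 + 10 + 20 + 30 and returns 60; afterwards `locals_were_distinct(4)` returns 1, because the four calls were active at the same time and so needed four separate copies of `local`.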

The prologue is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilog is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.

In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack. (In optimized code, using ebp as a frame pointer is optional; other ways of unwinding the stack for exceptions are possible, so there's no actual requirement to spend instructions setting it up.)

The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack. (So on function entry, things are set up so a ret could execute to pop that return address back into EIP. The prologue points ESP somewhere else, which is part of why we need an epilogue.)

Then the prologue is executed:

push  ebp         ; Save the stack-frame base pointer (of the calling function).
mov   ebp, esp    ; Set the stack-frame base pointer to be the current
                  ; location on the stack.
sub   esp, N      ; Grow the stack by N bytes to reserve space for local variables

At this point, we have:

...
ebp + 4:    Return address
ebp + 0:    Calling function's old ebp value
ebp - 4:    (local variables)
...

The epilog:

mov   esp, ebp    ; Put the stack pointer back where it was when this function
                  ; was called.
pop   ebp         ; Restore the calling function's stack frame.
ret               ; Return to the calling function.
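
The sequence above can also be replayed in plain C as a rough sketch. This is a toy model, not real machine state: the `Cpu` struct, the `sim_*` helpers, and the addresses `0x1000`, `0x1005`, `0x2000` and `0xBEEF0` are all assumptions made for illustration. It only checks that `call` + prologue produce the layout shown above, and that epilogue + `ret` undo it.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Toy CPU: mem[i] stands for the 4-byte word at address 4*i. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp, ebp, eip;
} Cpu;

void sim_push(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }
uint32_t sim_pop(Cpu *c) { uint32_t v = c->mem[c->esp / 4]; c->esp += 4; return v; }

/* call target: push the return address, then jump. */
void sim_call(Cpu *c, uint32_t target, uint32_t return_addr)
{
    sim_push(c, return_addr);
    c->eip = target;
}

/* push ebp / mov ebp, esp / sub esp, n */
void sim_prologue(Cpu *c, uint32_t n)
{
    sim_push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= n;
}

/* mov esp, ebp / pop ebp / ret */
void sim_epilogue_and_ret(Cpu *c)
{
    c->esp = c->ebp;
    c->ebp = sim_pop(c);
    c->eip = sim_pop(c);   /* ret pops the return address into eip */
}

/* Returns 1 if the frame layout and the final restored state are as described. */
int frame_roundtrip_ok(void)
{
    Cpu c = { .esp = 4 * STACK_WORDS, .ebp = 0xBEEF0, .eip = 0x1000 };
    uint32_t esp_before = c.esp, ebp_before = c.ebp;

    sim_call(&c, 0x2000, 0x1005);   /* hypothetical callee and return addresses */
    sim_prologue(&c, 8);            /* reserve N = 8 bytes of locals */

    int layout_ok = c.mem[(c.ebp + 4) / 4] == 0x1005      /* ebp + 4: return address */
                 && c.mem[c.ebp / 4] == ebp_before        /* ebp + 0: caller's ebp   */
                 && c.esp == c.ebp - 8;                   /* locals below ebp        */

    sim_epilogue_and_ret(&c);

    return layout_ok && c.esp == esp_before
                     && c.ebp == ebp_before
                     && c.eip == 0x1005;
}
```

`frame_roundtrip_ok()` returns 1: after the epilogue and `ret`, `esp` and `ebp` hold the values they had before the call, and `eip` holds the return address.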
Peter Cordes
Jonathon Reinhart
  • Just for the sake of completeness, it's good to mention that the ret instruction does the opposite of the call instruction; that is, ret must also do 2 things: pop the return address off the stack using esp, then jump to that address to resume execution from there. – ZeZNiQ Apr 01 '20 at 00:36
  • 1
    Then it begs the question of who cleans up the passed arguments, and when. My guess is that, per the x86 cdecl calling convention, the caller pushes the args BEFORE executing the call instruction, so the same caller must clean the args off the stack AFTER the ret instruction returns. – ZeZNiQ Apr 01 '20 at 00:44
  • @ZeZNiQ: This shows the function using `ret`, not [`ret 12`](https://www.felixcloutier.com/x86/ret) or whatever, so it's a caller-pops convention like i386 System V, or MSVC cdecl, where code in the caller right after `call foo` finds ESP unmodified by the call. So the caller could `mov` new args into that space for another call, instead of add esp,12 / push. Calling a function that ends with `ret 12` would (from the caller's POV) be like running `add esp,12` after whatever code the function ran. – Peter Cordes Mar 15 '21 at 10:56
4
  1. C Function Call Conventions and the Stack explains well the concept of a call stack

  2. Function prologue briefly explains the assembly code and the hows and whys.

  3. The gen on function perilogues

Jonathon Reinhart
Aniket Inge
2

I am quite late to the party & I am sure that in the last 7 years since the question was asked, you'd have gotten a way clearer understanding of things, that is of course if you chose to pursue the question any further. However, I thought I would still give a shot at especially the why part of the prolog & the epilog.

Also, the accepted answer elegantly & quite simply explains the how of the epilog & the prolog, with good references. I only intend to supplement that answer with the why (at least the logical why) part.

I will quote the passage below from the accepted answer & try to extend its explanation.

In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack.

The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack.

The last line in the quote above says immediately after the call, esp points to the return address on the stack.

Why's that?

So let's say that our code that's getting currently executed has the following situation, as shown in the (really badly drawn) diagram below

enter image description here

So our next instruction to be executed is, say at the address 2. This is where the EIP is pointing. The current instruction has a function call (that would internally translate to the assembly call instruction).

Now ideally, because the EIP is pointing to the very next instruction, that would indeed be the next instruction to get executed. But since there's sort of a diversion from the current execution flow path, (that is now expected because of the call) the EIP's value would change. Why? Because now another instruction, that may be somewhere else, say at the address 1234 (or whatever), may need to get executed. But in order to complete the execution flow of the program as was intended by the programmer, after the diversion activities are done, the control must return back to the address 2 as that is what should have been executed next should the diversion have not happened. Let us call this address 2 as the return address in the context of the call that is being made.

Problem 1

So, before the diversion actually happens, the return address, 2, would need to be stored somewhere temporarily.

There were many possible choices: storing it in any of the available registers, in some memory location, etc. But (I believe for good reason) it was decided that the return address would be stored on the stack.

So what needs to be done now is increment the ESP (the stack pointer) such that the top of the stack now points at the next address on the stack. So TOS' (TOS before the increment) which was pointing to the address, say 292, now gets incremented & starts pointing to the address 293. That is where we put our return address 2. So something like this:

enter image description here

So it looks like now we have achieved our goal of temporarily storing the return address somewhere. We should now just go about making the diversion call. And we could. But there's a small problem. During the execution of the called function, the stack pointer, along with the other register values, could be manipulated multiple times.

Problem 2

So, although our return address is still stored on the stack, at location 293, after the called function finishes executing, how would the execution flow know that it should now go to 293 & that's where it would find the return address?

So (I believe for good reason again) one of the ways of solving the above problem could be to store the stack address 293 (where the return address is) in a (designated) register called EBP. But then what about the contents of EBP? Would that not be overwritten? Sure, that's a valid point. So let's store the current contents of EBP on to the stack & then store this stack address into EBP. Something like this:

enter image description here

The stack pointer is incremented. The current value of EBP (denoted as EBP'), which is say xxx, is stored onto the top of the stack, i.e. at the address 294. Now that we have taken a backup of the current contents of EBP, we can safely put any other value onto the EBP. So we put the current address of the top of the stack, that is the address 294, in EBP.

With the above strategy in place, we solve Problem 2 discussed above. How? When the execution flow wants to know where it should fetch the return address from, it would:

  • first get the value from EBP out and point the ESP to that value. In our case, this would make TOS (top of stack) point to the address 294 (since that is what is stored in EBP).

  • Then it would restore the previous value of EBP. To do this it would simply take the value at 294 (the TOS), which is xxx (which was actually the older value of EBP), & put it back to EBP.

  • Then it would decrement the stack pointer to go to the next lower address in the stack which is 293 in our case. Thus finally reaching 293 (see that's what our problem 2 was). That's where it would find the return address, which is 2.

  • It will finally pop this 2 out into the EIP, that's the instruction that should have ideally been executed should the diversion have not happened, remember.

And the steps that we just saw being performed, with all the jugglery to store the return address temporarily & then retrieve it, are exactly what gets done by the call instruction, the function prolog (at the start of the called function) & the epilog (just before the function's ret). The how was already answered, we just answered the why as well.

Just an end note: for the sake of brevity, I have ignored the fact that stack addresses may grow the other way round. On x86 the stack actually grows toward lower addresses, so a push decrements ESP rather than incrementing it.
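
The walk-through can be replayed as a small C sketch that keeps this answer's simplification (a stack that grows toward higher addresses) and its example numbers 292, 293, 294, 1234 and the return address 2. The `Toy` struct, the `toy_*` helpers and the value 777 standing in for "xxx" are hypothetical. The point of the sketch is Problem 2: however far the function body moves ESP, EBP still leads back to the return address.

```c
#include <assert.h>

enum { MEM_SIZE = 512 };

/* Toy machine: one int per stack slot, stack grows UP as in the text above. */
typedef struct {
    int mem[MEM_SIZE];
    int esp, ebp, eip;
} Toy;

/* Problem 1: store the return address on the stack, then divert. */
void toy_call(Toy *t, int target, int return_addr)
{
    t->esp += 1;
    t->mem[t->esp] = return_addr;   /* lands at 293 in the example */
    t->eip = target;
}

/* Back up the old EBP (EBP'), then remember the current top of stack in EBP. */
void toy_prolog(Toy *t)
{
    t->esp += 1;
    t->mem[t->esp] = t->ebp;        /* lands at 294 in the example */
    t->ebp = t->esp;
}

/* The function body may move ESP around as much as it likes. */
void toy_body(Toy *t, int scratch_words)
{
    assert(t->esp + scratch_words < MEM_SIZE);
    for (int i = 0; i < scratch_words; i++) {
        t->esp += 1;
        t->mem[t->esp] = i;
    }
}

/* The four bullet steps: ESP = EBP, restore EBP', step down, pop into EIP. */
void toy_epilog_and_ret(Toy *t)
{
    t->esp = t->ebp;
    t->ebp = t->mem[t->esp];
    t->esp -= 1;
    t->eip = t->mem[t->esp];
    t->esp -= 1;
}

/* Returns the address execution resumes at after the call returns. */
int toy_return_address_after(int scratch_words)
{
    Toy t = { .esp = 292, .ebp = 777, .eip = 2 };  /* 777 stands in for "xxx" */
    toy_call(&t, 1234, 2);
    toy_prolog(&t);
    toy_body(&t, scratch_words);
    toy_epilog_and_ret(&t);
    assert(t.esp == 292 && t.ebp == 777);          /* caller's state is back */
    return t.eip;
}
```

`toy_return_address_after(0)` and `toy_return_address_after(50)` both return 2: the amount of stack the body used makes no difference, because the epilog starts from EBP rather than from wherever ESP ended up.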

qre0ct
0

A picture is worth a thousand words, so here are some diagrams of how the stack changes throughout a function call - and remember that in these diagrams, memory addresses grow up, and the stack grows down :)


The caller pushes the arguments and return address onto the stack.

The callee expects to find the arguments (in reverse) and the return address on the stack:

|      ...       | <- End of caller's stack frame
+----------------+
|   Argument n   | <- Start of callee's stack frame
+- - - - - - - - +
+- - - - - - - - +
|   Argument 2   |
+----------------+
|   Argument 1   |
+----------------+
| Return address | <- "Top" of stack (esp)
+----------------+

The callee then pushes its caller's stack frame base pointer (ebp) on the stack, sets ebp to this current stack pointer (esp) value, and then adds space for the local variables before running the function.

|      ...       |
+----------------+
|   Argument n   |
+- - - - - - - - +
+- - - - - - - - +
|   Argument 2   |
+----------------+
|   Argument 1   |
+----------------+
| Return address | <- Before: previous top of stack
+----------------+
| Previous $ebp  |
+----------------+
|  Local var 1   |
+----------------+
|  Local var 2   |
+- - - - - - - - +
+- - - - - - - - +
|  Local var n   | <- After: new top of stack (esp)
+----------------+

The callee, after running its function body, clears the local variables' stack space and pops the top of the stack (the previous ebp value) into ebp, resetting it for the caller's frame.

|      ...       |
+----------------+
|   Argument n   |
+- - - - - - - - +
+- - - - - - - - +
|   Argument 2   |
+----------------+
|   Argument 1   |
+----------------+
| Return address | <- Top of stack (esp)
+----------------+

The callee then pops the value on top of the stack (the return address) into the instruction pointer register (eip), so the next instruction that's executed is back in the caller.

The function has now returned, and the caller can continue execution, expecting the stack to look like this:

|      ...       |
+----------------+
|   Argument n   |
+- - - - - - - - +
+- - - - - - - - +
|   Argument 2   |
+----------------+
|   Argument 1   | <- Top of stack (esp)
+----------------+
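
The same sequence can be sketched in C with a toy stack, assuming (as the diagrams do) a 32-bit, caller-pops convention with all arguments on the stack. The `Stack` struct, the `st_*` helpers, `callee_add3`, `demo_add3` and the return address `0x1005` are hypothetical names and values chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

enum { WORDS = 64 };

/* Toy stack: mem[i] stands for the 4-byte word at address 4*i. */
typedef struct {
    uint32_t mem[WORDS];
    uint32_t esp, ebp;
} Stack;

void st_push(Stack *s, uint32_t v) { s->esp -= 4; s->mem[s->esp / 4] = v; }
uint32_t st_pop(Stack *s) { uint32_t v = s->mem[s->esp / 4]; s->esp += 4; return v; }
uint32_t st_load(const Stack *s, uint32_t addr) { return s->mem[addr / 4]; }

/* Callee: expects three arguments above the return address. */
uint32_t callee_add3(Stack *s)
{
    st_push(s, s->ebp);                 /* save the caller's ebp          */
    s->ebp = s->esp;                    /* set up this function's frame   */
    s->esp -= 4;                        /* room for one local variable    */

    uint32_t a1 = st_load(s, s->ebp + 8);    /* Argument 1 */
    uint32_t a2 = st_load(s, s->ebp + 12);   /* Argument 2 */
    uint32_t a3 = st_load(s, s->ebp + 16);   /* Argument 3 */
    s->mem[(s->ebp - 4) / 4] = a1 + a2 + a3; /* Local var 1 */
    uint32_t result = st_load(s, s->ebp - 4);

    s->esp = s->ebp;                    /* clear the locals               */
    s->ebp = st_pop(s);                 /* restore the caller's ebp       */
    (void)st_pop(s);                    /* ret: pop the return address    */
    return result;
}

/* Caller: pushes arguments in reverse, calls, then cleans up its own arguments. */
uint32_t demo_add3(uint32_t a1, uint32_t a2, uint32_t a3)
{
    Stack s = { .esp = 4 * WORDS, .ebp = 0 };
    uint32_t esp_before = s.esp;

    st_push(&s, a3);
    st_push(&s, a2);
    st_push(&s, a1);
    st_push(&s, 0x1005);                /* what call would push (made-up address) */

    uint32_t result = callee_add3(&s);

    assert(s.esp == esp_before - 12);   /* esp now points at Argument 1 */
    assert(st_load(&s, s.esp) == a1);
    s.esp += 12;                        /* caller-pops: add esp, 12     */
    assert(s.esp == esp_before);
    return result;
}
```

`demo_add3(1, 2, 3)` returns 6. The two asserts after the call correspond to the last diagram: when the callee returns, the top of the stack is Argument 1, and it is the caller that removes the three arguments.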
dwb
  • You're assuming 32-bit x86, with a calling convention that passes all args on the stack, none in registers (like i386 System V, or Windows stdcall or cdecl). The question doesn't mention x86. (Register args don't fundamentally change things, except for variadic functions that want to iterate their args.) On most other ISAs, the return address is passed in a register (the "link register"), and the callee is responsible for storing it if necessary (non-leaf function). Also, using EBP as a frame pointer is optional. – Peter Cordes May 06 '23 at 14:37
  • All that said, yes, this basic picture is a useful starting point for understanding other calling conventions. Some, like Windows x64 with "home space" aka "shadow space" above the return address contiguous with stack args, let us get back to this simple picture if desired. MIPS does something similar (https://devblogs.microsoft.com/oldnewthing/20180419-00/?p=98555). PowerPC has a red zone (below the stack pointer) as part of the calling convention: https://devblogs.microsoft.com/oldnewthing/20190111-00/?p=100685 – Peter Cordes May 06 '23 at 14:43
-5

Every function has an identical prologue (the start of the function's code) and epilogue (the end of the function).

Prologue: The structure of the prologue looks like: push ebp / mov esp,ebp

Epilogue: The structure of the epilogue looks like: leave / ret

More detail: what is Prologue and Epilogue

  • 1
    depends on calling convention – Ivan Kush Sep 10 '16 at 09:24
  • Some functions need to save more registers as part of their prologue, and reserve stack space with `sub esp, N`. If they *don't* do any more than `push ebp` / `mov ebp, esp` (you got that instruction backwards), then most compilers will use `pop ebp` / `ret` as the epilogue, because `pop ebp` is cheaper than `leave`, and does the same thing if ESP is still pointing to the saved EBP. Also, some calling conventions are callee-pops and end with `ret 8` or whatever. – Peter Cordes Mar 15 '21 at 11:01