Return address prediction stack buffer vs stack-stored return address?

Question

Have been reading Agner Fog's "The microarchitecture of Intel, AMD and VIA CPUs" and on page 34 he describes "return address prediction":

http://www.agner.org/optimize/microarchitecture.pdf

3.15 Returns (all processors except P1)

A better method is used for returns. A Last-In-First-Out buffer, called the return stack buffer,remembers the return address every time a call instruction is executed, and it uses this for predicting where the corresponding return will go. This mechanism makes sure that return instructions are correctly predicted when the same subroutine is called from several different locations.

I am a little unclear what the need for this is, given that the return addresses are stored on the stack anyway?

So what is the purpose of storing return addresses on the stack if there is also this technique? Is the stack-stored value only used if this prediction technique doesnt work?

You cannot assume that the processor can *predict* exactly where in the stack the return address is stored. The ESP register is very often restored just before a return as part of the epilogue of a function. — Hans Passant, Mar 16 '14 at 20:47
@HansPassant ah so we're trying to predict the return address, say 15 CPU cycles before the ret instruction is due to be called because 15 CPU cycles before its called we have no idea what could happen to ESP? — user997112, Mar 16 '14 at 20:50

score 15 · Accepted Answer · answered Mar 17 '14 at 00:05

15

Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, and therefore doesn't even know with certainty that a branch instruction exists. Like all predictors, the intent of the return address predictor is to get the direction / target of the branch faster. A return instruction is a branch, and so it would normally have a branch predictor entry to determine whether it is taken and where the target is. The return address predictor is consulted in lieu of the normal branch target buffer.

So perhaps 50 instructions before the return statement is actually "executed", the fetch stage predicts a return instruction and the instruction address to fetch next. Later, when the return is executed, the address is read from the stack and compared with where the return was predicted to go. If they are the same, execution continues, else execution is rolled back to use the proper return address.

Why store on the stack? First, the processor does not know if the predictor has worked without comparing against the address stored on the stack. Second, the stack is the "official" return address, which might be changed for legitimate reasons. Third, the return address predictor has a limited number of entries. The stack is needed for the return instructions for which there was not room to store the addresses in the predictor.

answered Mar 17 '14 at 00:05

Brian

2,693
5
26
27

How does the return address buffer get a return address pushed on to it during the fetch stage, when this only happens for a CALL instruction and therefore we'd already be in the decoding stage to know we have a CALL instruction? – intrigued_66 Jan 21 '15 at 16:13
I would expect that the return address buffer is updated (pushing an address) at a similar time as the other branch predictor structures, which would have to occur after the instruction has been executed. Remember this is all in hardware, so all stages are executing simultaneously. Furthermore, some hardware structures may be accessed / modified by multiple stages, such as the register file being read from / written to. The branch predictor is similar in that it is not part of the *fetch stage*, but rather primarily accessed by fetch. – Brian Jan 21 '15 at 21:03
Consider the MIPS 5-stage pipeline, RAS will be used only when we know that the current instruction is a return instruction, that will be known in ID/RF stage, and the return address can be taken from the RF in the same stage too, so in MIPS 5-stage pipeline RAS is useless, until we find a way to know that an instruction is return instruction in the IF stage. Am I correct ? – Nishant Feb 18 '15 at 08:40
@Nishant, yes, except part of the design behind a branch predictor is to predict whether an instruction is a branch, return instructions included. Thus the branch predictor solves your problem: "until we find a way to know that an instruction is return instruction in the IF stage". Is saving the one cycle delay, in your example, worth it? It depends. But it is possible. – Brian Feb 18 '15 at 15:12

score 3 · Answer 2 · answered Mar 20 '14 at 16:21

On top of Brians' great explanation, there's the fact that the stack resides in memory. You don't want to have to go to the memory unit and do a memory lookup (not to mention address translation into physical), from some stack address everytime you want to predict the outcome of a branch. The branch prediction wants to be self-sufficient. You can also view the RSB as another form of caching data.

Return address prediction stack buffer vs stack-stored return address?

2 Answers2

Linked