Why do we need stalls even if branches can be determined?

Question

I am learning about pipelining and was reading about control hazards from the book Computer Organization and Design: The Hardware/Software Interface (MIPS Edition). There is a paragraph in the book (Chapter 4.6) that has me puzzled:

Let's assume that we put in enough extra hardware so that we can test registers, calculate the branch address, and update the PC during the second stage of the pipeline (see COD Section 4.9 (Control hazards) for details). Even with this extra hardware, the pipeline involving conditional branches would look like the figure below. The lw instruction, executed if the branch fails, is stalled one extra 200 ps clock cycle before starting.

I don't quite understand what exactly the paragraph is saying here. My initial guess was that it meant even in a hypothetical scenario where you can determine which branch to take and update the program counter within the one-clock cycle afforded before the next instruction must be fetched, we would still need a stall but that doesn't make sense to me because if we know what to do, why not just do it and go about it? So I assume I am clearly missing something but I can't piece it together.

EDIT: Here is the picture referenced in the text

This isn't really about software, you're not going to find too many people familiar with processor design here. Since I took the course you are 20 years ago I'm not going to take a stab at answering it and risk getting it wrong. I'd suggest trying the electrical engineering stack exchange, it's a closer fit. Also, it would help to post the image you're referencing in the quote. Although in general I can say that it's not possible to get branch prediction 100% right, so there is always the possibility of a stall from that. — Gabe Sechan, Apr 23 '23 at 06:07
@GabeSechan Okay will take a look at the Electrical Engineering stack exchange. Also, the image given in the textbook is kind of confusing, the image immediately following the quoted text doesn't have the referenced `lw` instruction at all which makes this even more puzzling, but I will still put the picture for more clarity. — Prithvidiamond, Apr 23 '23 at 06:12
They're saying that knowing taken/not-taken in the 2nd stage is still too late to avoid a bubble for taken branches. We would have to know in the first stage whether a branch is taken or not taken, and where it goes when taken, to avoid a bubble, which is to say while it is being fetched and before it goes into decode (the 2nd stage). Believe it or not, it is possible to do great but not perfect prediction in the first stage by remembering where the instruction at that address branches, but the classic 5-stage risc pipeline doesn't have hardware for this. — Erik Eidt, Apr 23 '23 at 17:05

score 4 · Accepted Answer · answered Apr 23 '23 at 06:33

4

To sustain 1 instruction per clock (1 IPC) with no stalls, the IF stage needs to be fetching literally every cycle. That requires an address to fetch from.

But if it takes an extra cycle after a fetch to compute the next place to fetch from (branch latency of 1 cycle), that's a cycle you can't be fetching (or might not be usefully fetching).

Even unconditional branches need prediction if you want a branch latency of 0 (no stalls). Related:

How does MIPS I handle branching on the previous ALU instruction without stalling? - how real MIPS I kept the branch latency down to 1 cycle, to be hidden with a delay slot, computing branch conditions in the first half of the EX stage with IF only needing half a cycle.
Slow jmp-instruction - more about the fact that even unconditional taken branches like x86 jmp (or MIPS j or b) need prediction if you want to run them at 1/clock.
https://en.wikipedia.org/wiki/Classic_RISC_pipeline#Control_hazards - points out that during the cycle where a branch is decoded, IF will already be fetching the instruction after the branch, assuming no branch prediction. (So not-taken branches won't stall.)

answered Apr 23 '23 at 06:33

Peter Cordes

328,167
45
605
847

If I am not mistaken, the branch target's address gets computed in the ID stage, so you are saying that the ID stage is the problem? – Prithvidiamond Apr 23 '23 at 06:38
1

@Prithvidiamond: The target address for the branch is being computed in the ID stage during the same cycle the instruction after the branch is being fetched. If the branch is taken, that fetch wasn't useful because it wasn't the instruction we need to actually execute next. (Assuming an ISA without a branch-delay slot, like RISC-V, or a fake / simplified MIPS.) Remember that the point of pipelining is to have different stages working on different instructions at the same time, but a control dependency needs to feed a result backward from ID to IF, an earlier pipeline stage. – Peter Cordes Apr 23 '23 at 06:47
1

It's the same problem as using an `lw` result in the next instruction, since forwarding back in time from the end of MEM to the start of EX is also impossible, it takes a cycle. – Peter Cordes Apr 23 '23 at 06:48
1

@Prithvidiamond: Also keep in mind that computing a branch target address in ID wouldn't work well in a CPU with bypass forwarding: `slt $t0, ...` / `bnez $t0, target` would have to stall to wait for `$t0` to be computed by EX before it could forward to ID for the next instruction. See the linked Q&A about how MIPS I actually works. – Peter Cordes Apr 23 '23 at 06:50
Hmm, looks like this is a bit more complicated than expected. I will read up on the stuff you linked and get back to you. I will also accept your answer. – Prithvidiamond Apr 23 '23 at 07:11
2

@Prithvidiamond: The answer to your specific question is quite simple: IF needs the result of ID, a later pipeline stage, for any taken branch. So it has to stall unless branch prediction already steered IF the right awy. The extra complications I mentioned (like computing branch conditions in ID not being realistic) are just reasons why that over-simplified hypothetical case isn't how real CPUs work. They're worth understanding, too, but you don't need to understand them to understand the basics of a control hazard. – Peter Cordes Apr 23 '23 at 07:15

Why do we need stalls even if branches can be determined?

1 Answers1