6

Branch penalty in pipeline results from non-zero distance between ALU and IF.

What does it mean by this statement?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
user366312
  • 16,949
  • 65
  • 235
  • 452
  • 1
    Surprisingly, [Branch predictor - Wikipedia](https://en.wikipedia.org/wiki/Branch_predictor) does a very good job explaining branch prediction (the associated models) and the misprediction penalty in great detail. Always a top contender for the first stop. – David C. Rankin Jun 02 '19 at 06:31

3 Answers3

7

Without (correct) branch prediction, fetch doesn't know what to fetch next until the ALU decides which way a conditional or indirect branch goes. So it stalls until the branch executes in the ALU.

Or with an incorrect prediction, the fetched/decoded instruction from the wrong path are useless, so we call it the branch mispredict penalty; branch prediction hides it in the normal case.

Another term for this is "branch latency" - the number of cycles from fetching a branch instruction until the front-end fetches a useful next instruction.

Note that even unconditional branches have branch latency: the fact that an instruction is a branch at all isn't known until after it's decoded. This is earlier in the pipeline than execution so the possible penalty is smaller than for conditional or indirect branches.


For example, in first-gen MIPS R2000, a classic 5-stage RISC, conditional branches only take half a cycle in the EX stage, and IF doesn't need the address until the 2nd half of a clock cycle, so the total branch latency is kept down to 1 cycle. MIPS hides that latency with a branch-delay slot: the instruction after a branch always executes, whether the branch it taken or not. (Including unconditional direct branches; the ID stage can produce the target address on its own.) Later more deeply pipelined MIPS CPUs (especially superscalar and/or out-of-order) did need branch prediction, with the delay slot not able to fully hide branch latency.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
3

It means, you had penalty between the cycles of the processor. Every processor has cycles of operation, each delay in the cycle will result in a penalty, as it waits until the branch executes in the ALU or:

Branch penalty in pipeline results from non-zero distance between ALU and IF.

There is a wonderful, but long book called Computer Architecture Piplined And Parallel Processor Design .

It explains in detail regarding the issue.

Barr J
  • 10,636
  • 1
  • 28
  • 46
  • You mean stages, not cycles. A "cycle" is the unit of time, a clock cycle. During which instructions move to the next pipeline stage (if they're not stalled) – Peter Cordes Jun 02 '19 at 06:38
1

Short Answer:

The penalty for mis-predicting the next possible branch would result in time wastage (CPU clock cycles) as

  1. the wrongly predicted branch which was fetched and executed speculatively would then need to be discarded and
  2. the actual next branch would need to fetched and executed reactively anyway.

Long Answer: Look up: "Instruction pipelining", "Branch prediction" , "Loop unrolling", ...

SolZ
  • 11
  • 1
  • You're using the word "branch" to describe non-branch instructions on a (possible) *path* of execution. This is confusing. Instead say "the wrongly predicted path was fetched and needs to be discarded". For example, Intel's description of the perf-counter event `int_misc.clear_resteer_cycles` on Skylake is "*[Cycles the issue-stage is waiting for front-end to fetch from resteered **path** following branch misprediction or machine clear events]*". (Resteer = point the fetch stage at the correct path.) Run `perf list` on a Linux machine with a similar CPU to see that. – Peter Cordes Oct 08 '19 at 19:09