
https://blog.cloudflare.com/branch-predictor/ contains an excellent analysis of the performance of branches on modern hardware.

One thing that surprised me was the finding that unconditional jumps take up space in the branch target buffer. Why?

Conditional branches require the BTB because, at the moment the CPU has just decoded the branch instruction and wants to fetch the next one, it does not yet know the value of the condition. Unconditional jumps have no condition to evaluate. There is an offset that must be added to the instruction pointer (IP) where the jump instruction was found, but that offset is a constant in the instruction. It seems to me that you already have it by the time you have the opcode. What am I missing?
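To make the premise concrete, here is a minimal sketch of the "offset is a constant in the instruction" arithmetic. It assumes the x86-64 `jmp rel32` encoding (opcode `E9` followed by a signed 32-bit little-endian displacement, relative to the address of the *next* instruction). The function name `direct_jmp_target` is hypothetical. Note that the computation needs the instruction bytes, so it can only run once those bytes have been fetched and decoded.

```python
import struct


def direct_jmp_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Compute the target of an x86-64 `jmp rel32` (opcode E9) at insn_addr.

    The displacement is relative to the next instruction, which starts
    5 bytes later (1 opcode byte + 4 displacement bytes).
    """
    if len(insn_bytes) < 5 or insn_bytes[0] != 0xE9:
        raise ValueError("not a jmp rel32 instruction")
    # "<i" = little-endian signed 32-bit integer, read starting at byte 1
    (rel32,) = struct.unpack_from("<i", insn_bytes, 1)
    # Wrap to 64 bits, as the hardware address computation would
    return (insn_addr + 5 + rel32) & 0xFFFF_FFFF_FFFF_FFFF


# Forward jump by 0x10 from 0x401000: 0x401000 + 5 + 0x10
print(hex(direct_jmp_target(0x401000, bytes([0xE9, 0x10, 0x00, 0x00, 0x00]))))  # 0x401015
# Displacement -5 jumps back to the jmp itself (an infinite loop)
print(hex(direct_jmp_target(0x401000, bytes([0xE9, 0xFB, 0xFF, 0xFF, 0xFF]))))  # 0x401000
```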

rwallace
  • The CPU has to (pre)fetch instructions from the new IP address even for unconditional jumps; how would it know that address without the BTB? This is explained in the linked article, right below the first colorful image (section "Why is branch prediction needed?"). – Arvo May 07 '21 at 09:13
  • Branch prediction is needed to pipeline the next fetch, before the block of instructions is even decoded. [Slow jmp-instruction](https://stackoverflow.com/q/38811901). Even in a classic 5-stage RISC, one instruction is being decoded while the next one is already being fetched. In superscalar CPUs that fetch code in larger aligned blocks, you still need to predict at least the right block to fetch. – Peter Cordes May 07 '21 at 09:14
  • @PeterCordes Ah! So the answer is sure you have the offset by the time you have the opcode – but the CPU wants to fetch the next instruction at an earlier time when it does *not* yet have the opcode, so the BTB effectively serves as a cache not only for the jump destination, but for the fact that there is a jump instruction at this location? – rwallace May 07 '21 at 09:17
  • @rwallace: Yes, exactly. – Peter Cordes May 07 '21 at 09:18
  • Note that the BTB is also useful for predicting the target of indirect jumps. In that case, it is the back end that effectively computes the target, which would be a much bigger stall than steering the fetch during (pre)decoding. – Margaret Bloom May 07 '21 at 09:20
  • @MargaretBloom: Good point, the question title could be more specific: "unconditional direct jumps" to rule out indirect jumps like `ret` and `jmp rax`. And BTW, I found another duplicate, [What branch misprediction does the Branch Target Buffer detect?](https://stackoverflow.com/q/31280817) which mentions "unconditional relative" and isn't as messy / hand-wavy as the slow-jmp answer I wrote a while ago :P Oh, and a more exact duplicate, [Why are Branch Target Buffers needed for non register jump instructions?](https://stackoverflow.com/q/47664947) – Peter Cordes May 07 '21 at 10:07
