
[Image: pipeline diagram for the five-instruction sequence]

The image above is my solution for the optimal pipeline schedule of the sequence of five instructions on the left-hand side.

There is a single stall before the branch instruction is fetched. I inserted it because the branch comparison is performed during the ID stage, so the branch instruction needs the correct value of $s2 by then. With the stall, the add instruction's WB is aligned with the branch instruction's ID, meaning the branch instruction will correctly read $s2 from the register file in the second half of clock cycle 7.
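The timing argument above can be written as a small cycle count. This is only a sketch under stated assumptions: a textbook five-stage pipeline (IF, ID, EX, MEM, WB), a register file that is written in the first half of a cycle and read in the second half, and no forwarding path into ID. The function name `stalls_for_id_read` and the `distance` parameter are hypothetical, introduced just for illustration.

```python
def stalls_for_id_read(distance: int) -> int:
    """Stall cycles needed so a consumer's ID lines up with the producer's WB.

    Assumptions (not taken from any specific MIPS implementation):
      - producer timeline: IF=0, ID=1, EX=2, MEM=3, WB=4
      - the consumer is issued `distance` instructions after the producer
        (distance=1 means immediately after), so its unstalled ID is in
        cycle distance + 1
      - the register file is written in the first half of WB and read in
        the second half of ID, so ID may share a cycle with the producer's WB
      - there is no forwarding into the ID stage
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    producer_wb_cycle = 4
    consumer_id_cycle = distance + 1
    return max(0, producer_wb_cycle - consumer_id_cycle)


if __name__ == "__main__":
    for d in range(1, 5):
        print(d, stalls_for_id_read(d))
```

Under these assumptions, a branch that reads the register file in ID needs stalls only when it follows the producing instruction by fewer than three instruction slots. Whether a forwarding path into ID can reduce that number is exactly the point in dispute.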

That said, a colleague claims that the stall is unnecessary and that $s2 can be forwarded directly from the add instruction's EX stage to the branch instruction's ID stage. My confusion with this claim is that the branch instruction's dependency on $s2 would not be detected until midway through its ID stage, by which time the desired value of $s2 has already moved to the ME/WB buffer, making a forward from EX impossible.

Which solution is correct and why?

Peter Cordes
gf.c
  • The pipeline doesn't know that it needs to stall until an instruction reaches ID. The IF stage is never empty; it either holds a newly-fetched instruction, or one that can't move on yet because of later stalls. But yes, a 2nd-gen MIPS II or later (with pipeline interlocks instead of a load delay slot) will detect the load hazard when decoding an instruction that tries to read the result of a load. A classic MIPS I won't stall, and instead will use a stale value for that add which illegally reads a register before it's guaranteed ready, unless the load misses in cache. – Peter Cordes Apr 02 '20 at 17:46
  • I think you are referring to the wrong forward @PeterCordes. I'm talking about the `beq` instruction's dependency on `$s2`. – gf.c Apr 02 '20 at 17:48
  • No, I just hadn't finished commenting. You made a minor mistake in the first one so I thought it would be useful to point that out. Also it's not clear what MIPS microarchitecture you're asking about; it can't be MIPS I R2000 because it doesn't have load interlocks. Re: your question: turns out real MIPS *doesn't* evaluate branch conditions in ID. See the linked duplicate. Unless you have a MIPS which you're definitely 100% sure *does* evaluate branch conditions in ID, in which case yes it would have to stall I think. See also Martin Rosenau's answer on my linked questions for some guesses. – Peter Cordes Apr 02 '20 at 17:52
  • Looks like he never fully answered your question though. @PeterCordes. Assuming the branch comparison is done in ID, can we agree that a stall is necessary? – gf.c Apr 02 '20 at 17:55
  • Yes, I'd agree with that. I never found Martin's answer fully convincing, unless we require that EX has its output ready long before the end of a clock cycle, so EX + ID max gate delays fit within one clock cycle for the part of ID that reads a possibly-forwarded EX output. – Peter Cordes Apr 02 '20 at 18:01
  • Thank you! Where can I find more detailed information about how MIPS implements these bypasses? Or is this kind of detail not made public for commercial reasons? @PeterCordes – gf.c Apr 02 '20 at 18:03
  • I don't know that kind of gate-level implementation detail. If you want that, consider diving into RISC-V which has open-source implementations (in Verilog or VHDL I assume). I never worried about that level of detail; I know that you can implement a 2:1 mux for one signal line with a few transistors, so replicating that for all 32 bits of a word is just a matter of scale. And a comparator to decide whether to use this cycle's EX result or a register-fetch result is also simple enough that I don't need to see the gates, if you keep track of which register number the last insn wrote. – Peter Cordes Apr 02 '20 at 18:08
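
The comparator-plus-mux bypass described in the last comment can be modelled behaviourally in a few lines of Python. This is a rough sketch, not a description of any real MIPS design: the function name `select_operand` and all of its parameters are hypothetical, and it says nothing about gate delays or whether the forwarded value arrives early enough in the cycle.

```python
from typing import Optional


def select_operand(src_reg: int,
                   regfile_value: int,
                   last_dest_reg: Optional[int],
                   last_result: int) -> int:
    """Behavioural model of a 2:1 forwarding mux for one source operand.

    src_reg       -- register number the current instruction wants to read
    regfile_value -- value the register file returned for src_reg
    last_dest_reg -- register number written by the previous instruction,
                     or None if it writes no register (hypothetical encoding)
    last_result   -- result produced by the previous instruction
    """
    # Comparator: forward only if the previous instruction wrote the register
    # we are reading. Register 0 is hardwired to zero on MIPS, so a "write"
    # to it must never be forwarded.
    forward = (last_dest_reg is not None
               and last_dest_reg != 0
               and last_dest_reg == src_reg)
    # Mux: choose between the forwarded result and the register-file value.
    return last_result if forward else regfile_value


if __name__ == "__main__":
    S2 = 18  # $s2 is register number 18 in the MIPS register convention
    print(select_operand(S2, regfile_value=5, last_dest_reg=S2, last_result=99))
```

In hardware the same selection is replicated across all 32 bits of the operand, with a single comparator output driving the mux select lines.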
