
I understand the need for a reorder buffer in speculative execution. However, given a sequence of non-speculative instructions without any branches, why do all these instructions still have to go through the ROB and then commit in order? Since there is no control hazard, and assuming register renaming is present to avoid WAR and WAW hazards, is a ROB necessary in such a case?

One reason I can think of is handling exceptions precisely (committing out of order would make them imprecise). Is there any other reason?

appusajeev
  • What do you mean by "non-speculative instructions"? If you have out-of-order execution, everything is speculative until retired. Even if there's no branch, how would you roll back on faults, interrupts, etc.? For the same reason, you need the ROB to reinstate program order at commit. – Leeor Dec 13 '13 at 21:45
  • By non-speculative instructions, I meant instructions that were not fetched speculatively (not fetched along a predicted branch path, but fetched in normal program order), so that writing their results out of order causes no issues. And yes, exceptions are one reason we need a ROB anyway. I was wondering if there was any other reason. – appusajeev Dec 30 '13 at 06:28
  • Related: [Out-of-order execution vs. speculative execution](https://stackoverflow.com/a/49661172) is another explanation of @Leeor's main point that all instructions are treated as speculative until retirement. Hadi's answer on the same question has some more general discussion about speculation without out-of-order execution, or vice versa, on CPUs other than modern OoO designs. – Peter Cordes May 27 '18 at 20:15
  • I suppose, in principle at least, if you had a series of instructions with no branches/jumps, and without any instructions that could fault, the CPU could handle them in a simplified manner in the ROB, e.g., only use one ROB entry for the whole series of instructions. This implies the series of instructions always executes as a kind of atomic unit (e.g., it cannot be interrupted). Still, each instruction that has a register destination affects the renaming state, and managing this is part of retirement, so it would probably greatly complicate the renaming part of retirement. – BeeOnRope May 27 '18 at 23:03
  • In fact, the idea of _macro-fusion_ is along these lines: you find two or more consecutive instructions and fuse them so that they take only one ROB entry and are treated as a unit. In effect, the ROB is "not used" for the original instructions other than the last one in the macro-fused unit. Not surprisingly, you see this on x86 only in cases where there is a single (or zero) destination register, because of the renaming complications above: in particular, for an ALU op followed by a conditional branch. – BeeOnRope May 27 '18 at 23:05
  • In practice, instructions that can fault and branches are extremely common, so the utility of such a system would be relatively limited for most code. – BeeOnRope May 27 '18 at 23:07
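A minimal Python sketch of the point made in the comments above, under simplifying assumptions: results may be produced out of order, but architectural state is only updated from the ROB head, in program order, which is what makes a fault precise. `RobEntry` and `retire` are hypothetical names for illustration; a real ROB is a circular hardware buffer with far more state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RobEntry:
    name: str                    # hypothetical instruction label, e.g. "I0"
    dest: Optional[str]          # architectural destination register, if any
    value: Optional[int] = None  # result, filled in when the op completes
    done: bool = False           # has the op finished executing?
    faulted: bool = False        # did it raise a fault while executing?


def retire(rob: List[RobEntry], arch_regs: Dict[str, int]) -> Optional[str]:
    """Retire completed entries from the head of the ROB, in program order.

    Retirement stops at the first entry that has not completed, no matter how
    many younger entries are already done. If the head entry faulted, all
    younger entries are discarded, so arch_regs reflects exactly the
    instructions older than the faulting one (a precise exception).
    Returns the name of the faulting instruction, or None.
    """
    while rob and rob[0].done:
        head = rob.pop(0)
        if head.faulted:
            rob.clear()          # squash younger work, even if it already executed
            return head.name
        if head.dest is not None:
            arch_regs[head.dest] = head.value
    return None


# I2 completes first, then I1 faults, and I0 completes last.
rob = [RobEntry("I0", "rax"), RobEntry("I1", "rbx"), RobEntry("I2", "rcx")]
arch = {"rax": 0, "rbx": 0, "rcx": 0}
rob[2].value, rob[2].done = 30, True
rob[1].done, rob[1].faulted = True, True
print(retire(rob, arch))  # None: the head (I0) has not completed yet
rob[0].value, rob[0].done = 10, True
print(retire(rob, arch))  # I1
print(arch)               # {'rax': 10, 'rbx': 0, 'rcx': 0}
```

Note that I2 executed successfully, yet its result never reaches `rcx`: without the ordered queue there would be no way to tell which completed results must be thrown away.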

1 Answer


In a real out-of-order machine there's no such thing as a non-speculative instruction. Everything has to go through the reorder buffer because, at the allocation pipestage, you do not know what is going to be cleared and what will be committed: any older branch may not have executed yet. At any moment such a branch may resolve as a mispredict and flush all younger ROB entries.
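To make the flush concrete, here is a minimal Python sketch, assuming a ROB modeled as a list of `(sequence number, mnemonic)` tuples; `resolve_branch` is a hypothetical helper, not any real core's logic. The entries squashed on a mispredict can be plain ALU ops that look "non-speculative" in the question's sense, but when they were allocated the core could not know that an older branch would later turn out to be mispredicted.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (sequence number, mnemonic), purely illustrative


def resolve_branch(rob: List[Entry], branch_seq: int, mispredicted: bool) -> List[Entry]:
    """Handle a branch that has just executed.

    `rob` holds entries in program order, oldest first. On a mispredict,
    every entry younger than the branch is squashed, including plain ALU ops
    that were allocated with no idea a flush was coming. The branch itself
    and everything older stay. Returns the squashed entries.
    """
    if not mispredicted:
        return []
    idx = next(i for i, (seq, _op) in enumerate(rob) if seq == branch_seq)
    squashed = rob[idx + 1:]
    del rob[idx + 1:]
    return squashed


rob = [(100, "add"), (101, "jne"), (102, "inc"), (103, "mov")]
print(resolve_branch(rob, 101, mispredicted=True))  # [(102, 'inc'), (103, 'mov')]
print(rob)                                          # [(100, 'add'), (101, 'jne')]
```

Because "younger than the branch" is defined by position in the ROB, the flush needs exactly the program-order index that the ROB provides.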

I guess you could prevent control hazards by stalling allocation at each conditional branch, but that would have horrible performance and eliminate much of the benefit of out-of-order execution, turning every branch (typically expected about once every 5 instructions) into a serialization point and a stall.

Another "benefit" of having to go through the ROB is register renaming. Without an ordered index, you'll have trouble managing your physical registers so that they make sense according to program order. Say you have 3 consecutive instructions such as:

```asm
inc rax
add rbx, rax ; assume rbx is the dest
inc rax
```

Say rbx becomes ready late. When the add is finally ready to execute, how would the out-of-order engine know which value of rax to take? By then you have the old value, the +1 and the +2 versions, and all of them are ready. An OOO machine must mark the source as the renamed version of rax at the moment the add entered the ROB. By the way, there are other ways to achieve that correctness, but they're more complicated and still require ordering queues.
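A minimal Python sketch of in-order renaming for the three instructions above, assuming a simple alias table from architectural names to fresh physical tags (`Renamer` and the `p0`, `p1`, ... tags are hypothetical). Because sources are looked up when each instruction is renamed, in program order, the add is permanently tied to the +1 version of rax, even if it executes after the +2 version is already available.

```python
import itertools
from typing import List, Tuple


class Renamer:
    """Toy register renamer: maps architectural names to physical tags.

    Every destination gets a fresh physical register; sources are looked
    up in the alias table at rename time, which happens in program order.
    """

    def __init__(self, arch_regs: List[str]):
        self._fresh = itertools.count()
        self.rat = {r: f"p{next(self._fresh)}" for r in arch_regs}

    def rename(self, dest: str, srcs: List[str]) -> Tuple[str, List[str]]:
        src_tags = [self.rat[s] for s in srcs]  # read mappings BEFORE updating dest
        new_tag = f"p{next(self._fresh)}"
        self.rat[dest] = new_tag
        return new_tag, src_tags


r = Renamer(["rax", "rbx"])             # initial map: rax -> p0, rbx -> p1
print(r.rename("rax", ["rax"]))         # inc rax:      ('p2', ['p0'])
print(r.rename("rbx", ["rbx", "rax"]))  # add rbx, rax: ('p3', ['p1', 'p2'])
print(r.rename("rax", ["rax"]))         # inc rax:      ('p4', ['p2'])
```

This toy never frees physical registers. In physical-register-file designs, the register previously mapped to a destination can only be freed once the overwriting instruction retires, which again depends on knowing program order.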

Leeor
  • @appusajeev, the ROB is a queue; you would usually also need a lookup table to translate logical to physical register names, like the RAT (register alias table) in Intel's P6. However, there are some fundamentally different designs, such as dataflow processors, that keep distributed queues and so remove the need for register lookups. They're not really doing OOO though, although some claim to be equivalent in effect. – Leeor Dec 30 '13 at 10:14
  • I mean, by using ROB IDs as the source and destination of instructions, one could do away with physical register files, right? – appusajeev Dec 30 '13 at 10:22
  • Yes, although historically it went the other way around: Intel and AMD both added physical register files ([link](http://www.realworldtech.com/sandy-bridge/5/)) after already having ROB-only OOO designs. Note that a physical register file doesn't eliminate the need for a ROB; you still need an ordering queue. – Leeor Dec 30 '13 at 10:30
  • Could you share your email address with me, so that I can clear up my doubts with you, if that's okay? – appusajeev Dec 30 '13 at 10:38
  • I'd rather answer here on Stack Overflow in case others could benefit from that. Feel free to open another question; just keep it focused, since questions that are too broad for this site might be closed as off topic. – Leeor Dec 30 '13 at 11:05
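A minimal Python sketch of the ROB-only scheme discussed in these comments, loosely modeled on the P6 arrangement Leeor mentions: results are held in ROB entries, copied to a retirement register file at commit, and the RAT points at either one. All names (`RobOnlyCore`, `rrf`, `allocate`, `complete`, `retire`) are hypothetical, and flushes, ports and free lists are ignored. It illustrates the last comment's point: using ROB IDs as register tags works, but only because the ROB is still an ordered queue.

```python
from typing import Dict, List, Optional, Tuple


class RobOnlyCore:
    """Toy model where results live in ROB entries until retirement.

    The alias table (rat) maps each architectural register either to the
    ROB id of its newest in-flight writer or to None, meaning "read the
    committed value from the retirement register file (rrf)".
    """

    def __init__(self, regs: List[str]):
        self.rrf: Dict[str, int] = {r: 0 for r in regs}
        self.rat: Dict[str, Optional[int]] = {r: None for r in regs}
        self.rob: List[dict] = []  # oldest entry first
        self.next_id = 0

    def allocate(self, dest: str, srcs: List[str]) -> Tuple[int, List[Optional[int]]]:
        # Sources are tagged with ROB ids (in flight) or None (committed value).
        src_tags = [self.rat[s] for s in srcs]
        rob_id = self.next_id
        self.next_id += 1
        self.rob.append({"id": rob_id, "dest": dest, "value": None})
        self.rat[dest] = rob_id
        return rob_id, src_tags

    def complete(self, rob_id: int, value: int) -> None:
        # Execution writes the result into the instruction's own ROB entry.
        for entry in self.rob:
            if entry["id"] == rob_id:
                entry["value"] = value

    def retire(self) -> None:
        # Strictly in order: stop at the first entry without a result.
        while self.rob and self.rob[0]["value"] is not None:
            entry = self.rob.pop(0)
            self.rrf[entry["dest"]] = entry["value"]
            if self.rat[entry["dest"]] == entry["id"]:
                self.rat[entry["dest"]] = None  # no younger writer in flight


# The three instructions from the answer, with rax = rbx = 0 initially.
core = RobOnlyCore(["rax", "rbx"])
print(core.allocate("rax", ["rax"]))         # inc rax:      (0, [None])
print(core.allocate("rbx", ["rbx", "rax"]))  # add rbx, rax: (1, [None, 0])
print(core.allocate("rax", ["rax"]))         # inc rax:      (2, [0])
core.complete(2, 2)
core.complete(1, 1)
core.retire()
print(core.rrf)  # {'rax': 0, 'rbx': 0}: entry 0 has not completed
core.complete(0, 1)
core.retire()
print(core.rrf)  # {'rax': 2, 'rbx': 1}
```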