We can use NOPs, data forwarding, and stall cycles to resolve data and load-use hazards. However, if we have multiple data hazards, it becomes quite inefficient to resolve all of them with NOPs, as they increase the runtime of the program. By comparison, if we have a load-use hazard, we can resolve it with data forwarding plus a stall cycle, which is said to give a more efficient result. My question is: how is data forwarding in combination with stall cycles a more efficient way of dealing with data hazards than NOPs? When we add a stall cycle, the program has to wait one clock cycle to allow for the data forwarding (MEM to EX), so the clock cycle count still increases by 1.
Why are data forwarding and stall cycles more efficient than NOPs for dealing with load-use hazards?
1 Answer
Data forwarding overcomes some hazards by recognizing that the value computed by a prior instruction is available inside the pipeline sooner than it appears back in the register file. So, wherever it applies, data forwarding is always a win over stalling and NOPs.
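To put numbers on "available sooner", here is a minimal sketch, assuming the textbook five-stage pipeline (IF, ID, EX, MEM, WB) in which the register file is written in the first half of a cycle and read in the second half. The function name `bubbles_needed` and its parameters are made up for illustration; real pipelines may differ.

```python
# Stage indices in an assumed classic 5-stage pipeline.
IF, ID, EX, MEM, WB = range(5)

def bubbles_needed(distance, producer_is_load, forwarding):
    """Empty slots required between a producer and a dependent consumer.

    distance=1 means the consumer is the very next instruction.
    """
    if forwarding:
        # The value exists at the end of EX (ALU op) or MEM (load) and is
        # bypassed to the consumer's EX stage in the following cycle.
        produced, consumed = (MEM if producer_is_load else EX) + 1, EX
    else:
        # The consumer must read the register file in ID no earlier than
        # the cycle in which the producer writes it back in WB.
        produced, consumed = WB, ID
    return max(0, produced - consumed - distance)
```

Under these assumptions, a back-to-back ALU dependence costs 2 slots without forwarding and 0 with it, while a back-to-back load-use dependence drops from 2 slots to 1. That one remaining slot is the stall cycle the question asks about.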
Of course, stalling is sometimes necessary, as in the case you describe with a load-use hazard. In the small, stalling has the same effect as NOPs, however:
Code size is smaller without the NOPs. Code size has a large effect on instruction cache behavior, which affects performance, so it cannot be ignored.
Also, from the perspective of architecture longevity: while we may know the number of NOPs needed for a given micro-architecture, that number will most likely change in future micro-architectures, so the NOPs inserted in an older program no longer do their job properly on newer hardware. Thus, we conclude that it is better to let the hardware stall than to insert NOPs.
For example, an out-of-order machine may internally rearrange instructions to cover a MEM->EX hazard (NOPs would just get in the way).
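The code-size and longevity points above can be sketched with a toy in-order issue model. This is only an illustration under stated assumptions: a classic five-stage pipeline, a split-phase register file, and hypothetical helper names (`min_gap`, `run`, `pad_with_nops`). The four-instruction program is made up, and instructions are modelled as `(op, dest, sources)` tuples rather than real MIPS encodings.

```python
NOP = ("nop", None, ())

def min_gap(producer_is_load, forwarding):
    """Minimum issue distance from a producer to a dependent consumer."""
    if not forwarding:
        return 3                          # consumer ID must not precede producer WB
    return 2 if producer_is_load else 1   # MEM->EX or EX->EX bypass

def run(program, forwarding):
    """Return (total_cycles, stalls_before_each_instruction) with interlocks."""
    last_write = {}       # register -> (issue cycle, written by a load?)
    issue, stalls = -1, []
    for op, dest, srcs in program:
        earliest = issue + 1
        ready = earliest
        for reg in srcs:
            if reg in last_write:
                p_issue, p_load = last_write[reg]
                ready = max(ready, p_issue + min_gap(p_load, forwarding))
        stalls.append(ready - earliest)
        issue = ready
        if dest is not None:
            last_write[dest] = (issue, op == "lw")
    return issue + 5, stalls   # the last instruction still needs 5 stages

def pad_with_nops(program, stalls):
    """Software alternative: emit a NOP for every cycle hardware would stall."""
    padded = []
    for instr, n in zip(program, stalls):
        padded.extend([NOP] * n)
        padded.append(instr)
    return padded

program = [
    ("lw",  "t0", ("a0",)),
    ("add", "t1", ("t0", "t2")),
    ("sub", "t3", ("t1", "t4")),
    ("sw",  None, ("t3", "a0")),
]

# Forwarding hardware with interlocks: no NOPs in the binary.
interlocked_cycles, interlocked_stalls = run(program, forwarding=True)
# Same hardware, but hazards hidden by NOPs instead of stalls.
padded_new = pad_with_nops(program, interlocked_stalls)
# Binary padded for an older machine without forwarding, run on the new one.
old_binary = pad_with_nops(program, run(program, forwarding=False)[1])
old_on_new_cycles, _ = run(old_binary, forwarding=True)
```

In this model the interlocked version takes 9 cycles in 4 instruction words. Padding for the same forwarding hardware also takes 9 cycles but needs 5 words. The binary padded for a machine without forwarding (10 words) still takes 14 cycles on the forwarding machine, because its NOPs occupy issue slots the hardware no longer needs.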

- Another advantage of stalling is that operations past the stall in the pipeline are not delayed. If a downstream stall occurs during an upstream stall, the effects of the stalls may overlap, reducing the total delay imposed thereby. – supercat Feb 13 '20 at 18:27
- Data forwarding may require waiting for the result, which I think is what the OP meant by "stall cycles." One could avoid waiting for the actual value by using value prediction. I think by "NOPs" the OP was referring to instruction scheduling in software, which is one of the properties of the VLIW architecture. One advantage of software scheduling is simpler hardware. Or maybe the OP was literally talking about inserting actual NOPs in the instruction stream to deal with hazards? That would not be a good design though. – Hadi Brais Feb 13 '20 at 20:11
- @HadiBrais: It's tagged MIPS, it's pretty clearly asking about hiding hazards with software NOPs (like classic MIPS's load delay slot) instead of having hardware interlocks to detect hazards. And/or it's confusing stall cycles with actual software NOPs, like in this ambiguous question about a MIPS without bypass forwarding: [How can I rearrange MIPS code to minimise the number of NOPs needed, by hand?](//stackoverflow.com/q/60129312). And yes, of course real NOPs would be terrible, and no commercial CPU would do that; the point is to understand *why* real designs use forwarding and interlocks. – Peter Cordes Feb 14 '20 at 03:36
- i.e. exactly how bad the alternatives are. – Peter Cordes Feb 14 '20 at 03:37