Arguing whether a situation leads to data hazard or not

Question

I was going through the section of pipelining from the text Computer Organization [5e] by Hamacher et. al.. There I came across a situation which the authors claim causes data hazard.

The situation is shown below:

For example, stage E in the four-stage pipeline of Figure 8.2b is responsible for arithmetic and logic operations, and one clock cycle is assigned for this task. Although this may be sufficient for most operations, some operations, such as divide, may require more time to complete. Figure 8.3 shows an example in which the operation specified in instruction I2 requires three cycles to complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has no data to work with. †: Meanwhile, the information in buffer B2 must remain intact until the Execute stage has completed its operation. This means that stage 2 and, in turn, stage 1 are blocked from accepting new instructions because the information in B1 cannot be overwritten. Thus, steps D4 and F5 must be postponed as shown.

... Any condition that causes the pipeline to stall is called a hazard. We have just seen an example of a data hazard. A data hazard is any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline.

In the example above, the authors assume that a data hazard has occurred, and two stall cycles are introduced into the pipeline. The main reason that they give for this data hazard is that, since the execute phase requires 2 more cycles than the usual need for instruction 2, so the data on which the write back stage should work has to wait for 2 cycles...

But I am having a little difficulty in accepting this analysis. Usually, the books give examples of data hazards in situations, where there is data dependency (the usual RAW, WAR, etc..). But here there is no such thing. And I thought this to be a structural hazard assuming that I2 cannot use the EX stage as I1 is using it.

Moreover, the text assumes that there is no queuing of the results of the stages in the buffer. Clear from the statement marked with †, Meanwhile, the information in the buffer..., (where there is a little flaw as well, because, if no queuing is there, then the output of D3 in cycle 4 shall overwrite the value in buffer B2 on which the EX stage is working, a contradiction to their own assumption).

I thought that the stalls are introduced due to this no queuing condition... and structural hazard, and if things are properly managed as shown below, no stalls shall be there.

This is what I assume:

I assume that the execute stage has more than one separate functional units (e.g. one where calculations of instruction 1 are performed. [basic ALU requiring 1 cycle duration], one for integer division, another for integer multiplication etc.) [So structural hazard is out of the way now.]
I also assume that the pipeline buffers can store the results produced in the stages in a queue. [So that the problem in statement marked with † is no longer there.]

This being said, the situation is now as follows:

However hard I tried with the assumptions, I could not remove the bubbles shown in blue. [Even if queuing is assumed in buffers, the buffers cannot give the result out of order, so those stalls remain].

With this exercise of mine, I feel that the example shown in the text is indeed a hazard and that too data hazard (even though there was no data dependencies ?), as in my exercise there was no chance of structural hazard...

Am I correct?

Separate from your main question, In clock cycles 4 and 5, you have two instructions in the E stage at the same time. If something stalls in the E stage, the stall bubbles need to come *before* the E stage in later instructions, like in the Fig 8.3 image you linked from the book. (And yeah, it's weird that they talk about the pipeline register between stages needing to stay constant. If a multi-cycle non-pipelined execution unit needs to keep values around, it could snapshot them. Unless maybe the stall signal makes the Decode stage keep generating that output repeatedly until it can latch.) — Peter Cordes, Nov 12 '21 at 15:10
@PeterCordes, here is what I have assumed (though my knowledge is just based on theoretical concepts present in textbooks or so and I have no hi-fi practical knowledge), There is no stall in the E stage I guess. Each independent (assumption) E stage, produces the output and it is stored in the interstage buffer `B3` in a queue. Now the stalls are introduced in the W stages, in the subsequent instructions, why? because I assume the buffers cannot deliver results out of order. So for example, the stage W for I2 should wait till W for I1 has retrieved the value from the queue in B3... — Abhishek Ghosh, Nov 12 '21 at 15:13
*And I thought this to be a structural hazard assuming that I2 cannot use the EX stage as I1 is using it.* - Yup, that's the terminology I'd use, based on wikipedia: https://en.wikipedia.org/wiki/Hazard_(computer_architecture). Data hazards are only visible when you consider which operands are being used. e.g. an *independent* multiply (result not read by the next few insns) could be allowed to complete out of order, after executing in a separate multi-cycle or pipelined multiplier unit. (Write-back conflicts are a problem, one MIPS solved with the special `hi:lo` reg pair for mult/div) — Peter Cordes, Nov 12 '21 at 15:14

score 0 · Answer 1 · answered Nov 12 '21 at 15:33

And I thought this to be a structural hazard assuming that I2 cannot use the EX stage as I1 is using it.

Yup, that's the terminology I'd use, based on wikipedia: https://en.wikipedia.org/wiki/Hazard_(computer_architecture).

That article restricts data hazards to only RAW, WAR, and WAW. As such, they're only visible when you consider which operands are being used.

e.g. an independent multiply (result not read by the next few insns) could be allowed to complete out of order, after executing in a separate multi-cycle or pipelined multiplier unit.

Write-back conflicts would be a problem if the slow ALU instruction needed to write a GPR in the same cycle as a later add or something. Also data hazards like WAW, since mul r3, r2, r1 / sw r3, (r4) / add r3, r2, r1 should leave r3 = r2+r1 not r2*r1.

MIPS solved all that with the special hi:lo reg pair for mult/div, allowing the mul and div units to be loosely coupled to the 5-stage pipeline. And the ISA had pretty relaxed rules about what was allowed to happen to those registers, e.g. writing one with mthi r3 destroys the previous value of the other, so mflo r2 would give unpredictable results after mthi. Raymond Chen's article.

An "in-order pipeline" means instructions start execution in program order, no necessarily that they complete in program order. It's very common for modern in-order pipelines to allow memory operations to complete out of order, allowing memory-level parallelism and allowing instruction scheduling to hide load-use latency of L1d cache hits. It's also possible to pipeline higher-latency ALU operations as long as hazards are detected and handled somehow.

Do these authors use the term "structural hazard" at all, or do they consider all (non-control?) hazards to be data hazards?

At this point it seems like primarily a terminology issue. IDK if they're on their own in using terminology this way, or if there is another convention with any popularity other than the one Wikipedia describes.

Separate from your main question, In clock cycles 4 and 5, you have two instructions in the E stage at the same time. If something stalls in the E stage, the stall bubbles need to come before the E stage in later instructions, like in the Fig 8.3 image you linked from the book.

And yeah, it's weird that they talk about the pipeline register between stages needing to stay constant. If a multi-cycle non-pipelined execution unit needs to keep values around, it could snapshot them.

Unless maybe the stall signal makes the Decode stage keep generating that output repeatedly until the stall signal is de-asserted and the pipeline register will finally latch the output of the previous stage instead of ignoring it. There are latches / flip-flops that have a control signal separate from the clock that makes them ignore their input and keep outputting what they were already outputting.

Arguing whether a situation leads to data hazard or not

1 Answers1