
This is an example of a data hazard:

MOV AL , 25 
MOV BL , 10 
ADD CL , AL , BL
LOAD DL , CL

In this assembler code, we can see that the LOAD DL, CL instruction can only be executed after the ADD CL, AL, BL instruction has executed. Consequently, the two instructions have a dependency through the CL register. To deal with the problem of data hazards, we can also reduce the number of instructions per cycle, so that dependent instructions do not overlap in the pipeline.
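The dependency described above can be found mechanically. The following is a minimal Python sketch, not part of the original question: the read/write sets are filled in by hand for the pseudo-assembly, and the names `PROGRAM` and `raw_dependencies` are made up for illustration. It assumes LOAD DL, CL reads the CL register (either as a value or as an address).

```python
# Each entry: (text, registers written, registers read).
# The sets are hand-written assumptions about the pseudo-assembly above.
PROGRAM = [
    ("MOV AL, 25",     {"AL"}, set()),
    ("MOV BL, 10",     {"BL"}, set()),
    ("ADD CL, AL, BL", {"CL"}, {"AL", "BL"}),
    ("LOAD DL, CL",    {"DL"}, {"CL"}),
]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for every
    read-after-write dependency, pairing each read with the most
    recent earlier write of the same register."""
    last_writer = {}
    deps = []
    for i, (_text, writes, reads) in enumerate(program):
        # Check reads first: an instruction sees values written before it.
        for reg in sorted(reads):
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        for reg in writes:
            last_writer[reg] = i
    return deps


for producer, consumer, reg in raw_dependencies(PROGRAM):
    print(f"{PROGRAM[consumer][0]!r} needs {reg} from {PROGRAM[producer][0]!r}")
```

Whether a dependency actually becomes a stall depends on how close together the two instructions are in the pipeline and on whether the hardware forwards results, which this sketch does not model.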

This is an example of a control hazard (branch hazard):

MOV AL , 25  
MOV BL , 10 
CALL branchement
LOAD DL , BL
ADD DL , AL

In this assembler code, the two instructions LOAD DL, BL and ADD DL, AL are already in the pipeline before the CALL instruction is executed. In this case, the conditions for taking the branch are not yet satisfied, which causes a control hazard. To prevent this type of conflict, it is recommended to flush the pipeline before loading new instructions.
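To make the "already in the pipeline" point concrete, here is a small Python sketch under simple assumptions that are not from the question: one instruction is fetched per cycle, there is no branch prediction, and the branch is resolved in a given stage (stage 1 being fetch). The function name `wrongly_fetched` and the choice of stage 3 are hypothetical.

```python
PROGRAM = [
    "MOV AL, 25",
    "MOV BL, 10",
    "CALL branchement",
    "LOAD DL, BL",
    "ADD DL, AL",
]


def wrongly_fetched(program, branch_index, resolve_stage):
    """Instructions that entered the pipeline behind the branch before it
    was resolved. When the branch sits in stage `resolve_stage`, the
    earlier stages 1 .. resolve_stage-1 hold the instructions that follow
    it in memory, and those must be flushed if the branch is taken."""
    in_flight = resolve_stage - 1
    start = branch_index + 1
    return program[start:start + in_flight]


# With the branch resolved in stage 3, two younger instructions are flushed.
print(wrongly_fetched(PROGRAM, branch_index=2, resolve_stage=3))
```

The later the branch is resolved, the more instructions are discarded, which is why real pipelines try to resolve or predict branches as early as possible.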

I would like to see an example of both hazards (control hazard + data hazard) in the same code

EDIT:

Mixing the two gives me this:

MOV AL , 25 ; 
MOV BL , 10 ; 
CALL branchement
ADD DL , AL
LOAD DL , BL
Chadi N
  • You've tagged this question as [tag:x86] but the assembly code you show is not x86 (which does not have a three operand add instruction nor does it have a `LOAD` instruction). Which architecture (or abstract architecture) are you asking about? – 1201ProgramAlarm Feb 29 '20 at 18:22
  • @1201ProgramAlarm we're using "processor 8086". It's my first (and only) class of assembly in school. I don't know much about all the specifications... the teacher only said everything is based on the Intel 8086 processor... – Chadi N Feb 29 '20 at 18:43
  • Actual 8086 isn't pipelined at all (except for prefetch); it's microcoded. x86 instructions can be hard to pipeline (especially memory-destination); it wasn't until 486 / Pentium that it was really done, and then complex instructions would stall the in-order pipeline (basically a hazard within one instruction, like `add [edx], eax` or `pop eax`). It wasn't until Pentium Pro (P6 microarchitecture) that even instructions like that could be handled efficiently (by decoding to 1 or more uops and handling those via out-of-order exec). See Agner Fog's microarch guide https://agner.org/optimize/ – Peter Cordes Feb 29 '20 at 21:16
  • @PeterCordes I removed the x86 tag since it's confusing. I just need help in the format above. How do I combine these 2 code snippets to give an example of a data hazard and a control hazard at the same time? And thank you for your explanations. – Chadi N Feb 29 '20 at 21:27

1 Answer

Actual 8086 isn't pipelined at all (except for prefetch); it's microcoded. It finishes write-back of one instruction before starting decode of the next; the only hazard effect is discarding the prefetch buffer after branches.

x86 instructions can be hard to pipeline (especially memory-destination); it wasn't until 486 / Pentium that it was really done, and then complex instructions would stall the in-order pipeline (basically a hazard within one instruction, like add [edx], eax or pop eax). It wasn't until Pentium Pro (P6 microarchitecture) that even instructions like that could be handled efficiently (by decoding to 1 or more uops and handling those via out-of-order exec). See Agner Fog's microarch guide https://agner.org/optimize/

(Real P6-family and other out-of-order exec x86 microarchitectures hide WAW and WAR hazards by register renaming. See Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators) and Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs.)

The code you've shown isn't strictly x86; there's no LOAD mnemonic; x86's pure load instruction is called mov. Also, LOAD DL, BL makes no sense; neither operand can address memory; they're only 8-bit registers. If you meant copy between registers, that's also mov dl, bl.


I would like to see an example of both hazards (control hazard + data hazard) in the same code

A simple example would be an indirect branch (control hazard) whose target was recently written (true RAW data dependency).

e.g. if we assume 16-bit mode (since you mentioned 8086):

   push  offset target     ; modifies SP (the stack pointer), then stores to memory at SS:SP
   ret                     ; ordinary near return = pop ip

target:
   push  123

ret has 2 inputs:

  • the SP register (just written by push: RAW hazard)
  • the memory pointed to by SP (also just written by push, also a RAW memory hazard).

RET writes SP (WAR hazard, although RET itself was the last reader). Also WAW if we consider that push and ret both write SP.

RET does an indirect jump (basically pop ip) using the address loaded from memory (control hazard for the pipeline, if any). All current CPUs will mispredict that ret because they have a special call/ret predictor stack that assumes ret will jump to the return-address of a matching call, like normal code uses. (http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/)

The push 123 at the ret target address

  • reads and writes SP (RAW and WAR hazards)
  • writes to memory that the previous push wrote (WAW memory hazard), and which RET just read (WAR memory hazard).

I put a push after the ret in case you want to look at just the ret/push pair, with the push in the "shadow" of a possibly mispredicted branch.
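The hazard labels above can be derived from read/write sets. This is a rough Python sketch, assuming each instruction is summarized by hand as a pair of sets: `SP` is the stack pointer, `IP` the instruction pointer, and `MEM` stands for the single stack slot that all three instructions touch. The names `PUSH_TARGET`, `RET`, `PUSH_123` and `classify` are invented for illustration.

```python
# (reads, writes) for each instruction, written out by hand.
PUSH_TARGET = ({"SP"}, {"SP", "MEM"})        # push offset target
RET         = ({"SP", "MEM"}, {"SP", "IP"})  # ret  (basically pop ip)
PUSH_123    = ({"SP"}, {"SP", "MEM"})        # push 123


def classify(earlier, later):
    """Classify the hazards between an earlier and a later instruction."""
    e_reads, e_writes = earlier
    l_reads, l_writes = later
    return {
        "RAW": sorted(e_writes & l_reads),   # later reads what earlier wrote
        "WAR": sorted(e_reads & l_writes),   # later overwrites what earlier read
        "WAW": sorted(e_writes & l_writes),  # both write the same location
    }


print("push -> ret :", classify(PUSH_TARGET, RET))
print("ret  -> push:", classify(RET, PUSH_123))
print("push -> push:", classify(PUSH_TARGET, PUSH_123))
```

The sketch only compares pairs of instructions; it does not model which of these hazards a given pipeline would actually stall on.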


Of course a store buffer with store forwarding hides / handles the memory data hazards, effectively renaming the memory / cache location. (x86's memory ordering model is basically program-order + a store buffer with store-forwarding: cores are allowed to reload their own stores before they become globally visible.)
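As a toy illustration of store forwarding (a simplified model, not how any particular CPU implements it), the Python sketch below lets a core reload its own pending store before that store is globally visible. The class `Core`, its methods, and the default value 0 for untouched memory are all assumptions made for this example.

```python
class Core:
    """Toy core with a private store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: address -> value
        self.store_buffer = []    # pending (address, value), oldest first

    def store(self, addr, value):
        # Stores are buffered; other cores cannot see them yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store forwarding: the youngest matching pending store wins.
        for pending_addr, value in reversed(self.store_buffer):
            if pending_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self):
        # Commit pending stores to shared memory in program order.
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()


shared = {}
core0, core1 = Core(shared), Core(shared)
core0.store(0xFFFE, 0x1234)   # like the push writing the stack slot
print(hex(core0.load(0xFFFE)), hex(core1.load(0xFFFE)))  # own store vs. shared memory
core0.drain()
print(hex(core1.load(0xFFFE)))
```

In this model the ret's load of the return address would be satisfied from the buffer entry left by the preceding push, without waiting for the store to reach memory.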

Modern x86 CPUs handle the RAW data dependency chain through the stack pointer with a "stack engine" that can keep track of multiple offsets to the stack pointer per clock cycle. (Equally importantly, it removes the need for an extra uop to actually do the addition to E/RSP in the back-end, so push / pop can be a single uop.) So it's effectively an alternate zero-latency mechanism for executing the stack-pointer modification parts of stack instructions. Using E/RSP directly (e.g. mov bp, sp) leads to a stack-sync uop (on Intel CPUs) that applies the saved offset to the back-end's value and zeroes it, if the offset was non-zero.
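The bookkeeping idea can be sketched in a few lines of Python. This is purely illustrative: the class name `StackEngine`, its fields, and the 2-byte operand size (16-bit mode) are assumptions for the sketch, not a description of real hardware.

```python
class StackEngine:
    """Toy model: the front-end tracks an offset to the back-end's SP."""

    def __init__(self, sp):
        self.backend_sp = sp   # value the out-of-order back-end holds
        self.delta = 0         # pending offset tracked in the front-end
        self.sync_uops = 0     # how many stack-sync uops were needed

    def push(self, size=2):
        # Adjust the tracked offset; no back-end add is needed.
        self.delta -= size
        return self.backend_sp + self.delta   # address the store uses

    def pop(self, size=2):
        addr = self.backend_sp + self.delta   # address the load uses
        self.delta += size
        return addr

    def read_sp_explicitly(self):
        # e.g. mov bp, sp: the back-end needs the real value, so apply
        # the pending offset with a sync uop if it is non-zero.
        if self.delta != 0:
            self.backend_sp += self.delta
            self.delta = 0
            self.sync_uops += 1
        return self.backend_sp


engine = StackEngine(sp=0x100)
engine.push()                  # push offset target
engine.pop()                   # ret pops the return address
engine.read_sp_explicitly()    # offset is back to zero: no sync uop
engine.push()                  # push 123
engine.read_sp_explicitly()    # non-zero offset: one sync uop
print(engine.sync_uops)
```

A balanced push/pop pair leaves the tracked offset at zero, so in this model only the explicit read after an unmatched push costs a sync uop.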

Peter Cordes
  • Peter, thank you for your long and detailed answer. The code in my question, what "kind" is it? Because the code in your answer doesn't look like mine at all... I really need the "MOV", "ADD", etc. – Chadi N Feb 29 '20 at 22:00
  • @SandraGevio I was answering the bolded *I would like to see an example of both hazards (control hazard + data hazard) in the same code* final line of your question. I thought you wanted an alternative to the part you did show, which seemed pretty straightforward. I didn't realize you were *also* asking something else. – Peter Cordes Feb 29 '20 at 23:25
  • Sorry Peter if I wasn't clear, I wanted to see a code example of both hazards! – Chadi N Feb 29 '20 at 23:28
  • @SandraGevio: My example *does* contain both control and data hazards, like you asked for. `ret` is a control-transfer instruction, basically an indirect branch. – Peter Cordes Feb 29 '20 at 23:30
  • I updated my question and provided a potential answer, what do you think about it? I think your answer is too advanced and my teacher would probably ask himself where i got this from ha!.. – Chadi N Mar 01 '20 at 01:21
  • @SandraGevio: I wasn't trying to write an answer to your homework, I was trying to answer the question asked because it was mildly interesting. Picking CISC instructions like push and ret also nicely illustrates why x86 is hard to pipeline, and the amount of complexity that was required to build CPUs that can execute x86 machine code efficiently. Removing the x86 tag from the question partly invalidated the answer. If you need help with the basics of how hazards work, ask your teacher. (https://en.wikipedia.org/wiki/Hazard_(computer_architecture) is pretty good, though.) – Peter Cordes Mar 01 '20 at 01:32