I am working on a problem on the topic of the processor, from the book "Computer Organization and Design (6th Edition)". The problem is as follows:
Clearly, this problem is about a predict-taken branch predictor, and it asks for a pipeline execution diagram covering each "taken" and "not taken" case noted in the code comments.
Here are the answers to questions 4.14.1 and 4.14.2:
My problem is with these answers. From what I have read about branch prediction in MIPS, I expected the diagrams to look different. In more detail:
In problem 4.14.1, the first "not taken" conditional branch makes sense to me: since we assume there is no delay slot, the next instruction is fetched only after the branch's `EX` stage is done. However, the next conditional branch (`beq r3, r0, label1`) is immediately followed by another conditional branch (`beq r2, r0, label2`), which I think conflicts with the requirements of the problem. Moreover, the `sw r1, 0(r2)` instruction also immediately follows a conditional branch. Why are there so many conflicts here? More specifically, how could the branch `beq r2, r0, label2` determine immediately that the following instruction is `beq r3, r0, label1`, even though it has not yet reached its `EX` stage? This conflicts with the first case, where `lw r3, 0(r2)` has to wait until the `EX` stage of the previous instruction is finished.
Problem 4.14.2 is a bit different from the previous one. Here delay slots are used, which produces the code shown in the answer. The first conditional branch (`beq r2, r0, label2`) is fine: it is "luckily" not taken, so `lw r3, 0(r2)` is not discarded and continues executing. However, I cannot see why the next conditional branch (`beq r3, r0, label1`) MUST be fetched in clock cycle 6 instead of cycle 5. Is there anything wrong here?
My understanding is that without delay slots, the decision on whether the branch is taken is not made until the `EX` stage is done. With delay slots, it seems this can instead happen in parallel with the `EX` stage of the branch instruction. Is this a correct observation? If not, can you explain further the difference between using and not using delay slots?
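
To make the no-delay-slot timing I describe above concrete, here is a minimal Python sketch of how I am counting cycles. It is only my mental model, not the book's solution: the helper names (`stage_cycles`, `next_fetch_after_branch_ex`) and the `stall_before_ex` knob are my own, and I am assuming an ideal 5-stage pipeline in which the fetch after a branch simply waits until the branch's `EX` stage has finished, ignoring any branch prediction.

```python
# Hypothetical helpers for counting pipeline cycles; not taken from the book.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycles(fetch_cycle, stall_before_ex=0):
    """Return a dict mapping each stage to the clock cycle it occupies.

    stall_before_ex is the number of bubble cycles inserted between ID
    and EX (for example, while waiting on a preceding load).
    """
    cycles = {}
    cycle = fetch_cycle
    for stage in STAGES:
        if stage == "EX":
            cycle += stall_before_ex  # bubbles delay EX and every later stage
        cycles[stage] = cycle
        cycle += 1
    return cycles


def next_fetch_after_branch_ex(branch_fetch_cycle, stall_before_ex=0):
    """Fetch cycle of the next instruction if fetch waits for the branch's EX."""
    return stage_cycles(branch_fetch_cycle, stall_before_ex)["EX"] + 1


if __name__ == "__main__":
    # A branch fetched in cycle 2 with no stalls finishes EX in cycle 4.
    print(stage_cycles(2))
    print(next_fetch_after_branch_ex(2))      # 5
    # With one bubble before EX, everything after ID shifts by one cycle.
    print(next_fetch_after_branch_ex(2, 1))   # 6
```

Under this model every branch costs the same number of cycles before the next fetch, which is why the back-to-back instructions in the answer diagrams surprise me.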
I also watched some videos, but they did not give me a proper explanation, so I hope you can help me understand this problem. I look forward to your answers. If any information about the problem is missing, please let me know.