branch delay: What if the branch changes the value that is used by the instruction in the branch delay slot

Question

I have just started to learn about the concept of the branch delay slot on MIPS.

# $1 contains 1
jal flag
sw $1, 0($2)
...
flag: addi $1, $0, 5

As I understand it, the sw instruction executes before the PC jumps to the flag label, so it stores the value 1 from $1 to the address in $2. But in the order I expected, sw would execute after the code at flag, and the value stored to memory would be 5.

Do I misunderstand how branch delay works? If not, does that mean we have to take the branch delay into account when we write code?

UPDATE: I wrote the assembly code in a "common programming style", so the expected order is:

1. jump to the method flag()
2. execute flag()
3. store the value in $1 to the memory

It's worse than that: your execution may be interrupted by some trap or external event. The interrupt handler can't return to the delay slot because then it wouldn't see the branch at all, so it would fall through on exception return even if the branch condition was true. Branch delay slots are a bad idea conceptually, and small numbers of branch delay slots are also *useless* on microarchitectures with long pipelines. The solution is to a) design & use processor architectures without *that* specific braindamage and b) not to put *anything* into branch delay slots that could affect the branch. — EOF, Apr 15 '21 at 16:21
What do you mean by "expected order"? Either the processor in question has the delay slot or it doesn't -- different expectations would follow from those. The original MIPS processors have the delay slot, but newer ones don't. Some simulators have an option to say whether to include the delay slot or not. RISC-V doesn't have a delay slot and never has. — Erik Eidt, Apr 15 '21 at 17:00
@ErikEidt I have updated the question with an explanation of the "expected order". Our prof was talking about the old versions of MIPS which have the branch delay slot. — chaos, Apr 15 '21 at 17:16
When the processor has the delay slot, it is expected for it to execute the instruction in the delay slot before transferring control to the branch target. That is what is expected. And yes, you have to be aware of it when writing code for that processor. Code written for the same processor but without the delay slot will fail on the one with the delay slot and vice versa. — Erik Eidt, Apr 15 '21 at 17:19
@ErikEidt yeah, I should say the "expected order" is actually what I expected on my side. Many thanks for your explanation! — chaos, Apr 15 '21 at 17:21

Peter Cordes · Accepted Answer · 2021-04-15T22:02:15.883

Yes, as you suspect: a branch-delay slot makes branch latency architecturally visible, which offloads the responsibility of hiding that latency to the compiler. See "Why does MIPS use one delay slot instead of two?" (short answer: because first-gen MIPS managed to keep branch latency down to 1 cycle).

That's the entire point, and it's also why a later CPU with branch prediction can't just get rid of the delay slot: existing binaries depend on it. MIPS was saddled with it until MIPS32r6 broke backwards binary compatibility and reorganized the opcodes, introducing branches without delay slots.

As EOF mentioned in comments, a delay slot significantly complicates exception handling, because it's legal to put instructions that might fault into delay slots. To return from an exception, the CPU needs to know both which instruction to run and the address to continue at after that, which might or might not be the next sequential instruction.

See also: "Why is the branch delay slot deprecated or obsolete?"

The instruction in the delay slot does execute before the code at the branch target address, including its effects on memory or registers. If the delay-slot instruction reads $ra after jal writes it, then yes, the delay-slot instruction sees changes made by the branch itself.

But I think you're asking about the called function, which is a separate question. No: the return address will be the instruction after the delay slot, because the delay-slot instruction already ran right after the jump or branch itself, while the CPU was fetching the code from the target address. That's the whole point. So in your example, sw stores 1, and the addi at flag only runs afterwards.
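To make the ordering concrete, here is a toy Python model of the question's snippet. It is only a sketch, not a real MIPS simulator: the addresses (code at 0, flag at 0x100), the store address in $2, the tuple encoding, and the `run` helper are all made up for illustration, and since the question doesn't show how flag returns, the model assumes it ends with jr $ra followed by a nop in jr's own delay slot.

```python
# Toy model of one branch delay slot; not a real MIPS simulator.
# The addresses, the jr/nop at the end of flag, and the tuple encoding
# are assumptions made for illustration.

def run(program, regs, mem, max_steps=50):
    """Run from address 0 and return the list of executed addresses."""
    trace = []
    pc, next_pc = 0, 4
    for _ in range(max_steps):
        if pc not in program:
            break
        op, *args = program[pc]
        trace.append(pc)
        target = None
        if op == "jal":
            regs[31] = pc + 8          # return address skips the delay slot
            target = args[0]
        elif op == "jr":
            target = regs[args[0]]
        elif op == "sw":
            rt, offset, base = args
            mem[regs[base] + offset] = regs[rt]
        elif op == "addi":
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        # "nop" falls through with no effect.
        # The already-fetched next instruction (the delay slot) always runs;
        # a taken jump only redirects the fetch after it.
        pc, next_pc = next_pc, (target if target is not None else next_pc + 4)
    return trace

program = {
    0x000: ("jal", 0x100),      # jal flag
    0x004: ("sw", 1, 0, 2),     # sw $1, 0($2)   <- delay slot of jal
    0x100: ("addi", 1, 0, 5),   # flag: addi $1, $0, 5
    0x104: ("jr", 31),          # jr $ra         (assumed)
    0x108: ("nop",),            # delay slot of jr (assumed)
}
regs = [0] * 32
regs[1] = 1          # $1 contains 1
regs[2] = 0x1000     # arbitrary store address
mem = {}
trace = run(program, regs, mem)

print([hex(a) for a in trace])              # ['0x0', '0x4', '0x100', '0x104', '0x108']
print(mem[0x1000], regs[1], hex(regs[31]))  # 1 5 0x8
```

In this model the sw at 0x4 runs before anything at flag, so memory gets 1, and $ra holds 0x8, the address after the delay slot, so sw is not executed a second time on return.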

So when you're programming for a machine with a delay slot, you need to understand the order of instruction execution, and try to fill the delay slot with something more useful than a NOP. (Although if you only ever use NOP, then your code will run the same on a machine with or without a delay slot. e.g. in MARS if you click the checkbox to toggle between simulating a MIPS with a delay slot or a fake simplified MIPS without.)
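The point about NOPs can be checked with a small Python sketch. This is a toy model under stated assumptions, not MARS or real hardware: the `delay_slot` flag stands in for a simulator's delayed-branching setting, the no-delay-slot mode is assumed to link to the instruction right after jal, and the addresses, the jr $ra at the end of flag, and the `run` helper are hypothetical.

```python
# Toy model; the delay_slot flag imitates a simulator's delayed-branching
# toggle. Addresses, encoding, and the jr/nop ending flag are assumptions.

def run(program, delay_slot, max_steps=50):
    """Run from address 0 and return the word stored at 0x1000, if any."""
    regs = [0] * 32
    regs[1], regs[2] = 1, 0x1000          # $1 = 1, $2 = arbitrary address
    mem = {}
    pc, next_pc = 0, 4
    for _ in range(max_steps):
        if pc not in program:
            break
        op, *args = program[pc]
        target = None
        if op == "jal":
            # With a delay slot the link address skips the slot.
            regs[31] = pc + (8 if delay_slot else 4)
            target = args[0]
        elif op == "jr":
            target = regs[args[0]]
        elif op == "sw":
            rt, offset, base = args
            mem[regs[base] + offset] = regs[rt]
        elif op == "addi":
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        if delay_slot:
            # The next sequential instruction always runs before the jump lands.
            pc, next_pc = next_pc, (target if target is not None else next_pc + 4)
        else:
            pc = target if target is not None else pc + 4
    return mem.get(0x1000)

FLAG = {
    0x100: ("addi", 1, 0, 5),   # flag: addi $1, $0, 5
    0x104: ("jr", 31),          # jr $ra (assumed)
    0x108: ("nop",),
}
# sw placed directly after jal, as in the question.
filled = {0x000: ("jal", 0x100), 0x004: ("sw", 1, 0, 2), **FLAG}
# nop placed after jal, sw after that.
padded = {0x000: ("jal", 0x100), 0x004: ("nop",), 0x008: ("sw", 1, 0, 2), **FLAG}

for name, prog in (("filled", filled), ("padded", padded)):
    print(name, run(prog, delay_slot=True), run(prog, delay_slot=False))
# filled 1 5
# padded 5 5
```

Only the version with a real instruction after jal changes behaviour when the setting is toggled; the NOP-padded version stores 5 either way.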

Classic MIPS assemblers apparently would try to fill delay slots for you unless you used .noreorder, as described in See MIPS Run. That is, they would let your asm source look the way your question says you "expect" (or at least want), written as if there were no branch delay slots, and would look for independent instructions that can be moved into the slot without affecting correctness.

(Compiler-generated code would use .noreorder and have the compiler "expect" what's actually going to happen. Humans could use that, too, if they know to expect what the machine will actually do.)
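The kind of reordering a delay-slot-filling assembler does can be sketched roughly as follows. This is a deliberately simplified Python illustration, not how any particular assembler works: the `Insn` record, `independent`, and `fill_delay_slots` are hypothetical names, only register dependencies are checked, and labels, memory dependencies, and faulting instructions are ignored.

```python
from typing import NamedTuple, Optional, Tuple

class Insn(NamedTuple):
    text: str                   # source text, for display only
    dest: Optional[int]         # register written, if any
    srcs: Tuple[int, ...]       # registers read
    is_branch: bool = False

NOP = Insn("nop", None, ())

def independent(prev: Insn, branch: Insn) -> bool:
    """True if prev can move after branch without changing what either sees."""
    if prev.is_branch:
        return False
    if prev.dest is not None and prev.dest in branch.srcs:
        return False            # the branch needs prev's result first
    if branch.dest is not None and (
        branch.dest == prev.dest or branch.dest in prev.srcs
    ):
        return False            # e.g. jal writes $31, which prev touches
    return True

def fill_delay_slots(code):
    """Turn 'no delay slot' source order into 'one delay slot' machine order."""
    out = []
    last_is_slot = False        # never steal an earlier branch's slot
    for insn in code:
        if insn.is_branch:
            if out and not last_is_slot and independent(out[-1], insn):
                slot = out.pop()    # move the preceding instruction into the slot
            else:
                slot = NOP          # nothing safe to move: pad with a nop
            out += [insn, slot]
            last_is_slot = True
        else:
            out.append(insn)
            last_is_slot = False
    return out

source = [
    Insn("addi $4, $0, 7", 4, (0,)),
    Insn("jal f", 31, (), is_branch=True),
    Insn("addi $5, $0, 1", 5, (0,)),
    Insn("beq $5, $0, L", None, (5, 0), is_branch=True),
]
result = [i.text for i in fill_delay_slots(source)]
print(result)
# ['jal f', 'addi $4, $0, 7', 'addi $5, $0, 1', 'beq $5, $0, L', 'nop']
```

The first addi is independent of jal, so it moves into the slot; the second addi produces the register beq tests, so it has to stay put and the slot gets a nop.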
