
I was writing some MIPS code for college to see how functions within functions work, and everything worked fine at first. I'm using the WinMIPS64 simulator.

Then, after I turned BTB on, everything was breaking (it got stuck in an infinite loop in the second function).

I was going crazy until I realized it was because of the BTB (there was a b in one of the functions and I wanted to reduce some of the Branch Taken Stalls that appeared as a result). When I switched it off, everything worked fine again.

I include some of the code below.

.data
tabla:  .byte 1,4,5
res:    .space 3 
cont:   .word 3
num:    .word 0

.text

       daddi $a0, $0, tabla  # offset element table
       daddi $a1, $0, res    # offset results table
       lb $a2, cont($0)      # $a2 = 3 (array size)
       daddi $sp, $0, 0x400  # $sp = 0x400
       jal dobles            # $ra = 0x14
       sd $v1, num($0)       # store element count in num
       halt

dobles:                      #first function 

       daddi $sp, $sp, -8    # make space in stack $sp = 0x3f8
       sd $ra, 0($sp)        # 0x3f8 = $ra (0x14) 

loop:
       lb $s0, 0($a0)        # saving element from table in $s0
       daddi $a0, $a0, 1     # add 1 byte displacement to $a0  

       daddi $sp, $sp, -8    # $sp = 0x3f0
       sd $s0, 0($sp)        # 0x3f0 = tabla element 

       jal multi             # $ra = 0x38 
       sb $v0, 0($a1)        # saving result to res
       daddi $a1, $a1, 1     # displacement + 1 byte 

       daddi $a2, $a2, -1    # counter -1 
       bnez $a2, loop        # loop till counter is 0 

       ld $ra, 0($sp)        # load $ra from stack
       daddi $sp, $sp, 8     
       jr $ra 

multi:                       # second function
       ld $t0, 0($sp)        # load element from stack
       daddi $sp, $sp, 8     
       daddi $v1, $v1, 1     # count number of elements
       dadd $v0, $t0, $t0    # element * 2
       jr $ra 

Why does this happen? Does the call to a function have some sort of effect on the buffer (I thought it was just for branches)? Is it possible to have calls to functions within functions and not have problems if I have the BTB on? What do I need to change if I want to use BTB and function calls within functions?

This was not covered in our program, so I am asking here.

Peter Cordes
BasilBrush
  • Are you sure you mean the BTB and not branch delay slots? How do you enable/disable that? The code you showed doesn't even have a `b` instruction, how does it break? Post a [mcve]. – Jester Nov 11 '19 at 00:53
  • Hi, yes, definitely sure. A delay slot just allows me to place an instruction after the branch. I am using WinMIPS64, it is an option to switch on BTB. See line 32 for the b: `bnez $a2, loop` – BasilBrush Nov 11 '19 at 00:57
  • BTB is a branch-prediction structure. It has zero effect on correctness, only performance. It's not architecturally visible. Very likely Jester is right and you actually enabled the branch *delay* slot: the instruction after a `b` or `j` instruction runs whether or not the branch is taken, hiding branch latency on early MIPS (short in-order pipeline, not superscalar). https://en.wikipedia.org/wiki/Delay_slot#Branch_delay_slots. As always, **single-step your code in the debugger to see how it executes.** – Peter Cordes Nov 11 '19 at 00:59
  • Also, are you using MARS? It has an option to enable/disable simulating a MIPS with branch delay slots, defaulting to disabled. – Peter Cordes Nov 11 '19 at 01:04
  • So you mean to say that the `j` also functions like a `b` for MIPS architecture? When I single-step my code, with BTB switched on (I promise, this is possible for WinMIPS64), the $ra increases infinitely, and the second function gets stuck in an infinite loop (this is why I commented the code with return address values). The only thing that worked was to switch off the BTB, in which case everything works as expected. – BasilBrush Nov 11 '19 at 01:05
  • Just checked, indeed BTB and delay slot seem to be separate options in winmips64. – Jester Nov 11 '19 at 01:06
  • The BTB seems to disable the delay slot (the option gets grayed out and the code executes accordingly). Single stepping showed that the second execution of `jal multi` sets `$ra` to `multi`. Can't imagine any reason for that except a bug in winmips64. – Jester Nov 11 '19 at 01:23
  • @Jester I don't see any way for `$ra = multi` to happen in this code, with/without a branch delay slot. A clearer [mcve] (showing register values and where exactly the infinite loop was) would have let us rule out other possibilities sooner; emulator bugs are usually rare (because most people use MARS or SPIM) and can only be considered after ruling out normal bugs. – Peter Cordes Nov 11 '19 at 01:29
  • I am sorry, I am not sure how to reproduce this error with a smaller amount of code. – BasilBrush Nov 11 '19 at 01:36

1 Answer


BTB is a branch-prediction structure. It has zero effect on correctness, only performance. It's not architecturally visible.
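
To illustrate why a predictor can't change results, here is a minimal Python sketch of a toy in-order model. It is not how WinMIPS64 is implemented: the `run` function, the tuple instruction encoding, the iteration count of 5, and the one-cycle misprediction penalty are all assumptions made up for this illustration. The point is that the BTB only supplies a guess for the next fetch address; the guess is checked against the real branch outcome, so it can change the cycle count but not the register values.

```python
# Toy model, NOT WinMIPS64's implementation: the BTB only guesses the next PC.
# The guess is always checked against the real branch outcome, so it can
# change the cycle count but never the architectural (register) state.

def run(program, count, use_btb, mispredict_penalty=1):
    regs = {"a2": count, "v1": 0}   # loop counter and element count
    btb = {}                        # branch pc -> last taken target
    pc, cycles = 0, 0
    while pc < len(program):
        op, *args = program[pc]
        cycles += 1
        if op == "addi":
            dst, src, imm = args
            regs[dst] = regs[src] + imm
            pc += 1
        elif op == "bnez":
            reg, target = args
            actual = target if regs[reg] != 0 else pc + 1
            # Without a BTB, fetch simply continues with the next instruction.
            predicted = btb.get(pc, pc + 1) if use_btb else pc + 1
            if predicted != actual:
                cycles += mispredict_penalty   # wrong-path fetch is squashed
            if use_btb:
                if actual == target:
                    btb[pc] = target           # remember the taken target
                else:
                    btb.pop(pc, None)          # forget a not-taken branch
            pc = actual                        # always the real outcome
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, cycles


# Hypothetical three-instruction loop, loosely modelled on the bnez loop
# in the question (index 0 plays the role of the `loop` label).
LOOP = [
    ("addi", "v1", "v1", 1),    # count an element
    ("addi", "a2", "a2", -1),   # counter - 1
    ("bnez", "a2", 0),          # loop until counter is 0
]

regs_off, cycles_off = run(LOOP, count=5, use_btb=False)
regs_on, cycles_on = run(LOOP, count=5, use_btb=True)
print(regs_off == regs_on, cycles_off, cycles_on)
```

In this model the two runs end with identical registers and differ only in cycles. A simulator where turning the BTB on changes `$ra` is doing something a real predictor must not do.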

My guess was the same as Jester's: you actually (also?) enabled the architectural branch delay slot: the instruction after a bXX or j/jXX instruction runs whether or not the branch is taken, hiding branch latency on early MIPS (short in-order pipeline, not superscalar).

But actually I don't see anything in your code that would break with or without a branch-delay slot. Jester tested and found that jal multi sets $ra to multi on the second execution; that's an emulator bug. No correct execution of your code can set $ra that way, with or without branch-delay slots.


According to the WinMIPS64 page:

"A delay slot can be implemented if desired. With V1.30 a simple branch-target-buffer can also be simulated. A << in the code window beside a jump or branch instruction indicates that it is predicted as being taken."

Perhaps the GUI ties together the BTB and delay-slot options?

As always, single-step your code in the debugger to see how it executes.

If you're sure that WinMIPS64 is simulating with BTB but without branch-delay slots, then possibly you've found a bug in WinMIPS64 itself. Since the BTB (and branch prediction in general) is not architecturally visible (see footnote 1), your code must run the same with or without it.

(Unless you did something that the MIPS ISA allows to cause "unpredictable behaviour", like putting two branches back to back, or modifying the inputs of a mult instruction within a couple instructions after executing it. Or for classic MIPS I, using the result of a load too early, in the load delay slot.)

Footnote 1: outside of Spectre: using a side channel to make microarchitectural state architecturally visible.

Peter Cordes
  • Hi, in winMIPS64, you cannot activate both of these options. One automatically disables the other. It is not a good idea to run delay slot in this code. However, I agree that it seems to be having a similar effect when single-stepping the code. – BasilBrush Nov 11 '19 at 01:20
  • if it is as you say and it is a bug in winMIPS64, should I do something about it? I mean, inform them or something? – BasilBrush Nov 11 '19 at 01:30
  • @BasilBrush: Yeah, if there's an email address or some online way to file a bug report, hopefully the authors would appreciate it if they're still active. If not, then hopefully this SO Q&A will at least help future readers who search on the same problem. – Peter Cordes Nov 11 '19 at 01:33
  • Thanks. I have written them an email linking this post. – BasilBrush Nov 11 '19 at 04:41
  • I tested several scenarios, and it does indeed seem to be a bug with BTB. It does seem rather peculiar as I have used BTB before (without functions which call other functions, to be fair), without a glitch. I will comment what I tried below. – BasilBrush Nov 11 '19 at 13:13
  • I re-ordered the instructions, as I would if I was using delay slot, placing a non-dependent instruction below the `bnez`. With delay slot, such an instruction would run, rather than be discarded. However, the address of `multi` was loaded into `$ra` regardless. It *should* only ever load $ra with the address back to `dobles` or the main. The address to `multi` is not in the stack/ in the code where it could be loaded accidentally. I tried running the program with delay slot, with and without, re-ordered instructions and it worked OK. So it is a bug with BTB in this particular scenario. – BasilBrush Nov 11 '19 at 13:14
  • @BasilBrush: yup, clearly a WinMIPS64 bug. As I said in my answer, there's no way that `$ra = multi` could ever happen in your code. – Peter Cordes Nov 11 '19 at 13:19