MIPS: Division algorithm (Dividing significands of IEEE-754 format) gives incorrect answer for the last 4-5 bits (LSB)

Question

For the case where divisor>dividend, the calculated mantissa gives wrong result for the last 4-5 least significant bits.

I'm trying to divide the significands/mantissa of two IEEE-754 floating point numbers. I've used this division algorithm

When divisor>dividend, the mantissa is normalised but still incorrect for the last 4-5 bits.

DivisionAlgorithm: #Divides Dividend Mantissa (in $a0) by Divisor Mantissa (in $a1) for 25 Iterations# #Returns Quotient Value in $v0#

    addi $sp, $sp, -12      #Decrement in $sp (Stack Pointer)
    sw $s0, 0($sp)      #Pushing $s0 into Stack
    sw $ra, 4($sp)      #Pushing $ra into Stack (Function Call Exists)
    sw $s1,8($sp)       #Pushing $s1 into stack 

    move $t0, $a0       #$t0 = Mantissa of Dividend/Remainder
    move $t1, $a1       #$t1 = Mantissa of Divisor

    add $s0, $0, $0     #$s0 = Initialization
    add $v1, $0, $0     #$v1 = 0 (Displacement of Decimal Point Initialized)
    addi $s1,$0,1       #$s1 = 1 (initialize loop variable to 1)
    add $t3,$0,33

loop:   
    beq $t1, $0, check      #If Divisor = 0, Branch to check
    sub $t0, $t0, $t1       #Dividend = Dividend - Divisor
    sll $s0, $s0, 1     #Quotient Register Shifted Left by 1-bit
    slt $t2, $t0, $0
    bne $t2, $0, else       #If Dividend < 0, Branch to else
    addi $s0, $s0, 1        #Setting Quotient LSb to 1
    j out
else:   add $t0, $t0, $t1       #Restoring Dividend Original Value

out:    srl $t1, $t1, 1     #Divisor Register Shifted Right by 1-bit
    j loop

check:  slt $t2, $a0, $a1       #If Dividend < Divisor, Call Function 'Normalization'
    beq $t2, $0, exit       #If Dividend > Divisor, Branch to exit
    move $a0, $s0       #$a0 = Quotient

    jal Normalization       #Function Call 
    j return

exit:   move $v0, $s0       #$v0 = Calculated Mantissa

return: lw $ra, 4($sp)      #Restoring $ra
    lw $s0, 0($sp)      #Restoring $s0
    lw $s1, 8($sp)      #restoring $s1
    addi $sp, $sp, 8        #Increment in $sp (Stack Pointer)
    jr $ra          #Return

Normalization: #Normalizes the Mantissa (in $a0) and Counts the Decimal Places Moved by Decimal Point# #Returns: # i) $v0 = Normalized Mantissa # # ii) $v1 = Count of Decimal Places #

    lui $t0, 0x40       #$t0 = 0x40 (1 at 23rd-bit)
    addi $t2, $0, 1     #$t2 = 1 (Initialization)

loop2:  and $t1, $a0, $t0       #Extracting 23rd-bit of Mantissa 
    bne $t1, $0, else2      #If 23rd-bit = 1; Branch to else2
    addi $t2, $t2, 1        #Increment in Count of Decimal Places Moved
    sll $a0, $a0, 1     #Mantissa Shifted Left (To Extract Next Bit)
    j loop2         

else2:  sll $a0, $a0, 1     #Setting 24th-bit = 1 (Implied)
    move $v0, $a0       #$v0 = Normalized Mantissa
    move $v1, $t2       #$v1 = Displacement of Decimal Point    
    jr $ra          #Return

For example, I expect the output of 2.75/6.355 to be 00111110110111011000111011001110 but the actual output is 00111110110111011000111011010110.

I wonder if your algorithm is correct. At step 3, when shifting right divisor, you loose precision if you shift out ones. This may explain why you have incorrect values for the least significant bits. You should try with divisor=1.5 (that should give a correct result) and divisor=1.11111...111b which should give a very incorrect result. Besides that your implementation seems good. — Alain Merigot, Jun 26 '19 at 16:55

Alain Merigot · Accepted Answer · 2019-06-26T23:10:05.343

Your algorithm is incorrect.

A proper algorithm for the restoring division would be

qu=0
rem=dividend
repeat N times
  rem = rem - divisor
  if rem < 0
    rem = rem + divisor
    qu = (qu<<1)
  else
    qu = (qu<<1) + 1
  end
  rem = rem << 1
end

while yours is

qu=0
rem=dividend
repeat N times
  rem = rem - divisor
  if rem < 0
    rem = rem + divisor
    qu = (qu<<1)
  else
    qu = (qu<<1) + 1
  end
  divisor = divisor >> 1  // instead of left shift remainder
end

As the algorithm only relies on the comparison of divisor and rem, it seems equivalent to right shift divisor or to left shift rem. But it is not.
When right shifting divisor, you loose the least significant bits of the divisor.
This may affect the comparison and hence the quotient.
If you print the remainder, you will see that there is a significant impact on it and there may a 2x difference in its value vs the correct result.

It seems to be dangerous to multiply by 2 the remainder, as there may be an overflow. But if we look at the algorithm, we can see that it will not happen.
Initially dividend and divisor are the mantissa of some FP, and hence 1 ≤ divisor, dividend < 2 and the same holds for rem that is initially a copy of dividend. Note that this implies that rem<2*div
Now, we do the first computation step
⚫ if rem < div, then we multiply it by 2, and rem(=2*rem)<2*div.
⚫ if rem ≥ div, then we compute rem−div and multiply it by 2.
As initially rem < 2*div, rem(=2*(rem− div))<2*(2*div− div), and the property rem<2*div is still true.

So at every step we always have the property that rem<2*div and, provided 2*div can be coded, this insures that rem will never overflow.

In terms of implementation, you can code these numbers on the 24 LSB of an integer register. It is largely sufficient as the precision remains unchanged.
In your implementation, you loop 32 times. If you want to write the result in an IEEE mantissa, it is useless and the number of iterations can be reduced. It is sufficient to loop
24 times (mantissa size)
+ 1 (in case dividend < divisor, first bit will be 0, but the second bit is garantied to be at 1)

If you want to do some rounding on the result, you must do two additional steps to get the round and guard bits. After these two computation steps, if remainder ≠ 0, set sticky bit to 1, otherwise set it to 0. Then perform the rounding with the usual rules.

Netch · Answer 2 · 2019-06-29T05:51:01.673

I. The description of division algorithm itself in the form you started with (and, as far as I see, so do the assembler code) lacks the principal part. When preparing to run the division cycle, when you shift divisor right by 1 bit on each iteration, you should have initially shifted it left as most as possible and useful.

Let me explain this on decimal numbers: imagine you divide 10001 by 73 in 6-digit machine (so this is 010001 by 000073). You should shift 73 left until either it stops to fit (so maximal shift is 4, and shifted divisor is 730000) or its most significand digit (MSD) position gets higher than dividend's MSD (so, we can stop at 73000). Then, the cycle is run with

shifted_divisor = 73000, shift = 3
shifted_divisor = 7300, shift = 2
shifted_divisor = 730, shift = 1
shifted_divisor = 73, shift = 0

(With decimal, a nested loop of subtraction shall be done for each shift; with binary, there is no need to.)

If you shift 73 right more, you'll lose significant digits in divisor and so final quotient will be greater than exact one.

You can shift left divisor for the maximum width unconditionally (730000, shift=4), but this will waste CPU cycles. If CPU has CountLeadingZeros operation, you can save time with it.

II. The second principal moment is that you are dividing mantissas got from floating-point numbers. This means that, in the most typical case of two normalized numbers, MSB of dividend and divisor will be the same and you'll get only 2 variants of quotient - 0 and 1. (For IEEE binary32, this is that bit 23 is set and both dividend and divisor are in range 8388608..16777215.)

To get more real digits, you have to perform long division. (See below for more economic approach; now, just compare with schoolbook division.) For IEEE binary32, this means getting at least 27 quotient digits (24 of resulting mantissa + guard + round + sticky) before final rounding. Shift dividend left by 27 bits and proceed with division of 51-bit dividend by 24-bit divisor, as described above, in 28 cycles.

The version described by @AlainMerigot is essentially the same as I described - you still calculate (dividend<<quotient_width)/divisor, but in less obvious way. If left-shift the remainder by 1, this is likely the same as right-shift divisor by 1, provided no data are lost. So, first iteration (number 0) compares equal positions and provide MSB of result, but at the moment it is at bit 0; then, you "activate" one more bit of remainder on each iteration. This allows avoiding double-word operations. In this algorithm, if you needed final remainder, it should have been restored by shifting right by iteration count; but floating division itself doesn't need it.

But: for its proper working, you shall have the dividend and divisor properly aligned from the beginning. In decimal example, this means you should start not with divident=10001, divisor=73, but with dividend=10001, divisor=73000. If your source mantissas were normalized before getting integer values, this is automatically satisfied. Otherwise, additional efforts are needed, and they are virtually the same as I described for divisor shifting for schoolbook division. (Looks like that's why Intel CPUs had been suffering from unlimited denormal handling time for decades;))

MIPS: Division algorithm (Dividing significands of IEEE-754 format) gives incorrect answer for the last 4-5 bits (LSB)

2 Answers2

Linked