Floating Point Divider Hardware Implementation Details

Question

I am trying to implement a 32-bit floating point hardware divider in hardware and I am wondering if I can get any suggestions as to some tradeoffs between different algorithms?

My floating point unit currently suppports multiplication and addition/subtraction, but I am not going to switch it to a fused multiply-add (FMA) floating point architecture since this is an embedded platform where I am trying to minimize area usage.

do you have a constraint on how many cycles that it can take? — alex_milhouse, Jun 14 '13 at 00:28
Does the implementation have to be IEEE-754 compliant? What kind of hardware is this, ASIC, FPGA, something else? Have you consulted the relevant literature, starting with the papers by S. Oberman and P. Soderquist in the 1990s? — njuffa, Jun 14 '13 at 03:08
I do not have a constraint on cycle count. I would like the area to be low. My adder is 1 cycle, my multiplier is 1 cycle. This is for an ASIC. I have consulted some literature, but I haven't found anything good yet, that is why I am asking. — Veridian, Jun 14 '13 at 03:25
Use of the Newton-Raphson iteration for the reciprocal, followed by a back multiply will allow re-use of the multiplier and adder and thus minimize additional hardware, but without FMA I would expect producing a correctly rounded division to be a hard problem. Since you do not care much about latency and to minimize area, you could get the NR-iteration started with a trivial approximation for the reciprocal that provides just 3.5 "good" bits: http://perso.ens-lyon.fr/jean-michel.muller/EMTAsilomar05.pdf — njuffa, Jun 14 '13 at 04:33
Really? There is no way to provide exactly rounded results without FMA? I am trying to compare architectures with and without a divider. I already have a software Newton-raphson implementation using just multiplies and add/subs. — Veridian, Jun 14 '13 at 16:47
There is some relevant code in this answer: http://stackoverflow.com/a/2699886/80448 — Eric Bainville, Jun 14 '13 at 20:21
Re: correct rounding: "hard problem" != "no way to provide exactly rounded results without FMA". Search for "Tuckerman rounding", its use precedes FMA. — njuffa, Jun 14 '13 at 21:21

Spektre · Answer 1 · 2018-08-04T09:12:05.923

Once upon a very long time ago i come across this neat and easy to implement float/fixed point divison algorithm used in military FPUs of that time period:

input must be unsigned and shifted so x < y and both are in range < 0.5 ; 1 >

don't forget to store the difference of shifts sh = shx - shy and original signs
find f (by iterating) so y*f -> 1 .... after that x*f -> x/y which is the division result
shift the x*f back by sh and restore result sign (sig=sigx*sigy)

the x*f can be computed easily like this:
```
z=1-y
(x*f)=(x/y)=x*(1+z)*(1+z^2)*(1+z^4)*(1+z^8)*(1+z^16)...(1+z^2n)
```
where
```
n = log2(num of fractional bits for fixed point, or mantisa bit size for floating point)
```
You can also stop when z^2n is zero on fixed bit width data types.

[Edit2] Had a bit of time&mood for this so here 32 bit IEEE 754 C++ implementation

I removed the old (bignum) examples to avoid confusion for future readers (they are still accessible in edit history if needed)

//---------------------------------------------------------------------------
// IEEE 754 single masks
const DWORD _f32_sig    =0x80000000;    // sign
const DWORD _f32_exp    =0x7F800000;    // exponent
const DWORD _f32_exp_sig=0x40000000;    // exponent sign
const DWORD _f32_exp_bia=0x3F800000;    // exponent bias
const DWORD _f32_exp_lsb=0x00800000;    // exponent LSB
const DWORD _f32_exp_pos=        23;    // exponent LSB bit position
const DWORD _f32_man    =0x007FFFFF;    // mantisa
const DWORD _f32_man_msb=0x00400000;    // mantisa MSB
const DWORD _f32_man_bits=       23;    // mantisa bits
//---------------------------------------------------------------------------
float f32_div(float x,float y)
    {
    union _f32          // float bits access
        {
        float f;        // 32bit floating point
        DWORD u;        // 32 bit uint
        };
    _f32 xx,yy,zz; int sh; DWORD zsig; float z;
    //      result signum        abs value
    xx.f=x; zsig =xx.u&_f32_sig; xx.u&=(0xFFFFFFFF^_f32_sig);
    yy.f=y; zsig^=yy.u&_f32_sig; yy.u&=(0xFFFFFFFF^_f32_sig);
    // initial exponent difference sh and normalize exponents to speed up shift in range
    sh =0;
    sh-=((xx.u&_f32_exp)>>_f32_exp_pos)-(_f32_exp_bia>>_f32_exp_pos); xx.u&=(0xFFFFFFFF^_f32_exp); xx.u|=_f32_exp_bia;
    sh+=((yy.u&_f32_exp)>>_f32_exp_pos)-(_f32_exp_bia>>_f32_exp_pos); yy.u&=(0xFFFFFFFF^_f32_exp); yy.u|=_f32_exp_bia;
    // shift input in range
    while (xx.f> 1.0f) { xx.f*=0.5f; sh--; }
    while (xx.f< 0.5f) { xx.f*=2.0f; sh++; }
    while (yy.f> 1.0f) { yy.f*=0.5f; sh++; }
    while (yy.f< 0.5f) { yy.f*=2.0f; sh--; }
    while (xx.f<=yy.f) { yy.f*=0.5f; sh++; }
    // divider block
    z=(1.0f-yy.f);
    zz.f=xx.f*(1.0f+z);
    for (;;)
        {
        z*=z; if (z==0.0f) break;
        zz.f*=(1.0f+z);
        }
    // shift result back
    for (;sh>0;) { sh--; zz.f*=0.5f; }
    for (;sh<0;) { sh++; zz.f*=2.0f; }
    // set signum
    zz.u&=(0xFFFFFFFF^_f32_sig);
    zz.u|=zsig;
    return zz.f;
    }
//---------------------------------------------------------------------------

I wanted to keep it simple so it is not optimized yet. You can for example replace all *=0.5 and *=2.0 by exponent inc/dec ... If you compare with FPU results on float operator / this will be a bit less precise because most FPUs compute on 80 bit internal format and this implementation is only on 32 bits.

As you can see I am using from FPU just +,-,*. The stuff can be speed up by using fast sqr algorithms like

Fast bignum square computation

especially if you want to use big bit widths ...

Do not forget to implement normalization and or overflow/underflow correction.

Thank you for your answer but I really can't understand your code. You wrote a division, but you are using the divider block lol? — Veridian, Aug 23 '13 at 11:01
the divider block is this z=1-y; (x*f)=(x/y)=x*(1+z)*(1+z^2)*(1+z^4)*(1+z^8)*(1+z^16)...(1+z^2n) so for division you need only + and * ,... terms z^2n are easily done in loop by z=z*z .... — Spektre, Aug 23 '13 at 11:15
oh and i see now the confusion in divider block x,y are the variables u,v from the main divison routine — Spektre, Aug 23 '13 at 11:16
so instead of w.div(u,v) should be z=1-v w=u*(1+z)*(1+z^2)*(1+z^4)*(1+z^8)*(1+z^16)...(1+z^2n) — Spektre, Aug 23 '13 at 11:18
if you still dont understand i can try to scatch a block diagram ,... but have no clue how to post it here (i am new to stack-overflow) — Spektre, Aug 23 '13 at 11:24
ok ive edited my answer. added my floating point implementation with some explanations of what does what — Spektre, Aug 29 '13 at 12:15
@Veridian I added fully working C++ example for IEEE 754 32bit float (unlike my previous examples it does not depend on any other stuff) — Spektre, Aug 04 '18 at 08:55

Floating Point Divider Hardware Implementation Details

1 Answers1

Linked