Fast approximate float division

Question

On modern processors, float division is a good order of magnitude slower than float multiplication (when measured by reciprocal throughput).

I'm wondering if there are any algorithms out there for computating a fast approximation to x/y, given certain assumptions and tolerance levels. For example, if you assume that 0<x<y, and are willing to accept any output that is within 10% of the true value, are there algorithms faster than the built-in FDIV operation?

Have you considered changing the structure of your algorithm to avoid division, rather trying to find a faster division technique? — Apples, Jun 24 '15 at 15:53
An error of 10% per division can lead to exponential errors when reused, is that worth it? possibly not — WorldSEnder, Jun 24 '15 at 15:53
This is purely for intellectual curiosity with no planned application. — dshin, Jun 24 '15 at 15:57
Duplicate of [this](http://stackoverflow.com/q/12227126/555045)? Well, I don't know if it's faster. It's something. — harold, Jun 24 '15 at 16:03
1) mask out the fraction bits of the divisor, 2) take its 8 most significant bits, 3) look up for its reciprocal, 4) multiply the reciprocal with the dividend, 5) add the exponent of the divisor to the exponent part of the product. — user3528438, Jun 24 '15 at 16:14
Of course if you can use a built-in instruction equivalent to `rcpss` or `pfrcp`, that's probably best. — harold, Jun 24 '15 at 16:27
you can use `RCPSS` instruction [Fast 1/X division (reciprocal)](http://stackoverflow.com/q/9939322/995714) — phuclv, Jun 24 '15 at 16:44

score 3 · Answer 1 · edited May 23 '17 at 10:30

I hope that this helps because this is probably as close as your going to get to what you are looking for.

__inline__ double __attribute__((const)) divide( double y, double x ) {
                                    // calculates y/x
    union {
        double dbl;
        unsigned long long ull;
    } u;
    u.dbl = x;                      // x = x
    u.ull = ( 0xbfcdd6a18f6a6f52ULL - u.ull ) >> (unsigned char)1;
                                    // pow( x, -0.5 )
    u.dbl *= u.dbl;                 // pow( pow(x,-0.5), 2 ) = pow( x, -1 ) = 1.0/x
    return u.dbl * y;               // (1.0/x) * y = y/x
}

See also:
Another post about reciprocal approximation.
The Wikipedia page.

score 1 · Answer 2 · answered Apr 06 '17 at 05:23

FDIV is usually exceptionally slower than FMUL just b/c it can't be piped like multiplication and requires multiple clk cycles for iterative convergence HW seeking process.

Easiest way is to simply recognize that division is nothing more than the multiplication of the dividend y and the inverse of the divisor x. The not so straight forward part is remembering a float value x = m * 2 ^ e & its inverse x^-1 = (1/m)*2^(-e) = (2/m)*2^(-e-1) = p * 2^q approximating this new mantissa p = 2/m = 3-x, for 1<=m<2. This gives a rough piece-wise linear approximation of the inverse function, however we can do a lot better by using an iterative Newton Root Finding Method to improve that approximation.

let w = f(x) = 1/x, the inverse of this function f(x) is found by solving for x in terms of w or x = f^(-1)(w) = 1/w. To improve the output with the root finding method we must first create a function whose zero reflects the desired output, i.e. g(w) = 1/w - x, d/dw(g(w)) = -1/w^2.

w[n+1]= w[n] - g(w[n])/g'(w[n]) = w[n] + w[n]^2 * (1/w[n] - x) = w[n] * (2 - x*w[n])

w[n+1] = w[n] * (2 - x*w[n]), when w[n]=1/x, w[n+1]=1/x*(2-x*1/x)=1/x

These components then add to get the final piece of code:

float inv_fast(float x) {
    union { float f; int i; } v;
    float w, sx;
    int m;

    sx = (x < 0) ? -1:1;
    x = sx * x;

    v.i = (int)(0x7EF127EA - *(uint32_t *)&x);
    w = x * v.f;

    // Efficient Iterative Approximation Improvement in horner polynomial form.
    v.f = v.f * (2 - w);     // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
    // v.f = v.f * ( 4 + w * (-6 + w * (4 - w)));  // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
    // v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w)))))));  // Third Iteration, Err = +-6.8e-8 *  2^(-flr(log2(x)))

    return v.f * sx;
}

Fast approximate float division

2 Answers2