5

On modern processors, float division is a good order of magnitude slower than float multiplication (when measured by reciprocal throughput).

I'm wondering if there are any algorithms out there for computating a fast approximation to x/y, given certain assumptions and tolerance levels. For example, if you assume that 0<x<y, and are willing to accept any output that is within 10% of the true value, are there algorithms faster than the built-in FDIV operation?

dshin
  • 2,354
  • 19
  • 29
  • 1
    Have you considered changing the structure of your algorithm to avoid division, rather trying to find a faster division technique? – Apples Jun 24 '15 at 15:53
  • An error of 10% per division can lead to exponential errors when reused, is that worth it? possibly not – WorldSEnder Jun 24 '15 at 15:53
  • This is purely for intellectual curiosity with no planned application. – dshin Jun 24 '15 at 15:57
  • x * 1/y is faster if you can precompute the reciprocal. –  Jun 24 '15 at 16:01
  • Duplicate of [this](http://stackoverflow.com/q/12227126/555045)? Well, I don't know if it's faster. It's something. – harold Jun 24 '15 at 16:03
  • 1) mask out the fraction bits of the divisor, 2) take its 8 most significant bits, 3) look up for its reciprocal, 4) multiply the reciprocal with the dividend, 5) add the exponent of the divisor to the exponent part of the product. – user3528438 Jun 24 '15 at 16:14
  • Of course if you can use a built-in instruction equivalent to `rcpss` or `pfrcp`, that's probably best. – harold Jun 24 '15 at 16:27
  • you can use `RCPSS` instruction [Fast 1/X division (reciprocal)](http://stackoverflow.com/q/9939322/995714) – phuclv Jun 24 '15 at 16:44

2 Answers2

3

I hope that this helps because this is probably as close as your going to get to what you are looking for.

__inline__ double __attribute__((const)) divide( double y, double x ) {
                                    // calculates y/x
    union {
        double dbl;
        unsigned long long ull;
    } u;
    u.dbl = x;                      // x = x
    u.ull = ( 0xbfcdd6a18f6a6f52ULL - u.ull ) >> (unsigned char)1;
                                    // pow( x, -0.5 )
    u.dbl *= u.dbl;                 // pow( pow(x,-0.5), 2 ) = pow( x, -1 ) = 1.0/x
    return u.dbl * y;               // (1.0/x) * y = y/x
}


See also:
Another post about reciprocal approximation.
The Wikipedia page.

Community
  • 1
  • 1
Jack G
  • 4,553
  • 2
  • 41
  • 50
1

FDIV is usually exceptionally slower than FMUL just b/c it can't be piped like multiplication and requires multiple clk cycles for iterative convergence HW seeking process.

Easiest way is to simply recognize that division is nothing more than the multiplication of the dividend y and the inverse of the divisor x. The not so straight forward part is remembering a float value x = m * 2 ^ e & its inverse x^-1 = (1/m)*2^(-e) = (2/m)*2^(-e-1) = p * 2^q approximating this new mantissa p = 2/m = 3-x, for 1<=m<2. This gives a rough piece-wise linear approximation of the inverse function, however we can do a lot better by using an iterative Newton Root Finding Method to improve that approximation.

let w = f(x) = 1/x, the inverse of this function f(x) is found by solving for x in terms of w or x = f^(-1)(w) = 1/w. To improve the output with the root finding method we must first create a function whose zero reflects the desired output, i.e. g(w) = 1/w - x, d/dw(g(w)) = -1/w^2.

w[n+1]= w[n] - g(w[n])/g'(w[n]) = w[n] + w[n]^2 * (1/w[n] - x) = w[n] * (2 - x*w[n])

w[n+1] = w[n] * (2 - x*w[n]), when w[n]=1/x, w[n+1]=1/x*(2-x*1/x)=1/x

These components then add to get the final piece of code:

float inv_fast(float x) {
    union { float f; int i; } v;
    float w, sx;
    int m;

    sx = (x < 0) ? -1:1;
    x = sx * x;

    v.i = (int)(0x7EF127EA - *(uint32_t *)&x);
    w = x * v.f;

    // Efficient Iterative Approximation Improvement in horner polynomial form.
    v.f = v.f * (2 - w);     // Single iteration, Err = -3.36e-3 * 2^(-flr(log2(x)))
    // v.f = v.f * ( 4 + w * (-6 + w * (4 - w)));  // Second iteration, Err = -1.13e-5 * 2^(-flr(log2(x)))
    // v.f = v.f * (8 + w * (-28 + w * (56 + w * (-70 + w *(56 + w * (-28 + w * (8 - w)))))));  // Third Iteration, Err = +-6.8e-8 *  2^(-flr(log2(x)))

    return v.f * sx;
}
nimig18
  • 797
  • 7
  • 10