0

I am wondering if there are any algorithms to perform a floating-point division that would be accelerated if one has access to a floating point square root unit in hardware?

If so, what are those algorithms?

Veridian
  • 3,531
  • 12
  • 46
  • 80
  • 4
    If you had hardware `log` that might help, but it's hard to see how hardware `sqrt` woild help with division. – Paul R Oct 30 '14 at 19:44
  • 1
    This is probably best thought of mathematically--i.e., can you think of a way to express `A/B` in terms of the square root of `A` and/or `B`? – Jerry Coffin Oct 30 '14 at 19:44
  • @ThomasMatthews, for instance if you had a processor with a hardware square root unit, addition/subtraction, and multiplication, but no divider. I was wondering if there was a way to take advantage of having the square root unit. – Veridian Oct 30 '14 at 19:47
  • Have you tried using *fixed-point* math instead of floating point? Integer division is usually faster because you don't have to extract fields out of the format. – Thomas Matthews Oct 30 '14 at 19:47
  • @JerryCoffin, A/B = (sqrt(a/b))^2 = (sqrt(a)/sqrt(b))^2 = (sqrt(a) * 1/sqrt(b))^2, so if you hardware square root unit calculated inverse square roots, then yes, but I'm not sure of another method – Veridian Oct 30 '14 at 19:49
  • 1
    @starbox: Whether you have hardware square root or not, subtraction and maybe inverse multiplication may be faster. Think about the steps required for "6 / 3". While your code is setting up and invoking a hardware square root device, mine performs the required subtractions and finishes first. – Thomas Matthews Oct 30 '14 at 19:50
  • 1
    @ThomasMatthews: You'd be surprised at the difference between integer and floating point division. For FP, the exponents can be subtracted in parallel with the division of the mantissa's. And dividing the mantissa's is actually easier because they're both 24 bits, with first bit set. – MSalters Oct 30 '14 at 23:01
  • 1
    @starbox if it would be `sqr` unit instead of `sqrt` then you can use this http://stackoverflow.com/a/18398246/2521214 for sqrt there can exist some insane math series approximating division but to use more complex unit in general to achieve simpler task is waste of clock and energy nor the mention the gates needed to get it work ... heh now I see that it is your question :) – Spektre Oct 31 '14 at 11:46

1 Answers1

1

Division by square root is actually the way division operation is usually implemented in hardware. To be more precise the square root unit almost universally internally computes reciprocal square root (1/sqrt(X) ), because with it one can easily perform both division and square root operations: sqrt(x) = x*(1/sqrt(x)) and R=X/Y=X*Z*Z where Z=1/sqrt(Y).

If there is a hardware instruction that returns an estimate, you can improve the result by the following iterative method:

Z = Z * (3-Y*Z*Z )/2

p12
  • 1,161
  • 8
  • 23
  • 2
    I cannot find **any** proof for this. Wikipedia is fairly clear about older Intel and AMD (definitely not like this), Itanium definitely doesn't work like this, and in general this doesn't get you the necessary precision: IEEE754 demands correct division (<1 ulp). Note that "hardware efficiency" is _not_ a relevant concern. Gates are cheap, power wasted by inefficient implementations is a problem. And that applies equally to the 120W monster CPU's as to 120 mW embedded SoC's. – MSalters Oct 30 '14 at 20:44
  • Newton Raphson division does not try to find the sqaure root of the reciprocal. http://en.wikipedia.org/wiki/Division_algorithm#Newton.E2.80.93Raphson_division Also, can you show your work? I worked out the newton raphson iteration for square root of reciprocal and it doesn't agree with your formula... – Maxaon3000 Oct 30 '14 at 21:20
  • 2
    Never mind, I worked out the same formula. I just don't think it's used anywhere because you can solve for 1/y using Newton raphson directly instead of solving for the square root. – Maxaon3000 Oct 30 '14 at 21:31
  • @Maxaon3000: If one only has a hardware for certain number of "estimation table" entries, having using one table to approximate 1/sqrt(x) may yield better performance than having one table for 1/x and another for 1/sqrt(x) but only having half as many entries in each. – supercat Nov 03 '14 at 22:11