How does this sqrt approximation inline assembly function work?

Question

Reading through The Tricks of the 3D Game Programming Gurus, I came across this sort function written in inline assembly:

inline float FastSqrt(float Value)
{
    float Result;

    _asm
    {
        mov eax, Value
        sub eax, 0x3F800000
        sar eax, 1
        add eax, 0x3F800000
        mov Result, eax
    }

    return(Result);
}

It is an approximation of the actual square-root, but the accuracy is enough for my needs.

How does this actually work? What is this magical 0x3F800000 value? How are we achieving square root by subtracting, rotating and adding?

Here's how it looks in C/C++ code:

inline float FastSqrt_C(float Value)
{
    float Result;

    long Magic = *((long *)&Value);
    Magic -= 0x3F800000;
    Magic >>= 1;
    Magic += 0x3F800000;
    Result = *((float *)&Magic);

    return(Result);
}

Interesting, which is why I guess I get incorrect results when I change the parameter Value to be an int? Looks like the function only works for floats? — vexe, Jan 21 '17 at 23:00
More importantly, that's the exponent bias. So it cancels the bias, halves the exponent then adds the bias back. It also messes with the mantissa a little. — Jester, Jan 21 '17 at 23:05
@vexe For the general principle involved, you may want to look at [this question](http://stackoverflow.com/questions/32042673/optimized-low-accuracy-approximation-to-rootnx-n) — njuffa, Jan 27 '17 at 20:37

Jester · Accepted Answer · 2017-01-22T00:45:35.103

Many people have pointed out that 0x3f800000 is the representation of 1.0. While this is true, it has nothing to do with how the calculation works. To understand it, you need to know how non-negative floats are stored. f = (1+m)*2^x, with 0 <= m < 1 and m being the mantissa, x the exponent. Also note that x is stored with a bias, so what's actually in the binary is x+127. The 32 bit value is made up from the sign bit (which is zero in our case) followed by 8 bits of exponent storing x+127 and finally by 23 bits of mantissa, m. (See the wikipedia article).

Apply some basic math,

sqrt(f) = sqrt((1+m)*2^x)
        = sqrt(1+m)*sqrt(2^x)
        = sqrt(1+m)*2^(x/2)

So, as a rough approximation, we need to halve the exponent but due to the bias we can't just do x/2 we need (x-127)/2 + 127. This 127 shifted to the appropriate bit position is the magic 0x3f800000.

The division by 2 is achieved using a right shift by one bit. Since this operates on the whole float, it has a side effect on the mantissa too.

First, assume the original exponent was even. Then the least significant bit that gets shifted out is zero. Thus, the mantissa gets halved too, so what we end up is: sqrt(f) = (1+m/2)*2^(x/2). We got the exponent correct, but the mantissa is (1+m/2) instead of sqrt(1+m). The maximum relative error for this is (1.5 - sqrt(2))/sqrt(2) ~ 6% which occurs if m is almost 1 meaning f is near, but less than an odd power of 2. Take for example f=7.99. The formula gives us about 2.998 instead of 2.827 which indeed has an error of 6%.

Now, if the exponent was odd, then the least significant bit will be 1 and this when shifted into the mantissa will cause an increase by half. Thus, we get sqrt(f) = (1.5+m/2)*2^((x-1)/2). The maximum error for this is actually when m=0, and that will be (1.5/sqrt(2)-sqrt(1))/sqrt(1) which is again around 6%. This occurs for numbers that are near to an odd power of two from above.

The two cases combined mean the worst inaccuracy is around 6% if the input value happens to be near an odd power of two. For even powers of two, the result is accurate.

score 0 · Answer 2 · answered Jan 21 '17 at 23:09

0

0x3F800000 in float is 1. This is because of the way floats are stored. You can see a visual representation at https://gregstoll.dyndns.org/~gregstoll/floattohex/.

This is a good approximation, I believe of sqrt. The origin of this is from a game Quake for inverse sqrt (https://en.wikipedia.org/wiki/Fast_inverse_square_root#Aliasing_from_floating_point_to_integer_and_back).

answered Jan 21 '17 at 23:09

D Summy

9
2

If you sketch the graphs for y=(x+1)/2 and y=sqrt(x) you see that they are close when x in [1,2]. So my guess it it's an approximation for values that lie within that interval. – Jan 21 '17 at 23:14
@Roadowl It does **not** calculate `(x+1)/2` – Jester Jan 21 '17 at 23:17
1

The Wikipedia article you link discusses an entirely different method for approximating a square root than the code presented in the question. – Cody Gray - on strike Jan 22 '17 at 16:52

score 0 · Answer 3 · answered Jan 21 '17 at 23:25

0

Here's an example of the mechanics of this in action:

FastSqrt(4.0) == 2.0

4.0 to hex -> 0x40800000
0x40800000 - 0x3f800000 = 0x1000000
0x1000000 to binary -> 00000001 00000000 00000000 00000000
shift toward the lsb (sar) -> 00000000 10000000 00000000 00000000
00000000 10000000 00000000 00000000 back to hex -> 0x00800000
0x00800000 + 0x3f800000 = 0x40000000
0x40000000 to dec -> 2.0

answered Jan 21 '17 at 23:25

mbil

106
6

This would be better if you also showed the exponent / mantissa fields for each step, not *just* the whole binary32 bit pattern. – Peter Cordes Apr 21 '18 at 03:28

Gaiger Chen · Answer 4 · 2018-04-20T20:43:12.630

Floating number f = (1 + m)* [2^(e+127)], where m is mantissa part, e is exponential part.

thus : sqrt(f) = (f)^(1/2) = ((1 + m)* [2^(e+127)] )^(1/2)

-> ((1 + m)* [2^(e+127)] )^(1/2) = (1 + m)^(1/2) * 2^((e + 127)/2)

In the part of exponent, 2^((e + 127)/2):

2^((e + 127)/2) = 2^( (e-127/2) + 127)

thus, in floating representation, it is (e - 0x3F800000) /2 + 0x3F800000

In the part of mantissa, (1 + m)^(1/2):

from binomial series formula, (1 + x)^r = 1 + rx + (r(r - 1)/2)*(x^2) +....

Thus, (1 + m)^(1/2) equals (1 + m/2 - (m^2)/8 + ...) it APPROXIMATELY equals 1 + m/2 (a typical approximation to first order) Therefore, The mantissa part should be divided with 2.

However, The mantissa and exponent are combined as A number, The right shift divides exponent and mantissa BOTH.

To evaluate the error, you could consider the second terms of binomial series, - (m^2)/8.

For the m is little than 1 always, I substitute m as 0.9999 (0.5 + 0.25 + 0.125 + ...)

(m^2)/8 = 0.12497500125, it is the worst case.

How does this sqrt approximation inline assembly function work?

4 Answers4