When implementing the "Carmack's Inverse Square Root" algorithm, I noticed that the results seem biased. The following code seems to give better results:

float InvSqrtF(float x)
{
    // Initial approximation by Greg Walsh.
    int i = *(int *)&x;
    i = 0x5f3759df - (i >> 1);
    float y = *(float *)&i;
    // Two Newton-Raphson iterations to refine the initial estimate.
    x *= 0.5f;
    float f = 1.5f;
    y = y * (f - (x * y * y));
    y = y * (f - (x * y * y));
    *(int *)&y += 0x13; // More magic.
    return y;
}

The key difference is in the penultimate "more magic" line. Since the initial results were too low by a fairly constant factor, this adds 19 units in the last place (19 * 2^(exponent(y) - bias - 23)) to the result with just a single instruction. It seems to give me about 3 extra bits, but am I overlooking something?
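To see that scaling concretely, here is a small demonstration of my own (not from the question; it assumes a 32-bit int and IEEE-754 floats, and uses a union for the punning):

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Adding n to the bit pattern of a positive float nudges it by
       n units in the last place, so the step size scales with the
       exponent automatically. Assumes 32-bit int, IEEE-754 float. */
    float samples[] = { 0.01f, 1.0f, 1000.0f };
    for (int k = 0; k < 3; ++k) {
        union { float f; int i; } u = { samples[k] };
        float before = u.f;
        u.i += 0x13;            /* the "more magic" addition */
        int e;
        frexp(before, &e);      /* before = m * 2^e, m in [0.5, 1) */
        printf("y = %g: +0x13 adds %g (= 19 * 2^%d)\n",
               before, u.f - before, e - 1 - 23);
    }
    return 0;
}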

MSalters
  • Exactly what is your question? "Is 19 the right number to add?" or something else? – Mats Petersson Feb 07 '13 at 10:35
  • Isn't this code doomed to undefined behaviour by the strict aliasing rule? – comocomocomocomo Feb 07 '13 at 10:58
  • @MatsPetersson: The first part of the algorithm dates back to the eighties, and there's been a lot of investigation into better magic constants for the first approximation. A lot of people have looked at it before, so I don't trust myself 100% here. Am I overlooking something that they saw? That's more likely than all of them overlooking this improvement. – MSalters Feb 07 '13 at 11:00
  • Possibly that John Carmack didn't need three extra bits? – Bo Persson Feb 07 '13 at 11:00
  • @comocomocomocomo: It already was at the first line; my addition works iff that works. – MSalters Feb 07 '13 at 11:01
  • @BoPersson: He in fact didn't, since he even left the second Newton-Raphson iteration commented out. – MSalters Feb 07 '13 at 11:02
  • Is this actually faster than RSQRTSS (SSE reciprocal square root)? – Mats Petersson Feb 07 '13 at 11:10
  • @MSalters: Sorry, I didn't intend to criticize you or the original coders. I am sincerely worried about the effects of the strict aliasing rules. I would like to know how to do this kind of trick safely (accessing the bits of a `float` as if it were an `int`...) in compilers that apply these rules. – comocomocomocomo Feb 07 '13 at 11:12
  • @comocomocomocomo You can safely type-pun with a union (see the sketch after this comment thread). – Daniel Fischer Feb 07 '13 at 11:15
  • @MatsPetersson: No idea, target is ARM. – MSalters Feb 07 '13 at 11:31
  • Ah, OK - then I have nothing further to add to this... And I'm not sure type punning is ever "safe" - you are in the hands of the compiler doing the right thing. Someone said in another thread that "you can only have one active member in a union at any time", so type punning through a union seems to be off as well - or I've misunderstood that comment. – Mats Petersson Feb 07 '13 at 11:36
  • I've just tried with gcc 4.6.3. If I type-pun with a union (no pointers involved at all) it works and I don't get any warnings. If I type-pun with pointers I get a warning, though it also works. (I'm using -Wall, of course). Thanks @Daniel Fischer – comocomocomocomo Feb 07 '13 at 11:56
  • @MatsPetersson N1570, 6.5.2.3 (3), footnote 95: "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation." So apart from trap representations, type punning via unions is safe per the standard [yes, footnotes are not normative, but it's crystal clear that it's intended to work]. – Daniel Fischer Feb 07 '13 at 13:07
  • @MSalters: I wrote a union-based version of the code of the question. Would you like me to add it somehow? I might add it to the question, or as an answer (though it's not a proper answer to your question). – comocomocomocomo Feb 07 '13 at 16:08
  • @comocomocomocomo: Thanks but not needed. For me the real worry was about mathematical difficulties. – MSalters Feb 08 '13 at 08:48
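Following up on the punning discussion, this is what the union-based rewrite might look like (a sketch, not the original code; assuming a 32-bit int and IEEE-754 floats, it computes exactly the same result):

float InvSqrtF(float x)
{
    union { float f; int i; } u = { x };  // view the bits of x as an int
    // Initial approximation by Greg Walsh.
    u.i = 0x5f3759df - (u.i >> 1);
    float y = u.f;
    // Two Newton-Raphson iterations to refine the initial estimate.
    x *= 0.5f;
    y = y * (1.5f - (x * y * y));
    y = y * (1.5f - (x * y * y));
    // More magic: add 19 ulps through the integer view.
    u.f = y;
    u.i += 0x13;
    return u.f;
}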

2 Answers

Newton's method produces a bias. The function whose zero is to be found,

f(y) = x - 1/y²

is concave, so - unless you start with a y ≥ √(3/x) - the Newton method only produces approximations ≤ 1/√x (and strictly smaller, unless you start with the exact result) with exact arithmetic.
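Spelling out one Newton step for this f (my derivation; it matches the code, in which x has already been halved):

\[
y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}
        = y_n - \frac{x - y_n^{-2}}{2\,y_n^{-3}}
        = \frac{y_n\,(3 - x\,y_n^{2})}{2}
\]

The right-hand side, viewed as a function of y_n > 0, attains its maximum exactly at y_n = 1/√x, where its value is 1/√x; that is why every iterate from a positive start lands at or below 1/√x in exact arithmetic.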

Floating point arithmetic occasionally produces too large approximations, but typically not in the first two iterations (since the initial guess usually isn't close enough).

So yes, there is a bias, and adding a small quantity generally improves the result - but not always. In the regions around 1.25 or 0.85, for example, the results without the adjustment are better than with it. In other regions, the adjustment yields one bit of additional precision; in yet others, more.

In any case, for best results the magic constant to add should be tuned to the region from which x is most often drawn.
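A small harness to check this on concrete inputs (my sketch, not Daniel's code: it re-uses the question's algorithm with a union for the punning, and the probe regions and step sizes are arbitrary choices):

#include <math.h>
#include <stdio.h>

/* The question's routine, with the final adjustment as a parameter
   (0 or 0x13) so both variants can be compared.
   Assumes 32-bit int, IEEE-754 float. */
static float inv_sqrt(float x, int magic)
{
    union { float f; int i; } u = { x };
    u.i = 0x5f3759df - (u.i >> 1);
    float y = u.f;
    x *= 0.5f;
    y = y * (1.5f - x * y * y);
    y = y * (1.5f - x * y * y);
    u.f = y;
    u.i += magic;
    return u.f;
}

int main(void)
{
    /* Probe a few narrow regions; which variant wins depends on
       where x lies. */
    const float centers[] = { 0.85f, 1.25f, 2.0f, 3.0f };
    for (int k = 0; k < 4; ++k) {
        double worst0 = 0.0, worst1 = 0.0;
        for (float x = centers[k] * 0.95f; x < centers[k] * 1.05f; x += 1e-6f) {
            double exact = 1.0 / sqrt((double)x);
            double e0 = fabs(inv_sqrt(x, 0)    - exact) / exact;
            double e1 = fabs(inv_sqrt(x, 0x13) - exact) / exact;
            if (e0 > worst0) worst0 = e0;
            if (e1 > worst1) worst1 = e1;
        }
        printf("around %.2f: worst rel. error %.3g without 0x13, %.3g with\n",
               centers[k], worst0, worst1);
    }
    return 0;
}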

Daniel Fischer

As this method is an approximation, the result is sometimes an overestimate and sometimes an underestimate. You can find in McEniry's paper some nice figures showing how this error is distributed for different configurations, and the math behind them.

So, unless you have solid evidence that in your domain of application the result is clearly biased, I would prefer tuning the magic constant as suggested in Lomont's document :-)
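A brute-force version of that tuning might look like this (my sketch, in the spirit of Lomont's search rather than his actual procedure; the window, step and sample density are arbitrary, and it assumes a 32-bit IEEE-754 float):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Inverse square root with a variable first-guess constant, followed
   by the question's two Newton iterations. */
static float inv_sqrt(float x, uint32_t magic)
{
    union { float f; uint32_t i; } u = { x };
    u.i = magic - (u.i >> 1);
    float y = u.f;
    x *= 0.5f;
    y = y * (1.5f - x * y * y);
    y = y * (1.5f - x * y * y);
    return y;
}

static double worst_error(uint32_t magic)
{
    /* The trick's relative-error pattern repeats every two binades,
       so sampling x in [1, 4) should be representative. */
    double worst = 0.0;
    for (float x = 1.0f; x < 4.0f; x += 1e-4f) {
        double exact = 1.0 / sqrt((double)x);
        double e = fabs(inv_sqrt(x, magic) - exact) / exact;
        if (e > worst) worst = e;
    }
    return worst;
}

int main(void)
{
    uint32_t best = 0;
    double best_err = 1.0;
    /* Scan a window around the classic 0x5f3759df. */
    for (uint32_t m = 0x5f370000u; m <= 0x5f380000u; m += 0x40u) {
        double e = worst_error(m);
        if (e < best_err) { best_err = e; best = m; }
    }
    printf("best constant in window: 0x%08x (worst rel. error %.3g)\n",
           (unsigned)best, best_err);
    return 0;
}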

dunadar