The fast inverse square root function used by SGI/3dfx, and most notably in Quake, is often cited as being faster than the equivalent assembly instruction. However, the posts claiming that seem quite dated. I was curious about its performance on more modern hardware, particularly on mobile devices like the iPhone. I wouldn't be surprised if the Quake inverse sqrt is no longer a worthwhile optimization on desktop systems, but how about for an iPhone project involving a lot of 3D math? Would it be worthwhile to include?

Stephen Canon

TaylorP
- Well, you have to benchmark the fps in each case. – kennytm Jul 12 '11 at 15:10
1 Answer
No.
The NEON instruction set (like every other vector ISA*) has a hardware approximate reciprocal square root instruction that is much faster than that oft-cited "trick". Use it instead if reciprocal square root is actually a performance bottleneck in your code (as always, benchmark first; don't spend time optimizing something if you have no hard evidence that its performance matters).
You can get at it by writing your own assembly (inline or otherwise) with the `vrsqrte.f32` instruction, or from C, Objective-C, or C++ by including the `<arm_neon.h>` header and using the `vrsqrte_f32()` intrinsic.
[*] On SSE it's `rsqrtss`/`rsqrtps`; on PowerPC/AltiVec it's `frsqrte`/`vrsqrtefp`.
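As a rough illustration of the intrinsics mentioned above, here is a minimal sketch in C. The helper name `rsqrt_refined` is hypothetical, not part of any library. On NEON it takes the hardware estimate from `vrsqrte_f32` and refines it with one Newton step using `vrsqrts_f32`, which computes `(3 - a*b) / 2`. So that the same file also builds off-device, it assumes a fallback to the SSE `_mm_rsqrt_ss` estimate on x86, and to plain `1.0f / sqrtf(x)` anywhere else. It assumes `x > 0` and makes no claim about speed; benchmark before adopting anything like it.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#elif defined(__SSE__)
#include <xmmintrin.h>
#else
#include <math.h>
#endif

/* Hypothetical helper: approximate 1/sqrt(x) for x > 0,
   using a hardware estimate plus one Newton refinement step. */
static inline float rsqrt_refined(float x)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x2_t v = vdup_n_f32(x);      /* broadcast x into both lanes */
    float32x2_t e = vrsqrte_f32(v);     /* low-precision initial estimate */
    /* Newton step: e = e * (3 - x*e*e) / 2; vrsqrts_f32(a, b) yields (3 - a*b) / 2 */
    e = vmul_f32(e, vrsqrts_f32(vmul_f32(v, e), e));
    return vget_lane_f32(e, 0);
#elif defined(__SSE__)
    /* SSE has an estimate instruction but no dedicated Newton-step
       instruction, so the refinement is written out by hand. */
    float e = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return e * (1.5f - 0.5f * x * e * e);
#else
    /* Assumed portable fallback: exact division and square root. */
    return 1.0f / sqrtf(x);
#endif
}
```

Calling `rsqrt_refined(4.0f)` should return a value close to `0.5f`. The result is still an approximation: repeating the refinement line adds precision at the cost of a few more instructions. For bulk 3D math, the four-lane variants (`vrsqrteq_f32` and `vrsqrtsq_f32`) follow the same pattern.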

Stephen Canon
- That instruction is "approximate". You may still want to use one or two iterations of Newton's method after that. See the Wikipedia entry, about halfway down. Agreed that the Quake hack is not to be used. – phkahler Jul 12 '11 at 17:25
- @phkahler: I called out the fact that it is approximate in my answer; the "quake hack" is also approximate, for that matter. One of the nice things about NEON is that there is hardware support for the Newton step as well, via the `vrsqrts.f32` instruction, which should be used instead of a generic Newton iteration. – Stephen Canon Jul 12 '11 at 17:49