12

I find myself typing

double foo=1.0/sqrt(...);

a lot, and I've heard that modern processors have built-in inverse square root opcodes.

Is there a C or C++ standard library inverse square root function that

  1. uses double precision floating point?
  2. is as accurate as 1.0/sqrt(...)?
  3. is just as fast or faster than the result of 1.0/sqrt(...)?
imreal
  • 10,178
  • 2
  • 32
  • 48
Dan
  • 12,157
  • 12
  • 50
  • 84

7 Answers

19

No. No, there isn't. Not in C++. Nope.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
6

You can use this function for faster inverse square root computation.
There's an article on Wikipedia explaining how it works: https://en.wikipedia.org/wiki/Fast_inverse_square_root
Below is a C version of the algorithm.

#include <stdint.h>

float invSqrt( float number ){
    union {
        float f;
        uint32_t i;
    } conv;

    float x2;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    conv.f  = number;
    conv.i  = 0x5f3759df - ( conv.i >> 1 );                        // initial guess via the "magic" constant
    conv.f  = conv.f * ( threehalfs - ( x2 * conv.f * conv.f ) );  // one Newton-Raphson iteration
    return conv.f;
}
rafaraj
  • 63
  • 1
  • 4
  • Reading about this is what gave me the idea for the question. An article about the fast inverse square root said that some hardware had inverse square root instructions because of how much it comes up in graphics code. I couldn't use this algorithm at the time because I needed full double precision, but have an upvote for anyone reading this answer who hasn't heard of it :). – Dan Dec 22 '18 at 05:12
3

I don't know of a standardized C API for this, but that does not mean you cannot use the fast inverse square root instructions, as long as you are willing to write platform-dependent intrinsics.

Let's take 64-bit x86 with AVX as an example, where you can use _mm256_rsqrt_ps() to approximate the reciprocal of a square root. More specifically: it computes eight reciprocal square roots in a single go, using SIMD.

#include <immintrin.h>

...

float inputs[8] __attribute__ ((aligned (32))) = { ... };
__m256 input = _mm256_load_ps(inputs);
__m256 invroot = _mm256_rsqrt_ps(input);

Similarly, you can use the intrinsic vrsqrteq_f32 on ARM with NEON. In this case, the SIMD is 4-wide, so it will compute four inverse square roots in a single go.

#include <arm_neon.h>

...

float32x4_t sqrt_reciprocal = vrsqrteq_f32(x);

Even if you need just one reciprocal square root per batch, it is still faster than a full square root. Just broadcast the input to all lanes (or set a single lane) of the SIMD register; that way you do not need a memory load at all. On x86 that is done via _mm256_set1_ps(x).

Bram
  • 7,440
  • 3
  • 52
  • 94
1

Violating constraints 1. and 2. (and it's also not standard), but it still might help someone browsing through...

I used ASMJIT to just-in-time compile the exact assembly operation you're looking for: RSQRTSS (single precision, ok, but it should be similar with double).

My code is this (cf. also my answer in a different post):

   typedef float(*JITFunc)();

   JITFunc func;
   asmjit::JitRuntime jit_runtime;
   asmjit::CodeHolder code;
   code.init(jit_runtime.getCodeInfo());

   asmjit::X86Compiler cc(&code);
   cc.addFunc(asmjit::FuncSignature0<float>());

   float value = 2.71; // Some example value.
   asmjit::X86Xmm x = cc.newXmm();
   uint32_t *i = reinterpret_cast<uint32_t*>(&value);
   cc.mov(asmjit::x86::eax, i[0]);
   cc.movd(x, asmjit::x86::eax);

   cc.rsqrtss(x, x);   // THE asm function.

   cc.ret(x);

   cc.endFunc();
   cc.finalize();

   jit_runtime.add(&func, &code);

   // Now, func() can be used as the result to rsqrt(value).

If you do the JIT compilation part only once and then call the compiled function repeatedly, this should be faster than 1.0/sqrt(...) (though slightly less accurate, which is inherent to the built-in instructions you're talking about).

Duke
  • 410
  • 7
  • 15
-1

If you're not afraid of writing your own functions, try the following:

template <typename T>
T invsqrt(T x)
{
    return T(1) / std::sqrt(x);
}

It should be just as fast as the original 1.0 / std::sqrt(x) in any modern optimizing compiler. Also, it can be used with doubles or floats.
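One subtlety worth noting: with the literal 1.0, the division happens in double precision even when T = float. Writing the constant as T(1) keeps the float instantiation entirely in single precision. A self-contained sketch:

```cpp
#include <cmath>

// T(1) keeps the division in the instantiated type; a plain 1.0
// would promote the float version's division to double.
template <typename T>
T invsqrt(T x)
{
    return T(1) / std::sqrt(x);
}
```

With optimization enabled, both invsqrt(4.0) and invsqrt(4.0f) compile down to the same instructions as writing the expression inline.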

Ryan
  • 2,378
  • 1
  • 19
  • 29
  • violates rule#3 in the question! – Aniket Inge Oct 16 '12 at 21:53
  • 3
    Sorry, as I understand, it should be "just as fast". – Ryan Oct 16 '12 at 21:54
  • Read http://stackoverflow.com/questions/2442358/do-c-templates-make-programs-slow to see why or why not template functions should be slower than non-templated code. – Ryan Oct 16 '12 at 21:57
  • 1
    Also, if you enable `-ffast-math` in gcc, it will use an approximation to inverse square root. This would ensure it is just as fast/faster than regular square root. – Azmisov Sep 16 '16 at 22:52
-3

If you find yourself writing the same thing over and over, you should think to yourself "function!":

double invsqrt(const double x)
{
    return 1.0 / std::sqrt(x);
}

Now the code is more self-documenting: people don't have to deduce that 1.0 / std::sqrt(x) is the inverse square root; they read it. Additionally, you now get to plug in whatever implementation you want, and each call site automatically uses the updated definition.

To answer your question, no, there is no C(++) function for it, but now that you've made one if you find your performance is too lacking you can substitute your own definition.

GManNickG
  • 494,350
  • 52
  • 494
  • 543
  • violates rule#3 in the question – Aniket Inge Oct 16 '12 at 21:52
  • why rely on the compiler when you can use the preprocessor? I still think I didn't deserve the -ve vote :-( – Aniket Inge Oct 16 '12 at 21:55
  • 1
    @PrototypeStark: Because it's not as simple as either-or. One is type-checked, debuggable, scopable, overloadable, evaluates its argument expression only once, etc. (all the features of a function), the other is not. And it's a single downvote, it's not the end of the world; I understand it's frustrating not getting a reason from the person themselves, but that's how it is. – GManNickG Oct 16 '12 at 22:03
  • he just downvoted and left yeah its frustrating. I find it funny too though. – Aniket Inge Oct 16 '12 at 22:06
  • 4
    I would argue that it is easier to read `1.0/sqrt(x)` as the inverse square root than `invsqrt(x)`, the former using the less ambiguous mathematical notation as opposed to an abbreviation. – Dan Oct 17 '12 at 01:10
  • @Dan: That's fine, it's up to you what you find easy to read of course. But over time I think you'll find it's much better in principle to hide details, including what constitutes an inverse square root. – GManNickG Oct 17 '12 at 01:17
  • 2
    "inverse square root" just means the reciprocal of the square root, though. It's not a detail, it's literally what the function name means. – Dan Oct 17 '12 at 01:28
  • 1
    @Dan: You're conflating implementation with specification. You're right, inverse square root is the reciprocal of the square root, but how do you get from that to `1.0 / sqrt(x)`? It's not hard, of course, but that's not the point: there's still a division between specification and implementation. Hide the implementation, keep the specification; it makes reasoning about and maintaining your program easier. Consider how easy it would be to optimize *every single* inverse square root calculation in your entire program, just by changing the implementation and keeping the specification. – GManNickG Oct 17 '12 at 01:37
  • 2
    @GManNickG: While I'd usually be the first to agree with that logic, there are limits. You wouldn't write a function `multiplyByTwo` -- you'd write `*2`. Personally I'd say the inverse square root example is right on the borderline. – Lightness Races in Orbit Oct 17 '12 at 05:55
-4

Why not try this?

#define INSQRT(x) (1.0/sqrt(x))

It's just as fast, requires less typing (and feels like a function), uses double precision, and is as accurate as 1.0/sqrt(...).

Aniket Inge
  • 25,375
  • 5
  • 50
  • 78
  • 6
    I didn't downvote, but here's no use for a macro here when a function will do. (You even said it yourself: make it *feel* like a function? Just actually make a function.) – GManNickG Oct 16 '12 at 21:43
  • @GManNickG The reason I didn't convert it to a function is, because the question clearly mentions: "is just as fast or faster than the result of 1.0/sqrt(...)". Making it a function will add additional overhead making the "statement" 1.0/sqrt(...) SLOWER. – Aniket Inge Oct 16 '12 at 21:48
  • 12
    Not on any compiler from the last decade. – GManNickG Oct 16 '12 at 21:54
  • 3
    @PrototypeStark: Please provide benchmarks to back up your claim that using a real function will be slower. Macros may safely be avoided in the absence of evidence that they are required to meet some criterion. That said, I always carry my `#define isNaN(x) ((x)!=(x))` around with me; sometimes it just feels good to be so bad. – Lightness Races in Orbit Oct 17 '12 at 05:56