
I am trying to understand the code snippet below, taken from here:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // ??? 
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

What I don't understand is the conversion from a `float` pointer to a `long` pointer and back to a `float` pointer. Why can't we simply do `i = y` instead of first taking the address of the float and then dereferencing it?

I am new to pointer conversions, so please bear with me.

infoclogged
  • This code invokes *undefined behavior*. Also, `i = y` would perform a *conversion* from `float` to `long` (including truncation); what this code does is *reinterpret* the bits of a `float` as a `long` (hence the comments about "bit level hacking") – UnholySheep Jul 19 '18 at 09:12
  • @UnholySheep - can't understand the difference between conversion from float to long and 'reinterpret the bits of float as a long'. Can you give an example to show the difference? – infoclogged Jul 19 '18 at 09:16
  • Here's a quick example showing the difference (using `unsigned int` instead of `long` and `memcpy` instead of pointer casts to avoid *undefined behavior*): https://ideone.com/LndSR2 – UnholySheep Jul 19 '18 at 09:20
  • related: https://stackoverflow.com/q/43120045/5470596 – YSC Jul 19 '18 at 09:29
  • @Asu - I am not questioning the magic number. I am trying to understand the conversion. It's a completely different question. – infoclogged Jul 19 '18 at 09:59
  • @infoclogged: The linked question has a step-by-step explanation including that step. – MSalters Jul 19 '18 at 10:35
  • @MSalters Would be glad if you could refer to one single answer in that post which explains the conversion step by step!? It's not prudent to mark questions as duplicates just because the related question has a "hint" of an answer. Just look at the subject line, which differs so much... – infoclogged Jul 19 '18 at 10:51
  • Please reopen the question, because the focus of my question is different from the one in the duplicate. – infoclogged Jul 19 '18 at 10:56
  • @infoclogged: See Rushyo's answer. The cast "gets the bits of the floating-point number". An assignment takes the value, not the bits. – MSalters Jul 19 '18 at 11:50
  • @MSalters - so the dereferencing step `i = * ( long * ) &y;` is literally copying the value held at the address of y into the address of i? This means the type cast should not matter.. it could be `long *` or `int *`.. any one of them would work as long as the sizeof of the cast type is more than float's.. Is my understanding correct? – infoclogged Jul 19 '18 at 13:12
  • @infoclogged: The typecast has to be to the type of `i`, of course. But you could indeed change `i` to `unsigned long`, match the cast, and it would work the same - `sqrt(f)` doesn't work on negative `f` anyway. – MSalters Jul 19 '18 at 16:18

2 Answers


This code snippet is obviously the fast inverse square root. The pointer semantics there are not really used to do pointer things, but to reinterpret the bits at a certain memory location as a different type.

If you were to assign i = y, this would be turned into a truncating conversion from floating point to integer. That, however, is not what's desired here. What you actually want is raw access to the bits, which is not straightforwardly possible on a floating-point-typed variable.
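
To see the difference with a concrete value, here is a minimal sketch (it uses memcpy purely to read the bits without the aliasing problem, and it assumes that unsigned int has the same 32-bit size as float):

#include <stdio.h>
#include <string.h>

int main(void)
{
    float y = 1.5F;

    long converted = (long)y;        /* value conversion: 1.5 truncates to 1       */

    unsigned int bits;               /* assumed to be 32 bits wide, like float     */
    memcpy(&bits, &y, sizeof bits);  /* bit reinterpretation: copies the raw bytes */

    printf("converted = %ld, bits = 0x%08X\n", converted, bits);
    /* typically prints: converted = 1, bits = 0x3FC00000 */
    return 0;
}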

Let's break this statement down:

i  = * ( long * ) &y;
  • &y: address of y. The type of this expression is (float*).

  • (long*): cast to type. Applied to &y it steamrolls over the information that this is the address of a floating-point-typed object.

  • *: dereference, which means "read out" whatever is located at the given address and interpret it as the base type of the pointer being dereferenced. We've overridden that to be (long*) and are essentially lying to the compiler.
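
Written out as separate statements, it is the same three steps (and still the very same aliasing violation discussed next); a sketch only, with a hypothetical helper name:

/* hypothetical helper: the same bit reinterpretation, spelled out step by step;
   it assumes sizeof(long) == sizeof(float), which is not guaranteed (on many
   64-bit targets long is 8 bytes) */
long float_bits( float y )
{
    float *pf = &y;          /* &y       : address of y, type float*              */
    long  *pl = (long *)pf;  /* (long *) : same address, claimed to point to long */
    long   i  = *pl;         /* *        : read the bytes at that address as long */
    return i;
}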

For all intents and purposes this breaks pointer aliasing rules and invokes undefined behaviour. You should not do this (caveats apply¹).

The somewhat well defined way (at least it doesn't break pointer aliasing rules) to do such trickery is by means of a union.

float Q_rsqrt( float number )
{
    union {
        float y;
        long  i;
    } fl;
    float x2;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    fl.y  = number;
    fl.i  = 0x5f3759df - ( fl.i >> 1 );                   // ??? 
    fl.y  = fl.y * ( threehalfs - ( x2 * fl.y * fl.y ) ); // 1st iteration
//  fl.y  = fl.y * ( threehalfs - ( x2 * fl.y * fl.y ) ); // 2nd iteration, this can be removed

    return fl.y;
}

EDIT:

It should be noted that type-punning via a union, as illustrated above, is not sanctioned by the C language standard either. However, unlike outright undefined behavior, the standard so far leaves the details of union accesses done in that way as implementation-dependent behavior. Since type-punning is required for certain tasks, I think a few proposals have been made to make this well defined in an upcoming standard of the C programming language.

For all intents and purposes practically all compilers support the above scheme, whereas type-punning via pointer casts can lead to weird things happening once all optimizations are enabled.
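
As the comments below also touch on, memcpy is another way to do the same type-punning without pointer casts or unions; a minimal sketch of that variant (the name is made up, and it assumes float is the usual 32-bit IEEE type so that uint32_t matches its size):

#include <stdint.h>
#include <string.h>

float Q_rsqrt_memcpy( float number )            /* hypothetical name, same algorithm       */
{
    const float threehalfs = 1.5F;
    float x2 = number * 0.5F;
    float y  = number;
    uint32_t i;

    memcpy( &i, &y, sizeof i );                 /* copy the float's bits into the integer  */
    i = 0x5f3759df - ( i >> 1 );                /* the same magic constant and shift       */
    memcpy( &y, &i, sizeof y );                 /* copy the adjusted bits back into y      */

    y = y * ( threehalfs - ( x2 * y * y ) );    /* 1st Newton iteration                    */
    return y;
}

On current optimizing compilers the memcpy calls of a fixed small size like this are typically compiled down to plain register moves, so there is no copying overhead in practice.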


1: Some compilers (old, or custom-written for specific language extensions – I'm looking at you, CUDA nvcc) are severely broken and you actually have to coerce them with this into doing what you want.

datenwolf
  • In C++ the union is actually UB, I believe; you are accessing an inactive union member. A while ago I found out that using `std::memcpy` in this case generated the same assembly, but without risking UB. – asu Jul 19 '18 at 09:30
  • This is UB for reading the inactive union member (compiler extension(s) notwithstanding). – Richard Critten Jul 19 '18 at 09:30
  • @RichardCritten: Hmm, only recently it was thrown around that this is the proper way to perform type-punning in GCC. – datenwolf Jul 19 '18 at 09:39
  • @Asu: The intention here is not to copy things at all. Also, as far as language lawyering goes, memcpy is technically limited to copying between objects of identical type. – datenwolf Jul 19 '18 at 09:40
  • I'm pretty sure `memcpy` isn't affected by strict aliasing - and I'm pretty sure the compiler can optimize that. You could verify that on compiler explorer anyway. – asu Jul 19 '18 at 09:46
  • Just for reference, straight from the horse's mouth (which is not the language specification, but compiler documentation): https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/Optimize-Options.html#index-fstrict-aliasing *"The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected."* – datenwolf Jul 19 '18 at 09:48
  • @datenwolf as I mentioned in my comment, it's a compiler extension – Richard Critten Jul 19 '18 at 09:52
  • Got it! So truncating is essentially removing the decimal part, thereby corrupting the original bit sequence, whereas the typecast maintains the original raw bit sequence yet assigns it to another type. Of course, this is only done for an intermediate step, and hence the bits are reassigned to the float type later. Is my understanding correct? – infoclogged Jul 19 '18 at 09:54
  • @infoclogged: Sort of, yes. But converting a floating point bit sequence into its truncated (or rounded) integer equivalent actually takes a little more effort than just **masking** a few bits (it also requires some shifting): the mantissa of a float maps the integer range `[0, 2^n]` (where n is the length of the mantissa) into the fractional range `[1.0, 1.999…)`. I leave it as an exercise to the reader to work out the calculations to map between the two. – datenwolf Jul 19 '18 at 17:02

OK, so you are looking at some ancient hackery from the time when floating point processors were either slow or non-existent. I doubt the original author would defend continuing to use it. It also doesn't meet modern language transparency requirements (i.e. it is "undefined behaviour"), so it may not be portable to all compilers or interpreters, or handled correctly by quality tools such as lint and valgrind, etc, but it was the way fast code was written in the 80s and 90s.

At the bit level, everything is stored as bytes. A long is stored in 4 bytes, and a float is also stored in 4 bytes. However, the bits are treated very differently. In an integer/long, each bit is ranked as a power of 2, and the value can be used as a bit field. In a float, some bits are used to represent an exponent that is applied to the rest of the number. For more info, read up on IEEE 754.
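
As a small illustration of that layout (a sketch that assumes the usual 4-byte IEEE 754 float and uses memcpy just to read the bits), the fields of 1.5f can be pulled apart like this:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 1.5F;
    uint32_t bits;
    memcpy( &bits, &f, sizeof bits );                       /* assumes a 32-bit IEEE 754 float */

    unsigned sign     = (unsigned)( bits >> 31 );           /*  1 bit                          */
    unsigned exponent = (unsigned)((bits >> 23) & 0xFF);    /*  8 bits, biased by 127          */
    unsigned mantissa = (unsigned)( bits & 0x7FFFFF );      /* 23 bits of fraction             */

    printf( "bits = 0x%08X  sign=%u  exponent=%u  mantissa=0x%06X\n",
            (unsigned)bits, sign, exponent, mantissa );
    /* prints: bits = 0x3FC00000  sign=0  exponent=127  mantissa=0x400000 */
    return 0;
}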

This trick takes the float value and looks at the bytes as if they were an integer bit field, so that it can apply the magic. Then it looks at the resultant bytes as if they are a float again.

I have no idea what that magic is exactly. No-one else does, probably not even the guy who wrote it, as it isn't commented. On the other hand, the Doom and Quake source did use to be cult code reading, so perhaps someone remembers the details?

There used to be many such tricks in the "good old days", but they are relatively unnecessary now, as floating point is built into the main processor and is as fast as, and sometimes faster than, the integer operations. Originally, even uploading and downloading small ints to and from the co-processor could be done more quickly using such hacks than using the built-in methods.

Gem Taylor
  • Compilers won't mind optimizing this as long as it isn't UB... which it is, in C++. However there are ways to rewrite this in a safe way. Valgrind will not mind either; there's no way it would trigger on that code. However, I agree that the in-hardware isqrt instruction might actually just be faster than this hack. – asu Jul 19 '18 at 09:34
  • This answer is very far from the truth. It may be the case that this function still has its place (maybe I'll benchmark it, if I have the time). And of course, a lot of people understand what it does. It is tricky, but definitely not "magic". – geza Jul 19 '18 at 09:42
  • @geza: 0x5f3759df is "magic". I agree that the rest is probably the start of a power series. – Gem Taylor Jul 19 '18 at 09:46
  • You have written that a long is stored in 4 bytes, but my 64-bit machine prints 8! – infoclogged Jul 19 '18 at 09:49
  • @GemTaylor: I've benchmarked it, and it is 2.1x faster than the straightforward `1/sqrt` way on my machine. Okay, we don't have the definition of "magic" :). But anyways, it is a well known trick, definitely not "no-one knows how it works". – geza Jul 19 '18 at 09:53
  • @infoclogged Indeed, that is another issue. `Float` is certainly 4 bytes and "it has ever been so". `Long` has a varied history. I made an assumption there that this is old tech, but perhaps you are right. In which case, what is the expected upper dword value holding? – Gem Taylor Jul 19 '18 at 09:55
  • Also the trick will depend on endianness, anyway. – Gem Taylor Jul 19 '18 at 09:56
  • @GemTaylor: See the linked duplicate. There are many people who fully understand the "magic". Those people also recognize the following instructions as a Newton's approximation, not a power series. As for endianness, the chief dependency is merely that `long` and `float` have the _same_ endianness. – MSalters Jul 19 '18 at 10:34