How can I justify that f64::from_bits(0x3fe9000000000000u64) == 0.781250f64?

Question

I have slightly modified the original message to add a second question:

I was advised by a C++ expert to check out https://en.cppreference.com/w/cpp/numeric/bit_cast to better understand the representation of `double`, `memcpy`, and `bit_cast` (C++20).

More specifically, I am trying to understand why the following code gives this result:

    #include <bit>
    #include <cstdint>
    constexpr std::uint64_t u64v2 = 0x3fe9000000000000ull;
    constexpr auto f64v2 = std::bit_cast<double>(u64v2);

"f64::from_bits(0x3fe9000000000000u64) == 0.781250f64"

Before that, I spent some time studying the example given in the Wikipedia article on the fast inverse square root:

https://en.wikipedia.org/wiki/Fast_inverse_square_root#CITEREFGoldberg1991

I worked through the calculation by hand and finally understood what happens in that case, where the exponent is 8 bits and the mantissa 23 bits.

But in the `bit_cast` example above, which uses double precision, my research suggests that the exponent is 11 bits and the mantissa 52 bits: https://en.wikipedia.org/wiki/Double-precision_floating-point_format

When I did the calculation by hand, I used

x = (1+Mx/L)*2^(Ex-B)

with

with L = 2^52 and Ex = 2*(2^9 - 1), using the notation of

https://en.wikipedia.org/wiki/Fast_inverse_square_root#CITEREFGoldberg1991
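The formula above can be sketched directly in code for a normal double, assuming L = 2^52 and the double-precision bias B = 2^10 - 1 = 1023 (the value the accepted answer and the comments arrive at). The function name `from_fields` is hypothetical; it avoids `std::ldexp` only so that it stays `constexpr` in C++20:

```cpp
#include <cstdint>

// x = (1 + M / L) * 2^(E - B) for a *normal* double (1 <= E <= 2046),
// with L = 2^52 (52 mantissa bits) and bias B = 2^10 - 1 = 1023.
constexpr double from_fields(std::uint64_t E, std::uint64_t M) {
    constexpr double L = 4503599627370496.0; // 2^52
    constexpr int B = 1023;
    double scale = 1.0;
    int e = static_cast<int>(E) - B;          // unbiased exponent
    for (; e > 0; --e) scale *= 2.0;          // multiply up for positive exponents
    for (; e < 0; ++e) scale /= 2.0;          // divide down for negative exponents
    return (1.0 + static_cast<double>(M) / L) * scale;
}

// E = 0x3FE = 1022 and M = 0x9000000000000 give (1 + 0.5625) * 2^-1.
static_assert(from_fields(0x3FE, 0x9000000000000ull) == 0.78125);
```

All the intermediate values here are exact in binary, so comparing with `==` is safe for these inputs.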

But I do not get the stated result of `0.781250`. Maybe the exponent and mantissa I chose were not correct. I don't know, but I would really like to understand what happens.

Thanks in advance for any explanation that helps me arrive at 0.781250.

Second question: could you please check the question I asked below in reply to the comment? I am having trouble even with the first example. Thanks in advance.

paxdiablo · Accepted Answer · 2019-10-19T23:27:58.193

In 3fe9000000000000, the first bit (zero) is the sign bit, so we can ignore it (the number is positive).

The next 11 bits are 011.1111.1110 (3fe), which is 1022, but that is adjusted down by the bias of 1023 to allow for negative exponents. The exponent is therefore -1, which gives you a multiplier of 2⁻¹, or 0.5.

The mantissa bits are 1001000..0 (the 9000..0 of your hex number). Those first four bits equate to the values 0.5, 0.25, 0.125 and 0.0625 (halving each time). Since only the first and fourth bits are set, you get 0.5 + 0.0625 = 0.5625.
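The "halving each time" rule can be sketched as a loop over the 52 mantissa bits, from the most significant down. The function name `mantissa_fraction` is hypothetical, and the input is assumed to be a raw binary64 bit pattern:

```cpp
#include <cstdint>

// Sum the mantissa bits from the most significant one down, the weight of
// each bit halving each time: 1/2, 1/4, 1/8, 1/16, ...
constexpr double mantissa_fraction(std::uint64_t bits) {
    double weight = 0.5;
    double sum = 0.0;
    for (int i = 51; i >= 0; --i) {   // bit 51 is the first mantissa bit
        if ((bits >> i) & 1) sum += weight;
        weight /= 2.0;
    }
    return sum;
}

// The mantissa field of 0x3fe9000000000000 starts 1001 0000...: 0.5 + 0.0625.
static_assert(mantissa_fraction(0x3fe9000000000000ull) == 0.5625);
```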

Adding the implicit 1 to that number, as mandated by IEEE754, you get a base value of 1.5625. When that's multiplied by the multiplier calculated earlier, you get:

1.5625 × 0.5 = 0.78125
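The same steps written as one derivation (e is the unbiased exponent, m the mantissa including the implicit 1). The last line also shows the result as an exact fraction, since 1.1001₂ × 2⁻¹ = 0.11001₂ = 25/32:

```latex
\begin{aligned}
e &= \texttt{0x3FE} - 1023 = 1022 - 1023 = -1 \\
m &= 1 + 2^{-1} + 2^{-4} = 1 + 0.5 + 0.0625 = 1.5625 \\
x &= m \times 2^{e} = 1.5625 \times 2^{-1} = 0.78125 = \tfrac{25}{32}
\end{aligned}
```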

So that's how you get your value.

Further detail can be found on the IEEE754-1985 Wikipedia page, and you can experiment with Harald Schmidt's excellent online converter, a tool so damn useful I built my own copy of it to handle double precision as well (not on the web unfortunately, it was a Java app for the desktop). It really did help me a lot in understanding.

You might also want to look at some other answers I've given on IEEE754, including this one in particular.

Regarding the bit pattern you brought up in a comment, 0x4172f58bc0000000 (which you say should be 19880124, but for which you calculated something else), this is how you convert it:

4---> 1--> 7-->   2--> f--> 5--> 8--> b--> c--> (<- hex digits)
s eee eeee eeee   mmmm mmmm mmmm mmmm mmmm mmmm
0 100 0001 0111   0010 1111 0101 1000 1011 1100 (<- then all zeroes)
  v      v  vvv     |  ||||  | | |    | || ||            1/n
  1      1  421     |  ||||  | | |    | || |+-------- 4,194,304
  0      6          |  ||||  | | |    | || +--------- 2,097,152
  2                 |  ||||  | | |    | |+----------- 1,048,576
  4                 |  ||||  | | |    | +------------   524,288
                    |  ||||  | | |    +--------------   131,072
                    |  ||||  | | +-------------------     8,192
                    |  ||||  | +---------------------     4,096
                    |  ||||  +-----------------------     1,024
                    |  |||+--------------------------       256
                    |  ||+---------------------------       128
                    |  |+----------------------------        64
                    |  +-----------------------------        32
                    +--------------------------------         8

The sign is positive.

The exponent is 1,024 + 16 + 4 + 2 + 1 = 1,047; subtracting the bias of 1,023 gives 24, so the multiplier is 2²⁴, or 16,777,216.

The mantissa bits are summed, each set bit adding 1/2ⁿ, where n starts at 1 at the leftmost mantissa bit and increases to the right:

1/4,194,304, 1/2,097,152, 1/1,048,576, 1/524,288, 1/131,072, 1/8,192, 1/4,096, 1/256, 1/128, 1/64, 1/32, and 1/8.

When you add all these up along with the implicit 1, you get 1.1849477291107177734375.
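As a cross-check on that sum: the implicit 1 followed by the first 24 mantissa bits (0010 1111 0101 1000 1011 1100) reads as the integer 0x12F58BC, so the sum is exactly that integer divided by 2²⁴. This is also why multiplying by 2²⁴ gives a whole number:

```latex
\begin{aligned}
1.0010\,1111\,0101\,1000\,1011\,1100_2
  &= \frac{1\,0010\,1111\,0101\,1000\,1011\,1100_2}{2^{24}}
   = \frac{\texttt{0x12F58BC}}{2^{24}}
   = \frac{19\,880\,124}{16\,777\,216} \\
  &= 1.1849477291107177734375
\end{aligned}
```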

Then the product of that and the previously calculated multiplier of 16,777,216 is the value you want: 19,880,124.

Thank you so much for your detailed explanation. I understand it better now. I had made an error in calculating the exponent bias: I took `2^11 - 1`, whereas I should have taken `2^10 - 1`. — Dev, Oct 19 '19 at 13:04
Good evening. I continued reading the page and checking the examples, and there seems to be a mistake. For the first example, the page says `19880124.000000f64.to_bits() == 0x4172f58bc0000000u64`. I checked it by hand and it seems to be false; I get `12f58bc0000000`. Can you confirm this, or correct me and explain if I am wrong? Thanks — Dev, Oct 19 '19 at 19:46
@Dev, I added a bit to the end showing how to do that particular bit pattern. Hope that helps. — paxdiablo, Oct 19 '19 at 23:17
My question is the inverse: starting from `19,880,124`, how do we obtain `0x4172f58bc0000000u64`? Thank you. Yes, the other direction is easy for me, as with the first example I sent. — Dev, Oct 19 '19 at 23:27
@Dev, there are quite a few good vids on YouTube if you search for `ieee754 how to turn decimal number into bit pattern`. It's basically converting to a binary fraction then normalising to binary scientific notation. — paxdiablo, Oct 19 '19 at 23:50
I tried the opposite direction by hand, and I think it takes much longer. I was able to find the exponent by hand: 417 — Dev, Oct 20 '19 at 09:16
What remains is to calculate the mantissa, 0.18494772911, which would take a while. I think I have understood the principle. This video was useful: https://www.youtube.com/watch?v=qBHUGy1xteg — Dev, Oct 20 '19 at 09:17

score 1 · Answer 2 · answered Oct 20 '19 at 00:06

Here's how your number lays out in IEEE 754 double-precision (DP) format:

                  6    5          4         3         2         1         0
                  3 21098765432 1098765432109876543210987654321098765432109876543210
                  S ----E11---- ------------------------F52-------------------------
          Binary: 0 01111111110 1001000000000000000000000000000000000000000000000000
             Hex: 3FE9 0000 0000 0000
       Precision: DP
            Sign: Positive
        Exponent: -1 (Stored: 1022, Bias: 1023)
       Hex-float: +0x1.9p-1
           Value: +0.78125 (NORMAL)
