Unexpected result after converting uint64_t to double

Question

In the following code:

#include <cstdint>
#include <iostream>

...

uint64_t t1 = 1510763846;
uint64_t t2 = 1510763847;
double d1 = (double)t1;
double d2 = (double)t2;
// d1 == t2 => evaluates to true somehow?
// t1 == d2 => evaluates to true somehow?
// d1 == d2 => evaluates to true somehow?
// t1 == t2 => evaluates to false, of course.
std::cout << std::fixed << 
        "uint64_t: " << t1 << ", " << t2 << ", " <<
        "double: " << d1 << ", " << d2 << ", " << (d2+1) << std::endl;

I get this output:

uint64_t: 1510763846, 1510763847, double: 1510763904.000000, 1510763904.000000, 1510763905.000000

And I don't understand why. This answer: biggest integer that can be stored in a double says that an integral number up to 2^53 (9007199254740992) can be stored in a double without losing precision.

I actually get errors when I start doing calculations with the doubles, so it's not just a printing issue (e.g. 1510763846 and 1510763847 both become 1510763904).
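The collision matches what rounding to a 24-bit (single-precision) significand with round-to-nearest would produce. Here is the arithmetic, assuming that is what the conversion is doing. With a 53-bit significand, the spacing in this range is below 1, so both integers would be exact:

```latex
\begin{aligned}
2^{30} \le 1510763846 < 2^{31} &\;\Rightarrow\; \text{float spacing} = 2^{30-23} = 128,\quad \text{double spacing} = 2^{30-52} = 2^{-22} \\
1510763846 &= 11802842 \cdot 128 + 70 \quad (\text{neighbors } 1510763776 \text{ and } 1510763904) \\
1510763904 - 1510763846 &= 58 < 70 \;\Rightarrow\; 1510763846 \mapsto 1510763904 \\
1510763904 - 1510763847 &= 57 < 71 \;\Rightarrow\; 1510763847 \mapsto 1510763904
\end{aligned}
```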

It's also very weird that I can add 1 to the double and the result comes out exact (d2+1 == 1510763905.000000).

Rationale: I'm converting these numbers to doubles because I need to work with them in Lua, which only supports floating point numbers. I'm sure I'm compiling the Lua lib with double as the lua_Number type, not float.

std::cout << sizeof(t1) << ", " << sizeof(d2) << std::endl;

Outputs

8, 8

I'm using VS 2012 with target MachineX86, toolkit v110_xp, and floating-point model "Precise (/fp:precise)".

Addendum

With the help of the people who replied and the article "Why are doubles added incorrectly in a specific Visual Studio 2008 project?", I've been able to pinpoint the problem. A library is calling a function such as _set_controlfp, _control87, _controlfp or __control87_2 to set my executable's floating-point precision to "single" (a 24-bit significand). That is why converting a uint64_t to a double behaves as if the result were a float.
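One way to confirm this diagnosis from inside the program is to add a run-time probe. The probe converts a value that needs more than 24 significant bits. This is a sketch: conversionKeepsDoublePrecision and requireDoublePrecision are hypothetical names. Whether the probe trips depends on the compiler using the same uint64_t-to-double code path here as in the rest of the program. The `volatile` qualifiers keep the conversion from being folded at compile time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical run-time probe: does uint64_t -> double conversion currently
// keep more than 24 significant bits? 2^40 + 1 needs 41 bits, so a
// conversion rounded to single precision would turn it into 2^40.
bool conversionKeepsDoublePrecision()
{
    // volatile forces the conversion to happen at run time, under whatever
    // floating-point settings are active, instead of being constant-folded.
    volatile uint64_t probe = (static_cast<uint64_t>(1) << 40) + 1;
    volatile double d = static_cast<double>(probe);
    return d == 1099511627777.0;  // 2^40 + 1 is exactly representable as a double
}

// Example use: call after initialising each library; in debug builds the
// assertion fires once a library has lowered the precision.
void requireDoublePrecision()
{
    assert(conversionKeepsDoublePrecision() &&
           "uint64_t -> double is rounding to fewer than 41 bits");
}
```

Calling it before and after initialising each suspect library is one way to narrow down which one changes the setting.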

Searching the source files for those function names and for "MCW_PC" (used for precision control), I found the following libraries that might be setting it:

Android NDK
boost::math
boost::numeric
DirectX (We're using June 2010)
FMod (non-EX)
Pyro particle engine

Now I'd like to rephrase my question:

How do I make sure converting from a uint64_t to a double goes correctly every time, without:

having to call _fpreset() each and every time a possible conversion occurs (think about the function parameters)
having to worry about a library's thread changing the floating point precision in between my _fpreset() and the conversion?

Naive code would be something like this:

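#include <cstdint>
#include <float.h>   // _fpreset (MSVC CRT)

double toDouble(uint64_t i)
{
    double d;
    do {
        _fpreset();    // reinitialize the floating-point package
        d = i;         // the conversion that must not run at single precision
        _fpreset();    // reset again before the check
    } while (d != i);  // i is converted to double again for this comparison
    return d;
}

double toDouble(int64_t i)
{
    double d;
    do {
        _fpreset();
        d = i;
        _fpreset();
    } while (d != i);
    return d;
}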

This solution assumes the odds of a thread messing with the floating-point precision twice are astronomically small. The problem is that the values I'm working with are timers that represent real-world values, so I shouldn't be taking any chances. Is there a silver bullet for this problem?
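A scoped alternative to calling _fpreset() around every conversion can be sketched under a few assumptions:

- The target is 32-bit MSVC, where _controlfp_s with the _MCW_PC mask controls x87 precision.
- The precision setting is per-thread, as the x87 control word is saved with each thread's context.
- The compiler does not move the conversion across the _controlfp_s calls. MSVC provides `#pragma fenv_access (on)` for code that changes the floating-point environment.

DoublePrecisionScope and toDoubleChecked are hypothetical names. On other targets the guard does nothing.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER) && defined(_M_IX86)
#include <float.h>  // _controlfp_s, _MCW_PC, _PC_53
#endif

// Hypothetical RAII guard: sets x87 precision control to 53 bits for its
// lifetime and restores the previous precision afterwards.
// It is a no-op on targets other than 32-bit MSVC.
class DoublePrecisionScope
{
public:
    DoublePrecisionScope() : saved_(0)
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        _controlfp_s(&saved_, 0, 0);              // mask 0: read only
        unsigned int ignored = 0;
        _controlfp_s(&ignored, _PC_53, _MCW_PC);  // change the precision bits only
#endif
    }

    ~DoublePrecisionScope()
    {
#if defined(_MSC_VER) && defined(_M_IX86)
        unsigned int ignored = 0;
        _controlfp_s(&ignored, saved_ & _MCW_PC, _MCW_PC);  // put the old precision back
#endif
    }

private:
    DoublePrecisionScope(const DoublePrecisionScope&);             // non-copyable
    DoublePrecisionScope& operator=(const DoublePrecisionScope&);  // (C++03 style, VS2012)
    unsigned int saved_;
};

double toDoubleChecked(uint64_t i)
{
    DoublePrecisionScope scope;
    const double d = static_cast<double>(i);
    // Debug-only check: every value up to 2^53 must convert exactly.
    assert(i > (static_cast<uint64_t>(1) << 53) || static_cast<uint64_t>(d) == i);
    return d;
}
```

Because the guard only touches the precision bits and restores them afterwards, a library that deliberately chose single precision gets its setting back after each conversion. Calling _fpreset() does not do that.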

you have d2 twice in your cout. Why does t1 == t2 when you say 'HIT!'? That's for sure wrong — pm100, Nov 15 '17 at 16:51
This works correctly on ideone after fixing a typo in your code ([demo](https://ideone.com/Ljdjsp)). — Sergey Kalinichenko, Nov 15 '17 at 16:52
there is something more suspicious going on here. This is not the full code — pm100, Nov 15 '17 at 16:52
Works properly with VS 2017 and gcc 7.2 after correcting typo. — , Nov 15 '17 at 16:53
What platform are you compiling for? What compiler are you using? What compiler flags? — Daniel H, Nov 15 '17 at 16:53
What platform and compiler are you using. Cos I get both being as you would expect on VS2013 (Release build) on an x86 running in 64-bit mode. — Goz, Nov 15 '17 at 16:54
I meant to check sizeof float == sizeof double. Double has to be at least the size of float, but not necessarily larger. — Aki Suihkonen, Nov 15 '17 at 16:55
What does "HIT! (naturally)" mean? Does it mean that the numbers are equal? Does it mean that the numbers are different? Anything else? — anatolyg, Nov 15 '17 at 16:57
@pm100 Saying “HIT” there means that the `assert` is actually triggered; i.e., he didn’t accidentally change `NDEBUG` or something. — Daniel H, Nov 15 '17 at 16:58
Thank you for confirming that computers are awesome and that the error must be with my compiler settings. (Which means it's fixable, yay!) Thanks for the cool online tool link, @dasblinkenlight! — Luc Bloom, Nov 15 '17 at 17:04
I'm using VS 2012 with target MachineX86, toolkit v110_xp. Floating point model "Precise (/fp:precise)". Any more info? — Luc Bloom, Nov 15 '17 at 17:05
You've added `#include` directives for `` (which you need) and `` (which you don't use). You also need `#include` headers for `` and ``. I suggest you update your question to show a complete self-contained program that we can copy-and-paste and run unmodified on our own systems. [mcve] — Keith Thompson, Nov 15 '17 at 17:13
@KeithThompson I’m guessing that `` itself includes `` and `` on the OP’s machine, so it’s hard to notice the missing headers. — Daniel H, Nov 15 '17 at 18:29
Keith’s right. I should make it self-contained and test it that way, too. Tomorrow I’ll try and see if _fpreset() restores my sanity. — Luc Bloom, Nov 15 '17 at 22:39

Aki Suihkonen · Answer 1 · 2017-11-15T17:00:08.930


Judging from an IEEE 754 floating-point converter, it looks like your implementation of double is actually float. That is of course allowed by the standard, which only mandates that sizeof(double) >= sizeof(float).

The most accurate float representation of 1510763846 is 1.510763904E9.

edited Nov 15 '17 at 17:00

answered Nov 15 '17 at 16:54

Aki Suihkonen


I thought you had the answer there, but it appears that the sizeof's are both 8. Could it be that according to ieee754, even though a double is 64-bits, it cannot hold the integral number 1510763846? – Luc Bloom Nov 15 '17 at 16:59

This is probably still the answer; only the conversion to `float` is somewhere else (cannot know where exactly). I remember that on some ARM platform there were dedicated functions in the C runtime for casting from 64-bit integers to floating point. These functions could be broken. – anatolyg Nov 15 '17 at 17:01

The C++ standard does allow `double` and `float` to have the same size, representation, range, and precision (though they're always distinct types), but it also imposes minimal requirements on the range and precision of type `double`. I'm fairly sure a 32-bit type can't meet those requirements. – Keith Thompson Nov 15 '17 at 17:10
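Keith's point can be written down as compile-time checks. The minimums below (DBL_DIG >= 10, FLT_DIG >= 6) come from the C standard's <float.h>, which C++ adopts through <cfloat>. A 32-bit IEEE float only reaches FLT_DIG == 6, so it cannot serve as a conforming double. This is a sketch. The IEEE 754 checks go beyond what the standard requires and assume an x86 toolchain like the one in the question.

```cpp
#include <cfloat>
#include <limits>

// Minimums guaranteed by the standard for every conforming implementation.
static_assert(DBL_DIG >= 10, "double must provide at least 10 decimal digits");
static_assert(FLT_DIG >= 6, "float must provide at least 6 decimal digits");

// Not guaranteed by the standard, but expected on x86 toolchains:
// double is IEEE 754 binary64 with a 53-bit significand.
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(std::numeric_limits<double>::digits == 53, "double significand is not 53 bits");
static_assert(sizeof(double) == 8, "double is not 8 bytes");

// None of these can see a run-time x87 precision-control setting: the type
// is still 64 bits wide even when arithmetic is rounded to 24 bits.
```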

This is the best reference to the allowed sizes of float and double I found: https://stackoverflow.com/a/24157568/1716339, but I don't know what's happening, except that there’s a strong smell of float. We should look at the assembler dump. – Aki Suihkonen Nov 15 '17 at 17:42
