10

I'd like to determine if a given 64-bit integer can be stored losslessly in a double. Right now I have this code:

static_cast<int64_t>(static_cast<double>(value)) == value

However, I believe this is not always accurate due to excess precision on some platforms.

Note that I'm not asking for the largest integer such that all smaller integers can be stored losslessly, which is 2^53. I need to know whether a given integer N is losslessly storable even if N+1 and N-1 are not.
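
For example (assuming IEEE 754 doubles and ignoring the excess-precision issue for a moment), 2^54 survives the round trip even though its neighbours do not:

#include <cassert>
#include <cstdint>

int main()
{
    const int64_t n = int64_t{1} << 54;
    assert(static_cast<int64_t>(static_cast<double>(n)) == n);          // 2^54 is exactly representable
    assert(static_cast<int64_t>(static_cast<double>(n + 1)) != n + 1);  // 2^54 + 1 rounds to 2^54
}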

Is there something in the standard library, maybe similar to std::numeric_limits, that will tell me this?

ptomato
  • What's actually wrong checking against `std::numeric_limits`? – πάντα ῥεῖ Mar 01 '17 at 20:45
  • If the distance between the most significant non-zero bit and the least significant non-zero bit of the positive value is less than or equal to `std::numeric_limits<double>::digits`, I'd think the `int64_t` can be stored losslessly in a `double` (I think it is less than or equal, but there is a chance that I got an off-by-one error here). – Dietmar Kühl Mar 01 '17 at 20:49
  • Doesn't Christoph's answer demonstrate via the standard wording that a cast will "remove all extra range and precision"? – kmdreko Mar 01 '17 at 21:00
  • Checking `std::numeric_limits` will give me the 2^53 boundary, but won't tell me if a given integer >2^53, like 2^54, will fit. It also assumes IEEE representation if I'm not mistaken. – ptomato Mar 01 '17 at 21:46

4 Answers

9

As long as the low bits are 0, you can have more high-order bits set (because you can increase the double's exponent). I reduced the requirement to unsigned so the bit shifting doesn't have to worry about sign bits, but I believe it should be adaptable.

#include <cstdint>

bool fits_in_double_losslessly (uint64_t v)
{
    const uint64_t largest_int = 1ULL << 53;

    while ((v > largest_int) && !(v & 1))
    {
        v >>= 1;
    }

    return v <= largest_int;
}
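
A few sample calls to illustrate the boundary (my own illustrative values, assuming the usual 53-bit significand and the function above):

#include <cassert>

int main()
{
    assert( fits_in_double_losslessly(1ULL << 53));                 // 2^53 fits exactly
    assert(!fits_in_double_losslessly((1ULL << 53) + 1));           // needs 54 significant bits
    assert( fits_in_double_losslessly(1ULL << 60));                 // a single set bit always fits
    assert( fits_in_double_losslessly((1ULL << 60) + (1ULL << 8))); // 53-bit span; trailing zeros go into the exponent
    assert(!fits_in_double_losslessly((1ULL << 60) + 1));           // 61-bit span does not fit
}
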
Mark B
  • Does this assume IEEE representation? – ptomato Mar 01 '17 at 21:49
  • This indeed looks like IEEE 754. If you have a Linux system, then you can examine /usr/include/x86_64-linux-gnu/ieee754.h to see how floating point values are internally represented. Within this file you can see that mantissa0 has 20 bits and mantissa1 has 32 bits (52 bits in total). This implies that a 64-bit integer can populate any consecutive 52 bits losslessly. For completeness, you can losslessly store consecutive 23 bits into a float and (if supported) all 64 bits into a long double. – Andrew Mar 01 '17 at 22:11
  • You'd need to remove the sign-bit first... And I'm not aware of any HW available "with ease" that isn't IEEE-compatible (at least as far as the layout of the bits - there are some implementations that "cheat" on the specifics of rounding, required precision in some calculations, and so on) – Mats Petersson Mar 01 '17 at 22:39
  • The original code snippet would go into an infinite loop if the value being tested was 0; I edited it to reflect what I believe works. – ptomato Mar 02 '17 at 05:18
2

Subtract the position of the lowest set bit from the position of the highest set bit, add one to avoid a fencepost error, and compare to the number of bits in the mantissa. Remember that the leading 1 of the mantissa is implicit in IEEE 754. If the mantissa can hold the entire range of bits in use, it can store that number exactly.
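
Here is a sketch of that check (my own illustration, not from the original answer: the function name is made up, it uses the C++20 `<bit>` helpers, sticks to unsigned input to sidestep the sign bit, and assumes a binary floating-point `double` such as IEEE 754):

#include <bit>
#include <cstdint>
#include <limits>

bool representable_exactly(uint64_t v)
{
    if (v == 0)
        return true;                            // zero is trivially exact

    const int high = 63 - std::countl_zero(v);  // position of the highest set bit
    const int low  = std::countr_zero(v);       // position of the lowest set bit

    // The span of significant bits must fit in the significand;
    // std::numeric_limits<double>::digits is 53 for IEEE 754 doubles,
    // counting the implicit leading 1.
    return (high - low + 1) <= std::numeric_limits<double>::digits;
}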

Alternatively, see if `static_cast<double>(m) != static_cast<double>(m+1) && static_cast<double>(m) != static_cast<double>(m-1)`. This works for an unsigned value, but for a signed type, portable code would need to check that the value will not overflow or underflow, as that is undefined behavior. Then again, some implementations might not have `int64_t` either. On many implementations, however, signed overflow or underflow will still give you a different number with the opposite sign, which will properly test as distinct.

Davislor
  • Your alternative suggestion is elegant. Important to note, though, that it requires a check to make sure `m` is not the minimum or maximum value of its integer type. – ptomato Mar 06 '17 at 01:56
  • @ptomato How to handle that? *Unsigned* arithmetic is guaranteed to wrap around, but the question asked about a signed type. Signed integer overflow and underflow is undefined behavior. (And implementations typically do whatever the hardware does.) That said, on most compilers for most targets, the behavior is specified as producing a number of opposite sign, and the test will still work. – Davislor Mar 06 '17 at 08:27
1

A common theme among the other answers is the concern about whether or not you can assume that doubles are implemented via IEEE 754.

I'd argue that for many purposes your original method is best: cast the given int64_t to a double, then convert the result back to a second int64_t, and compare the two.

Regarding your concern about excess precision, that shouldn't be applicable as long as you know that the value has actually been written to memory and read back, because at that stage there shouldn't be any way for the compiler to store the "excess precision" for that variable. Based on the answers at your original link, I believe the following example would be enough to force that behavior:

#include <cstdint>

bool losslessRoundTrip(int64_t valueToTest)
{
    double newRepresentation;
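    // Store through a volatile pointer so the compiler must actually write the
    // converted value to memory; the intent is to discard any excess precision.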
    *((volatile double *)&newRepresentation) = static_cast<double>(valueToTest);
    int64_t roundTripValue = static_cast<int64_t>(newRepresentation);
    return roundTripValue == valueToTest;
}

I suspect that simply declaring newRepresentation as volatile would be enough, but I don't use the volatile keyword enough to be sure, so I've simply adapted the solution from your link.

The nice thing about your original method is that it should work for pretty much anything supporting the right casts and operators, as long as you only care about whether you can come back to the initial type like this. For example, here is a generic implementation:

template<typename T1, typename T2>
bool losslessRoundTrip(T1 valueToTest)
{
    T2 newRepresentation;
    *((volatile T2 *)&newRepresentation) = static_cast<T2>(valueToTest);
    T1 roundTripValue = static_cast<T1>(newRepresentation);
    return roundTripValue == valueToTest;
}
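
One wrinkle (my observation, not something stated in the original answer): `T2` cannot be deduced from the argument, so both template arguments have to be spelled out at the call site, for example:

int64_t value = (int64_t{1} << 54) + 1;
bool ok = losslessRoundTrip<int64_t, double>(value);  // false: 2^54 + 1 is not exactly representable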

Please note that this might not be the best answer for your question, because it still seems like a cheat to check whether you can store it as a double by...storing it as a double. Still, I don't know how else to avoid relying on the mechanics of the internal representation.

It also doesn't quite check whether the value was losslessly stored in the double, but rather whether the round trip from int64_t to double and back to int64_t is lossless on that platform. I don't know whether it's possible for casting errors to cancel out when the casts are applied in both directions (I'll defer to someone with more knowledge on that), but in my opinion it's close enough to "lossless" if I can get the initial value back when I convert back to the same type.

Bob Miller
  • Good thinking with the `volatile` keyword. This will hopefully do what I _thought_ my original code would do. If it works on all the platforms where I need it to work, I'll give you the checkmark. – ptomato Mar 06 '17 at 01:58
  • Unfortunately it didn't work on all platforms. Apparently compilers will do this even via a conversion that preserves the excess precision. Really good suggestion though. – ptomato Mar 30 '17 at 04:31
  • Hmm...I still think the basic premise should work, because if we know that a value is really written to memory and then read back then there is no way to store the excess precision. The question therefore becomes how to force that to be the case. May I ask for an example of a platform where it didn't work so I may experiment? – Bob Miller Mar 30 '17 at 04:43
  • Sure. See https://bugzilla.gnome.org/show_bug.cgi?id=779399 for the actual use case. The architectures where it didn't work were armhf and ppc64el. – ptomato Mar 30 '17 at 05:36
  • Ok awesome. I have an armhf setup I'll bring home from work tomorrow to see if I can reproduce the issue. I suspect my implementation of the idea is just slightly wrong somehow. This is the problem with volatile having so few valid use cases ;) – Bob Miller Mar 30 '17 at 05:53
  • Just an update: I looked at it yesterday but haven't managed to get it to behave like the automated tests you linked (on my available compilers the code you have worked properly on armhf). I should be able to take another look some time this weekend to see if I can track down the root cause. – Bob Miller Mar 31 '17 at 23:10
  • I've reproduced the failure from your link, but I have not yet been able to determine why. Looking at the generated assembly, the value is indeed being written to memory and read back. I'm surprised that any type of excess precision problem could occur in such a case. I'll take another look sometime this week, but if I can't figure it out by then I'll call it a loss and delete this answer. Oh well :/ – Bob Miller Apr 03 '17 at 17:45
  • Yeah, I basically chalked it up to "compilers are tricksy". But please don't delete the answer! I think it's still valuable. – ptomato Apr 04 '17 at 01:35
-3

You need to test the lower bits. If they are zero, the number will be stored losslessly.

Malcolm McLean
  • Some proof or example for this? – πάντα ῥεῖ Mar 01 '17 at 20:47
  • An IEEE double stores numbers with a binary floating point. So a number which is a multiple of a power of 2, i.e. has its low bits zero, is like 3,000,000 in decimal. It can be represented exactly despite the approximation. – Malcolm McLean Mar 01 '17 at 20:56
  • There's no guarantee the compiler or the target FPU uses that IEEE double encoding. – πάντα ῥεῖ Mar 01 '17 at 20:58
  • 2^1000 is exactly representable as a double and has zero as its bottom bits, but it's not going to fit into a uint64_t. – Mike Vine Mar 01 '17 at 21:02
  • No, IEEE 754 is not required. But it will use binary floating point. It can represent multiples of powers of two exactly until it runs out of exponent, which means very large numbers indeed. – Malcolm McLean Mar 01 '17 at 22:35
  • @malcolm: it could use a base which is a power of 2; at least one historic CPU used base 16. That will mean that the number of bits of precision varies depending on the first hex digit. – rici Mar 02 '17 at 06:32