Cast from unsigned long long to double and vice versa changes the value

Question

When writing C++ code, I suddenly realised that my numbers are incorrectly converted from double to unsigned long long.

To be specific, I use the following code:

#define _CRT_SECURE_NO_WARNINGS

#include <iostream>
#include <limits>
using namespace std;

int main()
{
  unsigned long long ull = numeric_limits<unsigned long long>::max();
  double d = static_cast<double>(ull);
  unsigned long long ull2 = static_cast<unsigned long long>(d);
  cout << ull << endl << d << endl << ull2 << endl;
  return 0;
}

Ideone live example.

When this code is executed on my computer, I have the following output:

18446744073709551615
1.84467e+019
9223372036854775808
Press any key to continue . . .

I expected the first and third numbers to be exactly the same (just like on Ideone) because I was sure that long double takes 10 bytes and stores the mantissa in 8 of them. I would understand if the third number were slightly off from the first one, in case I'm wrong about the floating-point format. But here the values differ by a factor of two!

So, the main question is: why? And how can I predict such situations?

Some details: I use Visual Studio 2013 on Windows 7, compile for x86, and sizeof(long double) == 8 for my system.

The problem does not occur in your ideone live example, so perhaps this is a bug in MSVC — M.M, Nov 20 '15 at 11:02
So you are saying you get a different result on your home computer than in the Ideone version? — Tim B, Nov 20 '15 at 11:03
In MSVC, do you still get the problem with `numeric_limits::max() - 100`? — M.M, Nov 20 '15 at 11:04
It probably has something to do with the sign bit. Multiply your last result by 2. — PaulMcKenzie, Nov 20 '15 at 11:04
@M.M, for `numeric_limits::max() - 100` it still reports `9223372036854775808` (this is `1 << 63`, as far as I understand) — alexeykuzmin0, Nov 20 '15 at 11:07
This might be conforming behaviour, because the integer is not exactly representable in the double. — M.M, Nov 20 '15 at 11:07
@M.M, the value of double is `1.84467e+019` - it obviously shouldn't be converted to `unsigned long long` as `1<<63` — alexeykuzmin0, Nov 20 '15 at 11:08
@PaulMcKenzie, I'm not sure I understood your comment. Yes, the result is `1 << 63` while the expected value is `(1 << 64) - 1`, but this result is reported for many numbers close to `1 << 64`. — alexeykuzmin0, Nov 20 '15 at 11:09
I'm suspicious about the sign bit here. Is there any chance it got converted to a signed integer and then from that to unsigned? A factor-of-2 error suggests one bit is missing. — Tim B, Nov 20 '15 at 11:11
@M.M, because there's a closer value in `unsigned long long` type — alexeykuzmin0, Nov 20 '15 at 11:15
@TimB, looks like an interesting idea. Is this stated anywhere in standard? — alexeykuzmin0, Nov 20 '15 at 11:15
@alexeykuzmin0 Not that I'm aware of, why I posted it as a comment not an answer :) — Tim B, Nov 20 '15 at 11:16
@alexeykuzmin0: could you try with 0xf000000000000000 to see whether your MSVC 2013 is affected by what I suspect to be a bug? — Serge Ballesta, Nov 20 '15 at 16:19
@SergeBallesta, my MSVC 2013 outputs `0xf000000000000000` correctly — alexeykuzmin0, Nov 20 '15 at 16:20

score 14 · Accepted Answer · edited May 23 '17 at 12:02

18446744073709551615 is not exactly representable in double (in IEEE 754). This is not unexpected, as a 64-bit floating-point type obviously cannot represent all integers that are representable in 64 bits.

According to the C++ Standard, it is implementation-defined whether the next-highest or next-lowest double value is used. Apparently on your system, it selects the next highest value, which seems to be 1.8446744073709552e19. You could confirm this by outputting the double with more digits of precision.
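One way to print the double with enough digits is to use `std::numeric_limits<double>::max_digits10` (17 for an IEEE 754 double), which is enough to identify the stored value exactly. The sketch below assumes IEEE 754 doubles and the usual round-to-nearest mode; the function names are made up for illustration.

```cpp
#include <cmath>
#include <limits>
#include <sstream>
#include <string>

// Converts ULLONG_MAX to double and returns it printed with max_digits10
// significant digits, enough to distinguish it from neighbouring doubles.
std::string max_ull_as_double_text()
{
    const unsigned long long ull = std::numeric_limits<unsigned long long>::max();
    const double d = static_cast<double>(ull);
    std::ostringstream out;
    out.precision(std::numeric_limits<double>::max_digits10);
    out << d;
    return out.str();
}

// True if the conversion picked the next-highest double, i.e. exactly 2^64,
// which is one more than the largest unsigned long long.
bool max_ull_rounds_up()
{
    const double d = static_cast<double>(std::numeric_limits<unsigned long long>::max());
    return d == std::ldexp(1.0, 64);
}
```

With GCC on x86-64, `max_ull_as_double_text()` returns `"1.8446744073709552e+19"` and `max_ull_rounds_up()` returns `true`.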

Note that this is larger than the original number.

When you convert this double to integer, the behaviour is covered by [conv.fpint]/1:

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

So this code potentially causes undefined behaviour. When undefined behaviour has occurred, anything can happen, including (but not limited to) bogus output.
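To predict (and avoid) this situation, you can check the range before converting. The helper below is a hypothetical sketch, assuming IEEE 754 doubles and C++17 (`std::optional`); it only performs the conversion when the truncated value is guaranteed to fit.

```cpp
#include <cmath>
#include <optional>

// Hypothetical helper: converts d to unsigned long long only when the
// truncated value fits, so the undefined conversion is never executed.
std::optional<unsigned long long> to_ull_checked(double d)
{
    // 2^64 is exactly representable as a double, and every double in
    // [0, 2^64) truncates to an integer in [0, 2^64 - 1].
    // NaN fails both comparisons and is rejected too.
    // Values in (-1, 0) would also be allowed by the standard (they truncate
    // to 0); this sketch rejects them to keep the check simple.
    const double two_pow_64 = std::ldexp(1.0, 64);
    if (!(d >= 0.0 && d < two_pow_64))
        return std::nullopt;
    return static_cast<unsigned long long>(d);
}
```

Comparing against 2^64 as a double avoids the trap of comparing against `static_cast<double>(ULLONG_MAX)`, which is itself rounded up to 2^64.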

The question was originally posted with long double, rather than double. On my gcc, the long double case behaves correctly, but on OP's MSVC it gave the same error. This could be explained by gcc using 80-bit long double, but MSVC using 64-bit long double.
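Whether a floating-point type can hold every `unsigned long long` exactly can be checked via `std::numeric_limits<T>::digits`, which is 53 for an IEEE 754 double and 64 for the x87 80-bit long double. A sketch, assuming binary (radix-2) floating-point types as on GCC and MSVC; the function names are illustrative.

```cpp
#include <limits>

// True if T's significand has at least as many bits as unsigned long long,
// so every unsigned long long value is exactly representable in T.
template <typename T>
constexpr bool holds_every_ull()
{
    return std::numeric_limits<T>::digits >= std::numeric_limits<unsigned long long>::digits;
}

// Round-trips ULLONG_MAX through long double, but only where that is defined.
bool long_double_round_trip_ok()
{
    if (!holds_every_ull<long double>())
        return false; // e.g. MSVC, where long double has the same 53-bit significand as double
    const unsigned long long ull = std::numeric_limits<unsigned long long>::max();
    const long double ld = static_cast<long double>(ull);
    return static_cast<unsigned long long>(ld) == ull;
}
```

On GCC for x86, `holds_every_ull<long double>()` is true; on MSVC, where `sizeof(long double) == 8`, it is false, which matches the different behaviour described above.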

Bathsheba · Answer 2 · 2015-11-20T11:19:44.103


The problem is surprisingly simple. This is what is happening in your case:

18446744073709551615, when converted to a double, is rounded to the nearest number that the floating-point type can represent. (The closest representable number is larger.)

When that's converted back to an unsigned long long, it's larger than max(). Formally, the behaviour of converting this back to an unsigned long long is undefined, but what appears to be happening in your case is a wraparound.

The significantly smaller number you observe is the result of this.

edited Nov 20 '15 at 11:19

answered Nov 20 '15 at 11:09

Bathsheba


This doesn't explain why the result differs in the Ideone example, or why it happens for max()-100. – Tim B Nov 20 '15 at 11:10

If it wrapped around mod 2^64 you'd get a number near 0. – interjay Nov 20 '15 at 11:10
@TimB, as far as I understand, on Ideone `sizeof(long double)` is `12` while on my computer it's `8`. – alexeykuzmin0 Nov 20 '15 at 11:11
@TimB, there are two possibilities: (i) the Ideone example was originally using a long double; (ii) floating-point details could differ. A 12-byte long double with an appropriate mantissa/exponent split *could* represent `max()` exactly. – Bathsheba Nov 20 '15 at 11:11
@Bathsheba, I think you're wrong, because `1ULL << 64 == 0`, not `1ULL << 64 == 1ULL << 63`. – alexeykuzmin0 Nov 20 '15 at 11:12

1ULL << 64 = one cat eaten due to UB. Start with the obvious. The compiler is **not** broken. – Bathsheba Nov 20 '15 at 11:13
@alexeykuzmin0 I think this answer is correct, except for the part about wrapping around. The conversion to integer is undefined behavior if it won't fit. – interjay Nov 20 '15 at 11:15
@Bathsheba Then explain how wrapping 2^64 modulo 2^64 gives you 2^63. The standard does say it's undefined. – interjay Nov 20 '15 at 11:17

score 1 · Answer 3 · answered Nov 20 '15 at 11:13


It's due to the double approximation of unsigned long long. Its precision means an error of up to about 1000 units near 10^19 (adjacent doubles there are 2048 apart); when you convert values around the upper limit of the unsigned long long range, the rounded value overflows. Try converting a value 10000 lower instead :)

By the way, on Cygwin the third printed value is zero.
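The spacing claim can be checked with `std::nextafter`, and the same idea tells you in advance whether `max() - offset` rounds up to 2^64. A sketch assuming IEEE 754 doubles with round-to-nearest; the function names are hypothetical, and the second one deliberately never converts the double back, so it stays well-defined for every offset.

```cpp
#include <cmath>
#include <limits>

// Gap between 2^64 and the largest double below it. For IEEE 754 doubles,
// every double in [2^63, 2^64) is a multiple of 2^(63 - 52) = 2048.
double gap_below_two_pow_64()
{
    const double two_pow_64 = std::ldexp(1.0, 64);
    return two_pow_64 - std::nextafter(two_pow_64, 0.0);
}

// True if max() - offset, converted to double, rounds up to 2^64; converting
// such a double back to unsigned long long would be undefined behaviour.
bool rounds_out_of_range(unsigned long long offset)
{
    const unsigned long long ull = std::numeric_limits<unsigned long long>::max() - offset;
    return static_cast<double>(ull) >= std::ldexp(1.0, 64);
}
```

For an offset of 100 the value rounds up to 2^64, matching the failing `max() - 100` case from the comments. For 10000 it rounds to 2^64 - 10240 = 18446744073709541376, which converts back without problems.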


AndreyS Scherbakov


Yes, there should be an overflow here, but I didn't think it was UB; I was sure that on all reasonable platforms the upper bits are simply truncated. You're right, when I use `numeric_limits::max() - 10000`, it works. – alexeykuzmin0 Nov 20 '15 at 11:19
One should never rely on it! Note, for example, that it may be a hardware built-in operation. – AndreyS Scherbakov Nov 20 '15 at 11:25
