16

While writing some C++ code, I suddenly realised that my numbers are cast incorrectly from double to unsigned long long.

To be specific, I use the following code:

#define _CRT_SECURE_NO_WARNINGS

#include <iostream>
#include <limits>
using namespace std;

int main()
{
  unsigned long long ull = numeric_limits<unsigned long long>::max();
  double d = static_cast<double>(ull);
  unsigned long long ull2 = static_cast<unsigned long long>(d);
  cout << ull << endl << d << endl << ull2 << endl;
  return 0;
}

Ideone live example.

When this code is executed on my computer, I have the following output:

18446744073709551615
1.84467e+019
9223372036854775808
Press any key to continue . . .

I expected the first and third numbers to be exactly the same (just like on Ideone) because I was sure that long double took 10 bytes and stored the mantissa in 8 of them. I would understand if the third number were truncated compared to the first one, in case I'm wrong about the floating-point format. But here the values differ by a factor of two!

So, the main question is: why? And how can I predict such situations?

Some details: I use Visual Studio 2013 on Windows 7, compile for x86, and sizeof(long double) == 8 for my system.

edmz
alexeykuzmin0

3 Answers

14

18446744073709551615 is not exactly representable in double (in IEEE 754). This is not unexpected: a 64-bit double has only a 53-bit significand, so it cannot represent every integer that is representable in 64 bits.

According to the C++ Standard, it is implementation-defined whether the next-highest or next-lowest double value is used. Apparently your system selects the next-highest value, which is 2^64 = 18446744073709551616 (about 1.8446744073709552e19). You could confirm this by outputting the double with more digits of precision.

Note that this is larger than the original number.

When you convert this double to integer, the behaviour is covered by [conv.fpint]/1:

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

So this code potentially causes undefined behaviour. When undefined behaviour has occurred, anything can happen, including (but not limited to) bogus output.


The question was originally posted with long double, rather than double. On my gcc, the long double case behaves correctly, but on OP's MSVC it gave the same error. This could be explained by gcc using 80-bit long double, but MSVC using 64-bit long double.

Community
M.M
1

The problem is surprisingly simple. This is what is happening in your case:

18446744073709551615, when converted to a double, is rounded up to the nearest number that the floating-point type can represent. (The closest representable number is larger.)

When that's converted back to an unsigned long long, it's larger than max(). Formally, the behaviour of converting this back to an unsigned long long is undefined but what appears to be happening in your case is a wrap around.

The observed significantly smaller number is the result of this.

Bathsheba
  • This doesn't explain why the rounding differs on the IDEOne example? Or why it happens on max()-100 – Tim B Nov 20 '15 at 11:10
  • If it wrapped around mod 2^64 you'd get a number near 0. – interjay Nov 20 '15 at 11:10
  • @TimB, as far as I understand, in IdeOne `sizeof(long double)` is `12` while on my computer it's `8`. – alexeykuzmin0 Nov 20 '15 at 11:11
  • @TimB, there are two possibilities: (i) the IDEOne example was originally using a long double, (ii) floating point details could differ. A 12 byte double with an appropriate mantissa / exponent split *could* represent `max()` exactly. – Bathsheba Nov 20 '15 at 11:11
  • @Bathsheba, I think you're wrong because `1ULL << 64 == 0`, not `1ULL << 64 == 1 ULL << 63` – alexeykuzmin0 Nov 20 '15 at 11:12
  • 1ULL << 64 = one cat eaten due to UB. Start with the obvious. The compiler is **not** broken. – Bathsheba Nov 20 '15 at 11:13
  • @alexeykuzmin0 I think this answer is correct, except for the part about wrapping around. The conversion to integer is undefined behavior if it won't fit. – interjay Nov 20 '15 at 11:15
  • @Bathsheba Then explain how wrapping 2^64 modulo 2^64 gives you 2^63. The standard does say it's undefined. – interjay Nov 20 '15 at 11:17
1

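It's due to the double approximation of unsigned long long. Just below 2^64, adjacent doubles are 2048 apart, so the rounding error can reach about 1000 units at values near 1.8 * 10^19; when you convert values right at the upper limit of the unsigned long long range, the rounded double overflows it. Try converting a value that is 10000 lower instead :)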

BTW, on Cygwin the third printed value is zero.

AndreyS Scherbakov
  • Yes, here should be the overflow, but I didn't think it's UB - I was sure that on all reasonable platforms the upper bits are just truncated. You're right, when I use `numeric_limits::max() - 10000`, it works. – alexeykuzmin0 Nov 20 '15 at 11:19
  • One should never rely on it! Note, for example, that it may be a hardware built-in operation. – AndreyS Scherbakov Nov 20 '15 at 11:25