
My question is whether all integer values are guaranteed to have a perfect double representation.

Consider the following code sample that prints "Same":

// Example program
#include <iostream>
#include <string>

int main()
{
  int a = 3;
  int b = 4;
  double d_a(a);
  double d_b(b);

  double int_sum = a + b;
  double d_sum = d_a + d_b;

  if (double(int_sum) == d_sum)
  {
      std::cout << "Same" << std::endl;
  }
}

Is this guaranteed to be true for any architecture, any compiler, any values of a and b? Will any integer i converted to double, always be represented as i.0000000000000 and not, for example, as i.000000000001?

I tried it for some other numbers and it always was true, but was unable to find anything about whether this is coincidence or by design.

Note: This is different from this question (aside from the language) since I am adding the two integers.

Thomas
  • You can use a loop to test every possible number. – mch Apr 27 '17 at 10:59
  • Short answer is "no" - the range of values an `int` can represent and that a `double` can represent are implementation defined - but a `double` certainly cannot support every integral value in the range it can represent. The practical answer is "it depends". – Peter Apr 27 '17 at 11:00
  • @mch: This would prove nothing, it could be true for my architecture but false for others... – Thomas Apr 27 '17 at 11:00
  • Is this guaranteed to be true for any architecture, any compiler, any values of a and b? No. AFAIK C++ does not specify any specific binary representation for floating point values. Of course, in practice, you can probably rely on most compilers on most platforms to use IEEE 754 floating point. – sigbjornlo Apr 27 '17 at 11:11
  • Check out http://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e, http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double?noredirect=1&lq=1 http://stackoverflow.com/questions/2234468/do-any-real-world-cpus-not-use-ieee-754?noredirect=1&lq=1 – sigbjornlo Apr 27 '17 at 11:11
  • The smallest positive integral value not representable by a 64bit IEEE double is 9007199254740993. This integral value has 54 significant bits and the double can only represent 53 significant bits. Note that 9007199254740992 takes 54 bits to represent in two's complement. But the double format can represent its mantissa with only 1 bit (well, actually zero because the leading bit is implicit). No need to loop to find this information. Use `numeric_limits::digits`, and `nextafter`. – Howard Hinnant Apr 27 '17 at 14:28
  • I suppose you meant `int int_sum` instead of `double int_sum`, right? Even though it doesn't change anything in the end, except for an unnecessary cast. – Lucas Trzesniewski Apr 27 '17 at 23:15
  • All integers with an absolute value of 2^53 or smaller can be exactly represented. – CodesInChaos Apr 28 '17 at 07:00
  • NOTE: Some languages such as Java are defined as supporting a sub-set of IEEE-754. – Peter Lawrey May 03 '17 at 07:40
  • @HowardHinnant a small quibble, the mantissa has an implied 1. for normal numbers. For denormal numbers (very small ones) it doesn't for reasons which I am sure made sense at the time. (I assume to handle 0) – Peter Lawrey May 03 '17 at 07:41

5 Answers


Disclaimer (as suggested by Toby Speight): Although IEEE 754 representations are quite common, an implementation is permitted to use any other representation that satisfies the requirements of the language.


Doubles are represented in the form mantissa * 2^exponent, i.e. some of the bits encode the non-integer part of the number.

             bits        range                       precision
  float        32        1.5E-45   .. 3.4E38          7- 8 digits
  double       64        5.0E-324  .. 1.7E308        15-16 digits
  long double  80        1.9E-4951 .. 1.1E4932       19-20 digits

[Schematic of the IEEE 754 double type: 1 sign bit, 11 exponent bits, 52 fraction bits]

The fraction part can also be used to represent an integer, by choosing an exponent that removes all the digits after the dot.

E.g. 2.9979 · 10^4 = 29979.

Since a common int is usually 32 bits, every int can be represented exactly as a double; for 64-bit integers this is of course no longer true. To be more precise (as LThode noted in a comment): IEEE 754 double precision can guarantee this for up to 53 bits (52 bits of significand plus the implicit leading 1 bit).

Answer: yes for 32 bit ints, no for 64 bit ints.

(This is correct for server/desktop general-purpose CPU environments, but other architectures may behave differently.)

Practical answer, as Malcolm McLean puts it: 64-bit doubles are an adequate integer type for almost all integers that are likely to count things in real life.


For the empirically inclined, try this:

#include <iostream>
#include <limits>
using namespace std;

int main() {
    double test;
    volatile int test_int;
    for(int i=0; i< std::numeric_limits<int>::max(); i++) {
        test = i;
        test_int = test;

        // compare int with int:
        if (test_int != i)
            std::cout<<"found integer i="<<i<<", test="<<test<<std::endl;
    }
    return 0;
}

(The program runs to completion without printing anything, i.e. every 32-bit int survives the round trip through double.)


Subquestion: Is it possible to have an integer which converts to a double that is just off the correct value by a fraction, but which converts back to the same integer due to rounding?

The answer is no: any integer which converts back and forth to the same value actually has an exact double representation. The simplest explanation (suggested by ilkkachu) is that with the form mantissa * 2^exponent, the step width between adjacent doubles is always a power of two. Beyond the largest exactly-representable 53-bit integer, adjacent double values are therefore never less than 2 apart, which rules out the rounding scenario: any conversion error is a whole number, never a fraction.

Beginner
  • See http://stackoverflow.com/questions/12629087/is-floatinteger-integer-guaranteed-to-be-equal-in-c - this is different from my question. Even if double(4) is converted to 4.000000001, your test would still succeed since the integer is also (implicitly) converted to double. – Thomas Apr 27 '17 at 11:27
  • The upper part though sounds logical - but so did the other answer from someone else that just got deleted because it was wrong. So I'll wait for a day before accepting it in case someone with more knowledge than me finds an error :) – Thomas Apr 27 '17 at 11:28
  • Sorry, I misread your code. But my point still stands (I think). Say i = 15, then it might be the case that test = i results in test = 15.000000000001. Converting it back to int then results in the same int even though the double was not a perfect representation - as far as my knowledge goes. – Thomas Apr 27 '17 at 11:31
  • @Beginner This comparison will always return `false`, as the integer `test_int` will be implicitly converted to a `double` in this comparison. So even if the `double` can't represent the `int`, the `int` will be converted to the same imprecise representation, which then compares equal. – Corristo Apr 27 '17 at 11:49
  • @Corristo both numbers in the comparison are int, why is anything converted to double? I don't get it, can you explain in more detail? – Beginner Apr 27 '17 at 11:51
  • @Thomas if a given representation compares as equal and converts as equal it is essentially the same, right? – Beginner Apr 27 '17 at 11:52
  • @Beginner essentially yes, but for the sake of completeness, there might be some fringe cases: Say 1 (int) gets converted to 1.0000001. If you add that a million times as integers, you will (correctly) get a million. If you convert it to double before doing so, you would get 1000001. – Thomas Apr 27 '17 at 13:35
  • But since according to your answer they are identical for any 32bit-integers, this is no problem in my code :) – Thomas Apr 27 '17 at 13:36
  • @Thomas yes, good point. Let me think if I have a good answer for this... – Beginner Apr 27 '17 at 13:38
  • IEEE 754 double-precision can guarantee this for up to 53 bits (52 bits of significand + the implicit leading 1 bit) – LThode Apr 27 '17 at 14:18
  • You've got 53 bits. If you need an integer larger than 2^53 one must ask what it represents. – Malcolm McLean Apr 27 '17 at 15:45
  • You should probably mention that although IEEE 754 representations are quite common, an implementation is permitted to use any other representation that satisfies the requirements of the language. – Toby Speight Apr 27 '17 at 15:48
  • @TobySpeight yeah, but is there actually a compiler for which it is not true? – Beginner Apr 27 '17 at 15:51
  • I don't know for sure, but I wouldn't assume too much outside the server/desktop general-purpose CPU environments. C targets DSPs and mainframes/minicomputers, some of which predate IEC-659/IEEE-754. – Toby Speight Apr 27 '17 at 15:54
  • @Beginner I think you should make clear that 64 bit doubles are an adequate integer type for almost all integers that are likely to count things in real life. – Malcolm McLean Apr 27 '17 at 16:11
  • @Beginner, you might also add that because the granularity is always a power of two, integers will never get fractional parts that shouldn't be there. – ilkkachu Apr 27 '17 at 17:10
  • @ilkkachu well, it's always a power of `std::numeric_limits::radix`. One could imagine an implementation where that is 10. – Random832 Apr 27 '17 at 23:32
  • @Random832 I am having difficulties imagining such an implementation on a binary computer. How would that ever make sense? – Beginner Apr 28 '17 at 06:39
  • @Random832, I meant to say "with IEEE floats", except 754 defines decimal floats too, so there's that. Make it "with the usual float implementations" then, though of course the same result applies with powers-of-ten. – ilkkachu Apr 28 '17 at 07:05
  • @Beginner, and yeah, decimal floats seem to make as much sense as BCD. there's the rounding benefit for human purposes. Though if I wanted to count something exactly down to parts per million or such, I might just use a scaled integer. But what do I know. – ilkkachu Apr 28 '17 at 07:12
  • You don't actually *need* BCD for decimal floats, strictly speaking - you could, for example, have the mantissa as a binary integer that is scaled by a power of 10 [e.g. 1.234 -> (1234, -3)] – Random832 Apr 28 '17 at 14:25

No. Suppose you have a 64-bit integer type and a 64-bit floating-point type (which is typical for a double). There are 2^64 possible values for that integer type and there are 2^64 possible values for that floating-point type. But some of those floating-point values (in fact, most of them) do not represent integer values, so the floating-point type can represent fewer integer values than the integer type can.

Pete Becker
  • What about 32-bit integers? The cardinality argument is inconclusive for this case. – Beginner Apr 27 '17 at 12:56
  • @Beginner - even more extreme, absent perverse floating-point types, every integer value that can be represented as a value of `int8_t` (assuming the type actually exists) can be exactly represented as a double. But the question is broader than that; it asks, without qualification, if **all** integer values can be represented exactly as doubles, and the answer is "no". And the answer to the implied question is if you need to know which integers can be exactly represented in a double you have to know fairly intimate details of how double is represented on your system. This is not for beginners. – Pete Becker Apr 27 '17 at 14:13
  • Perverse or not, any integer value up to 10^6-1 is required to be exactly represented as a float, and any up to 10^10-1 as a double. At least, that's the case in C. Not only int8_t but int16_t and, for double, int32_t (and thus the minimum required ranges of int and long) is well within this range. – Random832 Apr 27 '17 at 23:40

The answer is no. This only works if ints are 32 bit, which, while true on most platforms, isn't guaranteed by the standard.

The two integers can share the same double representation.

For example, this

#include <cstdint>
#include <iostream>

int main() {
    std::int64_t n = 2397083434877565865;
    if (static_cast<double>(n) == static_cast<double>(n - 1)) {
        std::cout << "n and (n-1) share the same double representation\n";
    }
}

will print

n and (n-1) share the same double representation

I.e. both 2397083434877565865 and 2397083434877565864 will convert to the same double.

Note that I used int64_t here to guarantee 64-bit integers, which - depending on your platform - might also be what int is.

Corristo
  • @Corristo The OP's question was about integers. When you use 64 bit integers, you should use long double, then it works again for the whole range. – Rene Apr 27 '17 at 11:35
  • @Corristo Yes, but according to this question, http://stackoverflow.com/questions/9689049/what-decides-the-sizeof-an-integer, there are two such platforms. So, considering these, the answer is actually 'no'. (unless they would also have a `double` with 128 bits, I could not find this information) – Rene Apr 27 '17 at 12:32
  • @Thomas -- floating-point precision doesn't get worse the larger the int is. It stays exactly the same. Your integer values seem to have more precision because you're writing more non-zero digits, but their precision doesn't change, either. – Pete Becker Apr 28 '17 at 02:10
  • @PeteBecker: It's perfectly clear what people mean when they say that floating-point precision gets worse for large numbers. You can reliably distinguish numbers down to say .0001 for small values. For large numbers, you can't. There are a lot more values in [0,1] than in [10,11], even fewer in [100,101], and so on. – Nick Matteo Apr 28 '17 at 06:20
  • @Kundor -- yes, it's perfectly clear, and it's dead wrong. Programmers who want to use floating-point math effectively need to understand how it works and not base decisions on misleading bromides. – Pete Becker Apr 28 '17 at 11:16
  • @PeteB: On my machine, doubles near 1 are precise to about 2.2e-16 (1 eps). Doubles near 1 billion are precise to about 1.2e-7. Doubles near one trillion are precise to about 0.00012. You'll notice that precision is worse for larger numbers. – Nick Matteo Apr 28 '17 at 14:55
  • @Kundor -- doubles near 1 billion are precise to 1.2e-7 **divided by 1 billion**, i.e., about 1.2e-16. The precision is exactly the same as for values near 1, for which you give a precision of about 2.2e-16. Precision is about how many significant figures a value has. Setting aside computer representations for a moment, 1.2*10^1 has exactly the same precision as 1.2*10^20, i.e., 2 significant figures, even though adding 1 to the first value makes a noticeable change, while adding 1 to the second one is almost invisible. – Pete Becker Apr 28 '17 at 16:25
  • @PeteBecker: That's significance, not precision. "Precise to 0.1" doesn't mean that a value `n` is within 0.1 after you divide by `n`, it means that it's within 0.1 of the correct value. Saying that there are 12,000 people in a town is _less precise_ than saying there are 12 people in a room, when both numbers are understood to have 2 significant figures. To give the number of people in the town with the same _precision_ that you gave for the room, you would have to use _more_ significant figures, and say 12,132 people. – Nick Matteo Apr 28 '17 at 16:34
  • @Kundor -- that's not how "precision" is used in defining and discussing floating-point values. Real-number concepts simply don't apply to floating-point values and arithmetic. – Pete Becker Apr 28 '17 at 16:35

You have 2 different questions:

Are all integer values perfectly represented as doubles?

That was already answered by other people (TL;DR: it depends on the precision of int and double).

Consider the following code sample that prints "Same": [...] Is this guaranteed to be true for any architecture, any compiler, any values of a and b?

Your code adds two ints and then converts the result to double. The sum of ints will overflow for certain values (which is undefined behaviour in C++), but the sum of the two separately-converted doubles will not. For those values the results will differ.

Pablo H
  • This is something I didn't have in mind when asking the question, but a good point nonetheless :) – Thomas Jun 16 '17 at 15:43

The short answer is "possibly". The portable answer is "not everywhere".

It really depends on your platform, and in particular, on

  • the size and representation of double
  • the range of int

For platforms using IEEE-754 doubles, it may be true if int has 53 value bits or fewer. For platforms where int has more value bits than double has significand bits, it's obviously false.

You may want to investigate the properties on your runtime host, using std::numeric_limits and std::nextafter.

Toby Speight