23

The original question was edited (shortened) to focus on a problem of precision, not range.

Whether single or double precision, every floating-point representation of real numbers is limited to some interval (-range, +range). Within this range lie some integers (1, 2, 3, 4, and so on; the same goes for negative numbers).

Is there a guarantee that an IEEE 754 floating-point number (float, double, etc.) can "cover" all integers within its range? By "cover" I mean that the floating-point value represents the integer exactly, not as (for example) "5.000001".

Just as a reminder, http://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html is a nice explanation of various number representation formats.

Update:

Because the question asks "can", an answer showing that it cannot be done is also welcome; for that, quoting a single number is enough. For example: "No, it cannot be done; for example the number 1748574 is not represented exactly by a float" (this number is taken out of thin air, of course).

For the curious reader

If you would like to play with the IEEE 754 representation, here is an on-line calculator: http://www.ajdesigner.com/fl_ieee_754_word/ieee_32_bit_word.php

greenoldman
  • 16,895
  • 26
  • 119
  • 185
  • What do you call an `integer`? – Macmade Sep 15 '12 at 21:29
  • Any particular programming language you are thinking of? – Martijn Pieters Sep 15 '12 at 21:32
  • @Macmade, errm, integer? Think about natural numbers, and then add to this set the same one but with minus sign. – greenoldman Sep 15 '12 at 21:46
  • @MartijnPieters, I was thinking about it when writing in Scala, both versions suits me fine -- for Scala and general answer. – greenoldman Sep 15 '12 at 21:46
  • @macias I was trying to suggest to add more details. Different languages may represent integers differently, and it also depends on the number of bits your system can use. – Macmade Sep 15 '12 at 21:49
  • What *exactly* do you mean by "the range"? The range of *what*? – Keith Thompson Sep 15 '12 at 21:55
  • @KeithThompson, of the numbers. If you take float, double, byte, int, long, etc the ranges of them are different. So once you set common range for pair being compared, you can say what integer numbers you check. For example Double vs. byte would be a set of (0,255) of integer numbers. – greenoldman Sep 15 '12 at 21:58
  • @Macmade, skip the interpreted number representations in some languages, could you then provide any example for two languages that differ in representation? – greenoldman Sep 15 '12 at 22:01
  • @macias Here you go: http://en.wikipedia.org/wiki/Signed_number_representations – Macmade Sep 15 '12 at 22:06
  • @Macmade, thanks, but I am asking about contemporary **computers**. Sure, if you have some odd architecture, the representation will be different than today. – greenoldman Sep 15 '12 at 22:09
  • *"of the numbers"*. Of *what* numbers? In your example, (0,255) is exactly the range of numbers representable in both types, so the answer is trivially yes. If you can rigorously define what range you're talking about, your question is answerable; if not, it isn't. – Keith Thompson Sep 15 '12 at 22:42
  • `Is there a guarantee that a real number can "cover" all integers within common range?` Guarantees are made by the actual definition. If you mean real number representations in general, there is no answer. Do you want the sequence of integers that can be exactly represented by IEEE 32 float? – Ishtar Sep 15 '12 at 23:02
  • @Ishtar, I said general answer as opposite to focusing solely on for example Pascal. Not a sequence, but a "proof" such covering exists. – greenoldman Sep 16 '12 at 07:01
  • @KeithThompson, I would be grateful for a proof even for such trivial case for you as (0,255). Thank you in advance. – greenoldman Sep 16 '12 at 07:05
  • @macias: I suspect the question you're really trying to ask is something like "For a given floating-point type, what is the largest contiguous range of integer values that it can represent exactly?" If you'll update your question to clarify just what you're asking (whether my guess is correct or not), then I'll try to update my answer. You *really* need to clarify exactly what "range" you're talking about. – Keith Thompson Sep 16 '12 at 22:02

4 Answers

46

No, not all, but there exists a range within which you can represent all integers accurately.

Structure of 32bit floating point numbers

The 32bit floating point type uses

  • 1 bit for the sign
  • 8 bits for the exponent
  • 23 bits for the fraction (leading 1 implied)

Representing numbers

Basically, you have a number in the form

(-)1.xxxx_xxxx_xxxx_xxxx_xxxx_xxx (binary)

which you then shift left/right with the (unbiased) exponent.

To have it represent an integer requiring n bits, you need to shift it by n-1 bits to the left. (All xes that remain to the right of the binary point are simply zero.)

Representing integers with 24 bits

It is easy to see that we can represent all integers requiring 24 bits (or fewer):

1xxx_xxxx_xxxx_xxxx_xxxx_xxxx.0 (unbiased exponent = 23)

since we can set the xes at will to either 1 or 0.

The highest number we can represent in this fashion is:

1111_1111_1111_1111_1111_1111.0

or 2^24 - 1 = 16777215

The next higher integer is 1_0000_0000_0000_0000_0000_0000. Thus, we need 25 bits.

Representing integers with 25 bits

If you try to represent a 25 bit integer (unbiased exponent = 24), the numbers have the following form:

1_xxxx_xxxx_xxxx_xxxx_xxxx_xxx0.0

The twenty-three digits that are available to you have all been shifted to the left of the binary point. The leading digit is always a 1. In total, we have 24 digits. But since we need 25, a zero is appended.

A maximum is found

We can represent `1_0000_0000_0000_0000_0000_0000` with the form `1_xxxx_xxxx_xxxx_xxxx_xxxx_xxx0.0`, by simply assigning `1` to all xes. The next higher integer from that is `1_0000_0000_0000_0000_0000_0001`. It's easy to see that this number cannot be represented accurately, because the form does not allow us to set the last digit to `1`: it is always `0`.

It follows that the 1 followed by 24 zeroes is an upper bound for the integers we can accurately represent. The lower bound simply has its sign bit flipped.

Range within which all integers can be represented (including boundaries)

  • 2^24 as an upper bound
  • -2^24 as a lower bound
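
A minimal C sketch (added for illustration, assuming IEEE 754 single precision) of this boundary: every integer up to 2^24 survives being stored in a float, while 16777217 (2^24 + 1) silently rounds to 16777216:

#include <stdio.h>

int main(void) {
    /* assumes float is an IEEE 754 binary32 */
    float a = 16777215.0f;  /* 2^24 - 1: exactly representable */
    float b = 16777216.0f;  /* 2^24: exactly representable */
    float c = 16777217.0f;  /* 2^24 + 1: rounds to 16777216.0f */

    printf("a = %.1f\nb = %.1f\nc = %.1f\n", a, b, c);
    printf("b == c: %d\n", b == c);  /* prints 1: the two constants collapse to the same float */
    return 0;
}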

Structure of 64bit floating point numbers

  • 1 bit for the sign
  • 11 exponent bits
  • 52 fraction bits

Range within which all integers can be represented (including boundaries)

  • 2^53 as an upper bound
  • -2^53 as a lower bound

This easily follows by applying the same argument to the structure of 64bit floating point numbers.

Note: That is not to say these are all the integers we can represent, but it gives you a range within which you can represent all integers. Beyond that range, we can only represent a power of two multiplied by an integer from said range.
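
Assuming IEEE 754 doubles, a small C sketch (added for illustration) shows both the 2^53 bound and the note above, namely that just past the bound only every second integer is representable:

#include <stdio.h>

int main(void) {
    /* assumes double is an IEEE 754 binary64 */
    double a = 9007199254740992.0;  /* 2^53: exactly representable */
    double b = 9007199254740993.0;  /* 2^53 + 1: rounds back to 2^53 */
    double c = 9007199254740994.0;  /* 2^53 + 2: exactly representable (even integer) */

    printf("a == b: %d\n", a == b);  /* prints 1: 2^53 + 1 is indistinguishable from 2^53 */
    printf("c = %.1f\n", c);         /* prints 9007199254740994.0 */
    return 0;
}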

Combinatorial argument

To convince ourselves that it is impossible for 32bit floating point numbers to represent all integers a 32bit integer can represent, we need not even look at the structure of floating point numbers.

  1. With 32 bits, there are 2^32 different things we can represent. No more, no less.
  2. A 32bit integer uses all of these "things" to represent numbers (pairwise different).
  3. A 32bit floating point number can represent at least one number with a fractional part.

Thus, it is impossible for the 32bit floating point number to represent this fractional number in addition to all 2^32 integers.
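
As a concrete consequence of this counting argument (an illustrative sketch added here, not part of the original answer), there must be 32-bit integers that do not survive a round trip through float; 123456789 is one of them:

#include <stdio.h>

int main(void) {
    /* assumes float is an IEEE 754 binary32 with 24 significant bits */
    int   i = 123456789;   /* needs 27 significant bits */
    float f = (float)i;    /* rounded to 24 significant bits */
    int   back = (int)f;   /* round trip through float */

    printf("%d -> %.1f -> %d\n", i, f, back);  /* 123456789 -> 123456792.0 -> 123456792 */
    return 0;
}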

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
phant0m
  • 16,595
  • 5
  • 50
  • 82
  • 2
    @macias: Yes, all numbers start with a binary one. The only number that does not is zero. (It has a special representation). Further, shifting and multiplying by powers of two are the same thing. I'm not sure what you mean with the `1.5` example. In binary, that is `1.1` with an unbiased exponent of 0. – phant0m Sep 16 '12 at 10:20
  • 2
    @macias Ah, I think I see where the misunderstanding lies. All of my `x`es are binary digits. And shifting the decimal `1.5` in binary by one place to the left will yield `3`: `1.5=1.1b` -> `11.0b=3` – phant0m Sep 16 '12 at 10:28
  • But the thing is you don't shift. You have 0.5 in binary format (101), you then prefix it with "1", then you multiply by 2. This gives you 3. In order to get "5" (one bit more in bit length -- "101") you have to build from 0.25 for fraction (bin: 11001) and change exponent (4 now). Please compare -- "101" and "11001". Those bits are more different than just shifting one of them. – greenoldman Sep 16 '12 at 11:10
  • 2
    @macias What do you mean by "The thing is you don't shift"? The way you actually store decimal `0.5` as IEEE754 FP in binary is: `0__0111_1110__000_0000_0000_0000_0000_0000`: The sign is zero, the biased exponent is 126, adding the bias of -127 gives an unbiased exponent of -1, the fractional part is 0. Prepend the leading one, you have: 1.00... (binary) * 2^-1, which is 0.1 in binary, which is 0.5 in decimal. – phant0m Sep 16 '12 at 11:46
  • Misunderstanding on the line -- consider storing number 3 and 5 in the IEEE 754 format. You will end up with fraction parts of 5 and 25, which are in binary format "101" and "11001". The one is not shifted version of the other. You can also consider 4 -- fraction part is 0 (again, this is not shifted version of the other two). – greenoldman Sep 16 '12 at 14:55
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/16742/discussion-between-phant0m-and-macias) – phant0m Sep 16 '12 at 14:56
  • @macias notification for chat – phant0m Sep 16 '12 at 15:02
  • 1
    I just received another upvote for my answer. I hope that that person just voted for this answer as well. I am a bit embarrassed that my answer was accepted over this excellent answer. – David Hammen Jan 09 '14 at 23:18
  • 1
    *"Range within which all integers can be represented (including boundaries) 2^54 as an upper bound -2^54 as a lower bound"* I think you have a typo there. 2^53 + 1 cannot be represented in IEEE-754 double-precision floating point, and is well within the bounds given. I think you meant -2^53 <= n <= 2^53 (where the outer bounds at each end are each the point at which the format can no longer represent odd numbers). – T.J. Crowder Dec 20 '21 at 17:49
  • What about a Python `decimal.Decimal`? – NeilG Jul 13 '23 at 08:29
9

macias, to add to the already excellent answer by phant0m (upvoted; I suggest you accept it), I'll use your own words.

"No it cannot be done, for example number 16777217 is not represented exactly by float number."

Also, "for example number 9223372036854775809 is not represented exactly by double number".

This is assuming your computer is using the IEEE floating point format, which is a pretty strong bet.
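
If you want to check these two counterexamples on your own machine, here is a quick C sketch (added for illustration, assuming IEEE 754 float and double); both constants silently round to a neighbouring representable value:

#include <stdio.h>

int main(void) {
    /* assumes IEEE 754 binary32 float and binary64 double */
    float  f = 16777217.0f;            /* 2^24 + 1 */
    double d = 9223372036854775809.0;  /* 2^63 + 1 */

    printf("f = %.1f\n", f);  /* prints 16777216.0 */
    printf("d = %.1f\n", d);  /* prints 9223372036854775808.0 */
    return 0;
}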

David Hammen
  • 32,454
  • 9
  • 60
  • 108
  • To be clear, I upvoted Phant0m as well, but you hit the nail in the head in direct way, so I am accepting your answer instead. Thank you very much. Writing this number by hand was really educational, the lower limit is the 2^24-1, which is 16777215 and upper (2^24+2) 16777218, between -- a hole. Scary world (or I made a mistake while calculating this). – greenoldman Sep 16 '12 at 15:12
  • 1
    @macias No, thelimit is not 2^24-1, the upper limit is not 2^24+2 either – phant0m Sep 16 '12 at 15:33
  • 1
    Misunderstanding and my mistake again (thanks for chat) -- the smaller existing number is 2^24 (16777216; my mistake here) however the bigger existing number is 2^24+2 (16777218; misunderstanding here). – greenoldman Sep 16 '12 at 16:09
  • In Python: `int(16777217) == float(16777217)` gives `True` but `int(9223372036854775809) == float(9223372036854775809)` gives False. – Martin Thoma Feb 12 '18 at 10:40
7

No.

For example, on my system, the type float can represent values up to approximately 3.40282e+38. As an integer, that would be approximately 340282000000000000000000000000000000000, or about 2^128.

The size of float is 32 bits, so it can exactly represent at most 2^32 distinct numbers.

An integer object generally uses all of its bits to represent values (with 1 bit dedicated as a sign bit for signed types). A floating-point object uses some of its bits to represent an exponent (8 bits for IEEE 32-bit float); this increases its range at the cost of losing precision.

A concrete example (1267650600228229401496703205376.0 is 2^100, and is exactly representable as a float):

#include <stdio.h>
#include <float.h>
#include <math.h>
int main(void) {
    float x = 1267650600228229401496703205376.0;  /* 2^100, exactly representable as a float */
    float y = nextafterf(x, FLT_MAX);              /* the next representable float above x */
    printf("x = %.1f\n", x);
    printf("y = %.1f\n", y);

    return 0;
}

The output on my system is:

x = 1267650600228229401496703205376.0
y = 1267650751343956853325350043648.0

Another way to look at it:

A 32-bit object can represent at most 2^32 distinct values.

A 32-bit signed integer can represent all integer values in the range -2147483648 .. 2147483647 (-2^31 .. +2^31-1).

A 32-bit float can represent many values that a 32-bit signed integer can't, either because they're fractional (0.5) or because they're too big (2.0^100). Since there are values that can be represented by a 32-bit float but not by a 32-bit int, there must be other values that can be represented by a 32-bit int but not by a 32-bit float. Those values are integers that have more significant digits than a float can handle, because the int has 31 value bits but the float has only about 24.
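
A short C illustration of that last point (a sketch added here, not from the original answer): converting INT_MAX to float drops the low-order bits and rounds up to 2^31.

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* assumes float is an IEEE 754 binary32 */
    int   i = INT_MAX;   /* 2147483647: needs 31 value bits */
    float f = (float)i;  /* only about 24 significant bits survive */

    printf("i = %d\n", i);    /* prints 2147483647 */
    printf("f = %.1f\n", f);  /* prints 2147483648.0 */
    return 0;
}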

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • I wrote about range, because I am interested in precision in numbers, not by "escaping" the range. I know about the representation, and I wonder if it can (is it guaranteed) to express all the integers exactly. Within the range. – greenoldman Sep 15 '12 at 21:49
  • Actually, it's about 2^128, not 2^127 ;) – phant0m Sep 16 '12 at 16:49
1

Apparently you are asking whether a Real data type can represent all of the integer values in its range (absolute values up to FLT_MAX or DBL_MAX, in C, or similar constants in other languages).

The largest numbers representable in a floating point format stored in K bits are typically much larger than the 2^K distinct values that K bits can represent, so typically the answer is no. 32-bit C floats exceed 10^37, while 32-bit C integers are less than 10^10. To find the next representable number after some number, use nextafter() or nextafterf(). For example, the code

printf ("%20.4f %20.4f\n", nextafterf(1e5,1e9), nextafterf(1e6,1e9));
printf ("%20.4f %20.4f\n", nextafterf(1e7,1e9), nextafterf(1e8,1e9));

prints out

     100000.0078         1000000.0625
   10000001.0000       100000008.0000

You might be interested in whether an integer J that is between two nearby fractional floating values R and S can be represented exactly, supposing S-R < 1 and R < J < S. Yes, such a J can be represented exactly. Every float value is the ratio of some integer and some power of 2. (Or is the product of some integer and some power of 2.) Let the power of 2 be P, and suppose R = U/P, S = V/P. Now U/P < J < V/P so U < J*P < V. More of J*P's low-order bits are zero than are those of U, V (because V-U < P, due to S-R < 1), so J can be represented exactly.

I haven't filled in all the details to show that J*P-U < P and V-J*P < P, but under the assumption S-R < 1 that's straightforward. Here is an example of R,J,S,P,U,V value computations: Let R=99999.9921875 = 12799999/128, (ie P=128); let S=100000.0078125 = 12800001/128; we have U=0xc34fff and V=0xc35001 and there is a number between them that has more low-order zeroes than either; to wit, J = 0xc35000/128 = 12800000/128 = 100000.0. For the numbers in this example, note that U and V require 24 bits for their exact representations (6 ea. 4-bit hex digits). Note that 24 bits is the number of bits of precision in IEEE 754 single-precision floating point numbers. (See table in wikipedia article.)

That each floating point number is a product or ratio of some integer and some power of 2 (as mentioned two paragraphs above) also is discussed in that floating point article, in a paragraph that begins:

By their nature, all numbers expressed in floating-point format are rational numbers with a terminating expansion in the relevant base (for example, ... a terminating binary expansion in base-2). Irrational numbers, such as π or √2, or non-terminating rational numbers, must be approximated. The number of digits (or bits) of precision also limits the set of rational numbers that can be represented exactly.
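
To reproduce the R, J, S values from the example above (a small sketch, assuming IEEE 754 single precision), nextafterf() gives the two representable neighbours of 100000.0:

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    /* assumes float is an IEEE 754 binary32 */
    float j = 100000.0f;               /* the integer J from the example */
    float r = nextafterf(j, 0.0f);     /* previous representable float */
    float s = nextafterf(j, FLT_MAX);  /* next representable float */

    printf("R = %.7f\n", r);  /* 99999.9921875 */
    printf("J = %.7f\n", j);  /* 100000.0000000 */
    printf("S = %.7f\n", s);  /* 100000.0078125 */
    return 0;
}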

James Waldby - jwpat7
  • 8,593
  • 2
  • 22
  • 37
  • My bad English, I will polish (;-D) the text to make it more clear. – greenoldman Sep 15 '12 at 22:04
  • @macias, I added a paragraph that might be relevant – James Waldby - jwpat7 Sep 15 '12 at 22:23
  • I am lost at the beginning. I don't see that the P (power of 2) is at the time divisor of U and V. Also I don't see that more low-order bits are zero (which ones are "more"), and why it should prove that J can be represented exactly. I will think it over some more ;-). – greenoldman Sep 16 '12 at 07:24