228

For clarity, if I'm using a language that implements IEEE 754 floats and I declare:

float f0 = 0.f;
float f1 = 1.f;

...and then print them back out, I'll get 0.0000 and 1.0000 - exactly.

But IEEE 754 isn't capable of representing all the numbers along the real line. Close to zero, the 'gaps' are small; as you get further away, the gaps get larger.

So, my question is: for an IEEE 754 float, which is the first (closest to zero) integer which cannot be exactly represented? I'm only really concerned with 32-bit floats for now, although I'll be interested to hear the answer for 64-bit if someone gives it!

I thought this would be as simple as calculating 2^bits_of_mantissa and adding 1, where bits_of_mantissa is how many bits the standard exposes. I did this for 32-bit floats on my machine (MSVC++, Win64), and it seemed fine, though.

Pascal Cuoq
Floomi
  • Why did you add one if you wanted an irrepresentable number? And what number did you use or get? And is this homework? And your question title says "integer" but your question says "float". – msw Sep 25 '10 at 12:46
  • 6
    Because I figured that maxing the mantissa would give me the highest representable number. 2^22. No, it's a curiosity question. I've always felt guilty putting ints in floats, even when I know that the int in question is always going to be very small. I want to know what the upper limit is. As far as I can tell, the title and question are the same, just phrased differently. – Floomi Sep 25 '10 at 12:56
  • possible duplicate of [What's the first double that deviates from its corresponding long by delta?](http://stackoverflow.com/questions/732612/whats-the-first-double-that-deviates-from-its-corresponding-long-by-delta) – Andrew Mao Mar 26 '13 at 17:20
  • 1
    duplicate of http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double ? – FrankH. Jul 30 '13 at 23:11
  • @ks1322 Your edit makes the sentence ungrammatical. “however many” is not equivalent to “how many” and is used correctly in the original sentence (whereas “how many” does not fit). See http://www.english-test.net/forum/ftopic9565.html or many other Google results if you want to see more examples of that sort of phrase. – Pascal Cuoq Dec 14 '14 at 13:07
  • @PascalCuoq Completely disagree, even after reading your link (which doesn't seem to say anything about contexts in which "however" fits but "how" doesn't). – Kyle Strand Dec 24 '16 at 12:00
  • 1
    @KyleStrand reverted^2. I don't know why one seemed more correct to me than the other at the time. Now they both seem awkward compared to “… is the number of bits…” – Pascal Cuoq Dec 26 '16 at 23:40
  • @PascalCuoq Thanks for giving the matter further consideration and making the change! I agree, "the number" would be a superior phrasing. – Kyle Strand Dec 27 '16 at 06:27

2 Answers

288

2^(mantissa bits + 1) + 1

The +1 in the exponent (mantissa bits + 1) is because, if the mantissa contains abcdef... the number it represents is actually 1.abcdef... × 2^e, providing an extra implicit bit of precision.

Therefore, the first integer that cannot be accurately represented and will be rounded is:

  • For 32-bit floats, 16,777,217 (2^24 + 1).
  • For 64-bit floats, 9,007,199,254,740,993 (2^53 + 1).
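The two figures above follow directly from the formula. As a minimal sketch (the helper name `first_unrepresentable` is made up for illustration, and the 32-bit precision of 24 is hard-coded on the assumption of IEEE 754 binary32, since Python has no built-in 32-bit float type):

```python
import sys

# Significand precision in bits, counting the implicit leading 1.
DOUBLE_PRECISION = sys.float_info.mant_dig  # 53 on IEEE 754 binary64 platforms
FLOAT_PRECISION = 24  # assumed: IEEE 754 binary32 (23 stored bits + 1 implicit)


def first_unrepresentable(precision_bits):
    """Smallest positive integer that needs more than precision_bits significant bits."""
    return 2 ** precision_bits + 1


print(first_unrepresentable(FLOAT_PRECISION))   # 16777217
print(first_unrepresentable(DOUBLE_PRECISION))  # 9007199254740993
```

The arithmetic is done on Python integers, which are arbitrary precision, so the results themselves are not subject to float rounding.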

Here's an example in CPython 3.10, which uses 64-bit floats:

>>> 9007199254740993.0
9007199254740992.0
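The same effect can be reproduced for 32-bit floats from Python by round-tripping a value through the standard `struct` module. This is only a sketch: `to_float32` is a hypothetical helper name, and it assumes the platform's C `float` is IEEE 754 binary32 with the default round-to-nearest, ties-to-even mode.

```python
import struct


def to_float32(x):
    """Round x to the nearest binary32 value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


print(to_float32(16777216))  # 16777216.0, exactly 2^24
print(to_float32(16777217))  # 16777216.0, 2^24 + 1 is rounded
print(to_float32(16777218))  # 16777218.0, the next representable integer
```

16777217 lies exactly halfway between its two representable neighbours, so under ties-to-even it rounds down to 16777216.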
Kodiologist
kennytm
  • 3
    I declared a `float` and set it equal to 16,777,217. But when I printed it using `cout` it resulted in 16,777,216. I'm using `C++`. Why can't I get 16,777,217? – sodiumnitrate Oct 14 '14 at 18:56
  • 36
    @sodiumnitrate Check the question title. 16777217 is the first integer **incapable** of being represented exactly. – kennytm Oct 15 '14 at 08:05
  • 1
    Ok, thanks. I got confused, sorry about that. I have another question though: after 16777216, shouldn't the next integer that is representable be 2*16777216? When I run a similar program, I get 16777218 by adding 2 to 16777216. – sodiumnitrate Oct 15 '14 at 15:54
  • 6
    The next integer is indeed 16777218, because 2 now becomes the last significant binary digit. – kennytm Oct 16 '14 at 07:53
  • How would you go about this if the number were to be even? – ylun.ca Nov 08 '15 at 23:36
  • Sorry, and also, how would we find, say , the second smallest integer? – ylun.ca Nov 10 '15 at 02:08
  • 8
    In C++, that's `(1 << std::numeric_limits<float>::digits) + 1`, and in C, `(1 << FLT_MANT_DIG) + 1`. The former is nice because it can be part of a template. Don't add the +1 if you just want the largest representable integer. – Henry Schreiner Sep 21 '17 at 19:00
  • @HenrySchreiner you should submit that as an answer, it is a good and concise answer for C++ and C. – dashesy Nov 15 '17 at 22:58
  • Related: [`digits10`](http://en.cppreference.com/w/cpp/types/numeric_limits/digits10) – Martin Ba Nov 21 '17 at 12:17
  • What about largest negative numbers for `float` and `double`? – AlanSTACK Aug 02 '18 at 05:16
  • @AlanSTACK The negative number with largest magnitude is just the negation of the positive one. – kennytm Aug 02 '18 at 05:49
  • So `-16,777,217` and `-9,007,199,254,740,993`? Or are they 1 off or something. – AlanSTACK Aug 04 '18 at 18:22
  • 2
    You can use this to examine floating point bit representations and find the min/max integer values: https://www.h-schmidt.net/FloatConverter/IEEE754.html here is another one for 16, 32, 64 and 128 bit floating point: http://weitz.de/ieee/ – Zack Morris Aug 08 '19 at 23:20
53

The largest value representable by an n bit integer is 2^n - 1. As noted above, a float has 24 bits of precision in the significand which would seem to imply that 2^24 wouldn't fit.

However.

Powers of 2 within the range of the exponent are exactly representable as 1.0 × 2^n, so 2^24 can fit and consequently the first unrepresentable integer for float is 2^24 + 1. As noted above. Again.
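A quick way to check the powers-of-two claim is sketched below. `fits_float32` is a made-up helper, and the check assumes the C `float` behind Python's `struct` format `f` is IEEE 754 binary32.

```python
import struct


def fits_float32(n):
    """True if the integer n survives a round trip through binary32 unchanged."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0] == n


print(fits_float32(2**24))      # True: a power of two, needs one significand bit
print(fits_float32(2**24 + 1))  # False: needs 25 significant bits
print(fits_float32(2**24 + 2))  # True: the spacing here is 2
print(fits_float32(2**25 + 2))  # False: the spacing has doubled to 4
print(fits_float32(2**100))     # True: still within the exponent range
```

So representability depends on how many significant bits an integer needs, not on its magnitude alone.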

thus spake a.k.