
I am currently learning about floating-point representations. According to this website, the possible ranges for this representation are -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308. The thing is, I don't quite understand what this range means. Since the exponent goes up to 10^308, shouldn't it be accurate to 308 digits? Of course, this doesn't seem right, since in my experience floating-point numbers don't have nearly that much accuracy.

What am I misunderstanding here? Help would be appreciated.

xdxt
  • Does this answer your question? [Concept related to precision of float and double](https://stackoverflow.com/questions/67535592/concept-related-to-precision-of-float-and-double) – phuclv Apr 25 '22 at 03:47
  • duplicates: [Concept related to precision of float and double](https://stackoverflow.com/q/67535592/995714), [Double precision - decimal places](https://stackoverflow.com/q/9999221/995714), [What's the meaning of precision digits in float / double / decimal?](https://stackoverflow.com/q/65945356/995714), [How to calculate float type precision and does it make sense?](https://stackoverflow.com/q/61609276/995714) – phuclv Apr 25 '22 at 03:54
  • IEEE-754 `double` doesn't have infinite precision and it stores only 53 significand bits so obviously it can't be precise to 308 decimal digits – phuclv Apr 25 '22 at 03:55
  • The key idea here is that floating-point representations use a smallish number (the *significand* or "mantissa"), combined with another smallish number (the *exponent*) to encode numbers over a very large *range*, although **without** maximum *precision*. – Steve Summit Apr 25 '22 at 12:23
  • For example, using decimal scientific notation, if I give myself six digits of precision, and ±20 for my exponent, I can talk about a number like 123.456 × 10^20, which is 12345600000000000000000, but I *can't* represent the number 12345670000000000000000 (and I certainly can't represent the number 12345600000000000000001), because those would require more than 6 digits in the significand. – Steve Summit Apr 25 '22 at 12:27
  • One way to help understand why you *don't* have 308 digits of precision is to try it. Just try it with 20 digits: `double d = 12345678901234567890.; printf("%f\n", d);`. It prints `12345678901234567168`. Add 1 to it: `d = d + 1; printf("%f\n", d);`. It still prints `12345678901234567168`. – Steve Summit Apr 25 '22 at 14:42
  • 1
    xdxt "shouldn't it be accurate to 308 digits" --> that would be _fixed_ point. What do you think _floating_ means in _floating point_? – chux - Reinstate Monica Apr 25 '22 at 19:04

1 Answer


Go read about significant figures (https://en.wikipedia.org/wiki/Significant_figures): floating point has a fixed number of significant digits (the precision, given by the mantissa width), but they can be shifted to any magnitude within the exponent range (the exponent is encoded separately).

In floating point, the precision is a fixed number of binary bits (the significand), so huge floats can only represent a multiple of 4, then 8, 16, 32, and so on, ever coarser the larger the number gets.

The Wikipedia article on double-precision floating point (https://en.wikipedia.org/wiki/Double-precision_floating-point_format) is very good. It points out, among other things (see the small demo after this list):

  • Integers from −2^53 to 2^53 (−9,007,199,254,740,992 to 9,007,199,254,740,992) can be exactly represented
  • Integers between 2^53 and 2^54 = 18,014,398,509,481,984 round to a multiple of 2 (even number)
  • Integers between 2^54 and 2^55 = 36,028,797,018,963,968 round to a multiple of 4
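
Here's the small demo mentioned above (a sketch of my own, assuming an IEEE-754 `double` with a 53-bit significand): adding 1 to 2^53 is rounded away, while adding 2 lands on a representable even value.

    #include <stdio.h>

    int main(void) {
        double big = 9007199254740992.0;   /* 2^53: below this, every integer is exact */
        printf("%.0f\n", big + 1.0);       /* prints 9007199254740992: the +1 is lost to rounding */
        printf("%.0f\n", big + 2.0);       /* prints 9007199254740994: multiples of 2 are still representable */
        return 0;
    }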

For large doubles like 2^53, there are no mantissa bits left to encode a fractional part: the exponent field shifts them all up into the integer part. Only smaller numbers like 1.125 can have a fractional part. (And you still can't represent 1.00000000000000000000000000000000001, because the two non-zero parts are too far away from each other.)
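
The same limit is easy to check near 1.0 (again a sketch of my own, assuming IEEE-754 `double`): an addend 35 decimal places below the leading digit simply vanishes, while 1.125 survives because all its set bits fit inside the 53-bit significand.

    #include <stdio.h>

    int main(void) {
        double x = 1.0 + 1e-35;      /* the addend is far below 1 ulp of 1.0 (about 2.2e-16) */
        printf("%d\n", x == 1.0);    /* prints 1: the tiny fractional part was rounded away */
        printf("%.3f\n", 1.125);     /* prints 1.125: 1 + 1/8 is exactly representable */
        return 0;
    }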

If your reasoning worked, shouldn't every floating point number be able to represent an infinite number of digits, if you include the fractional part to the right of the decimal point? Obviously not with a fixed 64-bit value: there are only 2^64 different bit-patterns, so it's a matter of how you spread those values over the range you want to represent.


Floating point chooses a fixed number of digits, and lets the decimal point "float" to different positions based on the exponent. (Actually binary digits, and thus not a decimal point: the correct term would be radix point for base 2 digits. Unless you're using a "decimal floating point" format.)

For example, imagine an infinite string of zeros to the left and right of 4 decimal digits, but you can put a decimal point anywhere within the exponent range limit.

  0000012340000000.0             # large integer
  000001234.00000000             # small integer
  000001.23400000000             # small number near 1
  0.0000123400000000             # quite small number

You can equivalently think of the 1.234 mantissa being shifted left or right by the exponent, relative to the decimal point, to create a fixed-point representation of varying scale, padded with zeros to fill the space.

I'm using decimal for illustration purposes; only a few CPUs have instructions to support a decimal exponent (e.g. some PowerPC). The concept is identical for binary (base 2), with the radix point at some position.
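
If you want to see that binary mantissa/exponent split for a concrete value, `frexp` from `<math.h>` pulls a `double` apart into a normalized significand and a power-of-two exponent. This is just an illustrative sketch of mine, not part of the linked material:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        int exp;
        double mant = frexp(1234.0, &exp);   /* 1234.0 == mant * 2^exp, with 0.5 <= mant < 1 */
        printf("1234.0 = %.17g * 2^%d\n", mant, exp);
        /* prints: 1234.0 = 0.6025390625 * 2^11 */
        return 0;
    }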

I'm also leaving out some things like the implicit 1 at the top of the binary mantissa implied by a non-zero exponent encoding, and the way the exponent is actually encoded with a bias. See the wiki article for full details.
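
You can also inspect those encoded fields directly. Here's a sketch of my own (assuming the standard IEEE-754 binary64 layout, and using `memcpy` for well-defined type punning) that prints the sign, biased exponent, and stored mantissa bits of a `double`:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        double d = 1.5;
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);                  /* reinterpret the 8 bytes as an integer */

        uint64_t sign     = bits >> 63;
        uint64_t biasexp  = (bits >> 52) & 0x7FF;        /* 11-bit exponent field, biased by 1023 */
        uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;   /* 52 explicitly-stored significand bits */

        printf("sign=%llu exponent=%llu (unbiased %lld) mantissa=0x%llx\n",
               (unsigned long long)sign,
               (unsigned long long)biasexp,
               (long long)biasexp - 1023,
               (unsigned long long)mantissa);
        /* For 1.5: sign=0, exponent=1023 (unbiased 0), mantissa=0x8000000000000,
           i.e. value = (1 + 0.5) * 2^0, with the leading 1 left implicit. */
        return 0;
    }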

It's also instructive to play around with https://www.h-schmidt.net/FloatConverter/IEEE754.html for single-precision floating point: it shows you the bit-pattern (with checkboxes to modify bits), the values encoded by the mantissa and exponent fields separately, and the actual value represented overall.


For a more advanced look at some neat floating-point stuff, see Bruce Dawson's series of floating-point articles. Comparing Floating Point Numbers, 2012 Edition has links to all 16 of them, such as There are Only Four Billion Floats–So Test Them All!.

Some of them focus on practicalities of the FP environment in C on x86 and x86-64; another points out that incrementing the integer bit-pattern of a float is one way `nextafter` can be implemented, increasing its magnitude. (The bias in the exponent encoding is what makes FP bit-patterns comparable as sign/magnitude integers, except for the NaN special case.)
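
To illustrate that bit-pattern trick (a sketch of my own, valid only for positive, finite, non-NaN values on an IEEE-754 target), incrementing the integer bit-pattern of a `float` gives the same result as `nextafterf` toward +infinity:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    int main(void) {
        float f = 1.0f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits += 1;                       /* step to the very next representable float */
        float next;
        memcpy(&next, &bits, sizeof next);

        printf("%d\n", next == nextafterf(1.0f, INFINITY));   /* prints 1 */
        printf("%.9g\n", next);                                /* prints 1.00000012, i.e. 1 + 2^-23 */
        return 0;
    }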

Peter Cordes