
Will `printf("%.9e", value)` always print the exact base-10 representation of `value` if `value` is an IEEE single-precision floating-point number (C/C++ `float`)?

Will the same hold for `printf("%.17e", value)` if `value` is an IEEE double-precision floating-point number (C/C++ `double`)?

If not, how can I print it exactly?

It appears that `printf("%.17f", value)` and `printf("%.17g", value)` will not.

chux - Reinstate Monica
Patrick
  • Is the question about *exact* or about base10? Please show an example. – ouah Sep 17 '15 at 22:00
  • I think every base2 number can be represented exactly as a base10 number, so I think both. I know that not every base10 number can be represented exactly as a base2 number, but I'm not concerned about that. I'm assuming that the number already exists as a base2 number in a float or double. I'm not actually sure how to show an example. – Patrick Sep 17 '15 at 22:07
  • An ieee754 single-precision float has 23 bits of precision, and 10 only has a single power of two factor, so I expect it's possible to find a single-precision float that takes 23 significant decimal digits to represent exactly. – EOF Sep 17 '15 at 22:09
  • Required reading: [What every computer scientist should know about floating point.](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – Thomas Matthews Sep 17 '15 at 22:10
  • @ThomasMatthews: How does that relate to the question? – Oliver Charlesworth Sep 17 '15 at 22:10
  • @OliverCharlesworth: It explains how the *exact base 10 representation* of any value is interpreted, emphasis on **exact**. – Thomas Matthews Sep 17 '15 at 22:12
  • @ThomasMatthews: I guess I was implying that a link to a specific part of that rather large document might be useful ;) – Oliver Charlesworth Sep 17 '15 at 22:14
  • @ThomasMatthews: I haven't read through the paper, but that is the reference I used to get the 9 and 17 numbers. – Patrick Sep 17 '15 at 22:17
  • Take the number 1.0/3.0 for example. Is printing 9 decimals more accurate than printing 17? – Thomas Matthews Sep 17 '15 at 22:20
  • @EOF: It's worse than that. Imagine 1 - 2^-23. That's 23 binary orders of magnitude to account for representing both the 1 and the 2^-23, and then on top of that, 2^-23 itself has 23 decimal significant figures! – Oliver Charlesworth Sep 17 '15 at 22:21
  • @EOF: One such value is `0x3F800001`, which is very similar to Oliver's suggestion; it is 1 + 2**-23 – Ben Voigt Sep 17 '15 at 22:23
  • @OliverCharlesworth Ah, but you see how cleverly I formulated my comment? It is not at all invalidated by your observation. – EOF Sep 17 '15 at 22:23
  • @OliverCharlesworth: Couldn't you represent 1 - 2**-24, actually? (implied 2**-1, and then the 23 mantissa bits are 2**-2 ... 2**-24) – Ben Voigt Sep 17 '15 at 22:25
  • @BenVoigt: Yes, I think you're right. – Oliver Charlesworth Sep 17 '15 at 22:26
  • `0x3F7FFFFF` is the number we're discussing, which is `9.99999940395355224609375E-1` Lots more than 9 decimal digits needed. And that's only a single precision float. – Ben Voigt Sep 17 '15 at 22:30
  • @BenVoigt: Not as bad as I predicted, then! – Oliver Charlesworth Sep 17 '15 at 22:31
  • @Oliver: Yeah, 2**-24 itself has 24 decimal figures, but the first bunch of them are zeroes. It doesn't stack on top of the magnitude difference. – Ben Voigt Sep 17 '15 at 22:34
  • Maybe I'm doing something wrong. For 0x3F7FFFFF I get 1.06535321500000000e+09. – Patrick Sep 17 '15 at 22:38
  • Using [this](http://www.h-schmidt.net/FloatConverter/IEEE754.html) I get the value you cite though. – Patrick Sep 17 '15 at 22:43
  • But it only shows 16 decimal digits: 0.9999999403953552 Are we sure it can be stored entirely in an IEEE double precision number? – Patrick Sep 17 '15 at 22:46
  • For me, the minimum float needs `%.88e` to be represented correctly. Try it yourself: `printf("%.90e", FLT_MIN)`. You need to `#include ` – Logman Sep 17 '15 at 23:16
  • And to correctly show min of double I need to set precision to 715. – Logman Sep 17 '15 at 23:31
  • Each base 10 digit can represent a single base 2 digit. So if the number can be represented by 23 bits, it can be represented by 23 decimal digits. Adding an exponent, as the float spec does, means that many more digits will be necessary over the full range. – Mark Ransom Sep 18 '15 at 02:02

2 Answers


Will `printf("%.9e", value)` always print the exact base-10 representation?

No. Consider 0.5, 0.25, 0.125, 0.0625 .... Each value is one-half the preceding and needs another decimal place for each decremented power of 2.

`float`, often binary32, can represent normal values as small as pow(2,-126), and sub-normals are even smaller. It takes 126+ decimal places to represent those exactly. Even counting only significant digits, the number is 89+. For example, FLT_MIN on one machine is exactly

0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625

FLT_TRUE_MIN, the smallest non-zero sub-normal, is 151 digits:

0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125

By comparison, FLT_MAX only takes 39 digits.

340282346638528859811704183484516925440

An exact decimal representation of a `float` is rarely needed. Printing to FLT_DECIMAL_DIG (typically 9) significant digits is sufficient to display each value uniquely. Many systems do not print an exact decimal representation beyond a few dozen significant digits.

The vast majority of systems I have used print `float`/`double` exactly to at least DBL_DIG significant digits (typically 15+). Most systems do so to at least DBL_DECIMAL_DIG (typically 17+) significant digits.

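The question [Printf width specifier to maintain precision of floating-point value](http://stackoverflow.com/questions/16839658/printf-width-specifier-to-maintain-precision-of-floating-point-value) gets into these issues.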

`printf("%.*e", FLT_DECIMAL_DIG - 1, value)` will print a `float` to enough decimal places to scan it back and get the same value (round-trip).

Community
chux - Reinstate Monica
  • I think I see. So for example, FLT_MIN is exactly `0.000000000000000000000000000000000000011754943508222875079687365372222456778186655567720875215087517062784172594547271728515625` in base10. But there is no other single precision floating-point number that will represent `0.0000000000000000000000000000000000000117549435`, the first 9 significant digits. So if I only print those, I can still get back the original exact bits of the FLT_MIN. – Patrick Sep 18 '15 at 04:00
  • @Patrick Yes. Note: the preceding and subsequent `float`s are `0.000...000117549421069...` and `0.000...000117549449095...` – chux - Reinstate Monica Sep 18 '15 at 04:08
  • @Patrick Note: `printf("%.9e", value)` prints `value` to 10 significant decimal digits. – chux - Reinstate Monica Sep 18 '15 at 04:58
  • Why is that? What does significant decimal digits mean exactly? Are you saying I would only need `printf("%.8e", value)`? – Patrick Sep 18 '15 at 05:15
  • Because there will always be a single digit printed before the decimal point with `"%e"`, which is significant? – Patrick Sep 18 '15 at 05:17
  • I have two references, but I'm not sure how to put them together. FLT_DIG, DBL_DIG, LDBL_DIG: Number of decimal digits that can be rounded into a floating-point and back without change in the number of decimal digits. printf precision specifier: For a, A, e, E, f and F specifiers: this is the number of digits to be printed after the decimal point (by default, this is 6). – Patrick Sep 18 '15 at 05:30
  • I can't find FLT_DECIMAL_DIG. – Patrick Sep 18 '15 at 05:41
  • @Patrick Significant decimal digits examples 1.23 has 3 significant decimal digits, 0.000123 has 3, 123,000,000.0 has 3. – chux - Reinstate Monica Sep 18 '15 at 13:43
  • @Patrick `FLT_DECIMAL_DIG` introduced in C11. Suggest posting question on how to use `FLT_DIG`. – chux - Reinstate Monica Sep 18 '15 at 13:45
  • From your answer to this [question](http://stackoverflow.com/questions/16839658/printf-width-specifier-to-maintain-precision-of-floating-point-value) it appears that `FLT_DIG` and `DBL_DIG` are for going from string to number instead of from number to string? Would I be safe just using 9 and 17 or might it vary by platform? – Patrick Sep 21 '15 at 20:28
  • @Patrick Both Q&A are from floating-point to string. Using 9,17 is not safe in general, but OK if "IEEE 754 binary". "IEEE" and "IEEE 754" are not specific enough. Else use `FLT_DECIMAL_DIG, DBL_DECIMAL_DIG`, not `FLT_DIG, DBL_DIG`. IAC, if your goal is now not "floating-point number to **exact** base10 character string", what is it? – chux - Reinstate Monica Sep 21 '15 at 20:38
  • I didn't realize until now that the exact base10 character string and the base10 character string needed to retain all of the information in the original floating-point number were two different things. I just need the base10 character string needed to retain all of the information in the original floating-point number. – Patrick Sep 21 '15 at 20:47
  • @Patrick The [other post](http://stackoverflow.com/questions/16839658/printf-width-specifier-to-maintain-precision-of-floating-point-value) explains what is needed. Interestingly, more than `FLT_DECIMAL_DIG, DBL_DECIMAL_DIG` digits can cause an issue in corner cases. To be clear, going from base A to base B and back to base A has different requirements than going from base B to base A and back to base B. That is the difference between `DBL_DIG` and `DBL_DECIMAL_DIG`. If possible use [hex](http://stackoverflow.com/a/16840224/2410359) – chux - Reinstate Monica Sep 21 '15 at 20:54
  • I would prefer decimal, but `g++ -std=c++11` doesn't seem to have `FLT_DECIMAL_DIG`, at least for version 4.8.4. Do you know if it was added in a more recent version? Otherwise `printf("%a", value)` should work for **both** single and double precision? – Patrick Sep 21 '15 at 21:29
  • Does [DECIMAL_DIG](http://www.cplusplus.com/reference/cfloat/), i.e. `printf("%.*e", DECIMAL_DIG - 1, value)`, work for both single and double precision? – Patrick Sep 21 '15 at 21:57
  • @Patrick Keep forgetting this is a C++ question too - unsure about C++. If `FLT_DECIMAL_DIG` does not exist, use `#define FLT_DECIMAL_DIG (FLT_DIG + 3)`. On some platforms the +3 should be +2, +1 or rarely +0. It is `ceil(1 + 24*log10(2))` or 9 for a 24-bit significand like [binary32](https://en.wikipedia.org/wiki/Single-precision_floating-point_format). Or yes, save grief: use `"%a"` for both `float/double`. – chux - Reinstate Monica Sep 21 '15 at 21:58
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/90268/discussion-between-chux-and-patrick). – chux - Reinstate Monica Sep 21 '15 at 21:59
  • In case it helps someone else, I had to use `__FLT_DECIMAL_DIG__` instead of `FLT_DECIMAL_DIG` for g++. – Patrick Sep 22 '15 at 02:07

The IEEE-754 format for a 32-bit floating point number is explained in this Wikipedia article.

The following table shows the weight of each fraction bit when the exponent is 0, meaning 1.0 <= N < 2.0. The last number in the table is the largest number less than 2.0.

From the table, you can see that you need to print at least 23 digits after the decimal point to get the exact decimal number from a 32-bit floating point number.

3f800000 1.0000000000000000000000000   (1)
3fc00000 1.5000000000000000000000000   (1 + 2^-1)
3fa00000 1.2500000000000000000000000   (1 + 2^-2)
3f900000 1.1250000000000000000000000   (1 + 2^-3)
3f880000 1.0625000000000000000000000   (1 + 2^-4)
3f840000 1.0312500000000000000000000   (1 + 2^-5)
3f820000 1.0156250000000000000000000   (1 + 2^-6)
3f810000 1.0078125000000000000000000   (1 + 2^-7)
3f808000 1.0039062500000000000000000   (1 + 2^-8)
3f804000 1.0019531250000000000000000   (1 + 2^-9)
3f802000 1.0009765625000000000000000   (1 + 2^-10)
3f801000 1.0004882812500000000000000   (1 + 2^-11)
3f800800 1.0002441406250000000000000   (1 + 2^-12)
3f800400 1.0001220703125000000000000   (1 + 2^-13)
3f800200 1.0000610351562500000000000   (1 + 2^-14)
3f800100 1.0000305175781250000000000   (1 + 2^-15)
3f800080 1.0000152587890625000000000   (1 + 2^-16)
3f800040 1.0000076293945312500000000   (1 + 2^-17)
3f800020 1.0000038146972656250000000   (1 + 2^-18)
3f800010 1.0000019073486328125000000   (1 + 2^-19)
3f800008 1.0000009536743164062500000   (1 + 2^-20)
3f800004 1.0000004768371582031250000   (1 + 2^-21)
3f800002 1.0000002384185791015625000   (1 + 2^-22)
3f800001 1.0000001192092895507812500   (1 + 2^-23)

3fffffff 1.9999998807907104492187500

One thing to note about this is that there are only 2^23 (about 8 million) floating point values between 1 and 2. However, there are 10^23 numbers with 23 digits after the decimal point, so very few decimal numbers have exact floating point representations.

As a simple example, the number 1.1 does not have an exact representation. The two 32-bit float values closest to 1.1 are

3f8ccccc 1.0999999046325683593750000
3f8ccccd 1.1000000238418579101562500
user3386109
  • I'm not worried about going from base10 to base2. I notice that in the list, except for the last one, there are no more than 17 nonzero digits past the decimal point? – Patrick Sep 17 '15 at 23:33
  • I think where the confusion is, for example in the single point case, given a floating point number and its unsigned representation (closest 9 decimals), changing the unsigned representation by 1 only changes the float in the `6th` decimal place. (e.g. `123.456` (`123.456001`) closest unsigned is `1123477881`. Changing by one, `1123477882` yields a change to the float of `123.456008911`) – David C. Rankin Sep 17 '15 at 23:43
  • @Patrick *"...no more than 17 nonzero digits past the decimal point?"* True, but keep in mind that each of those numbers has exactly 1 bit set in the fraction. The number at the left of the table is the hex representation of the number. The `3f8` represents an exponent of 0, the rest of the bits are the fraction. You can have any combination of bits in the fraction. For example, 0x3f820001 is going to have 22 non-zero digits past the decimal point, which you can see by adding the two lines that start with 0x3f820000 and 0x3f800001. The result is 1.01562511920928955078125. – user3386109 Sep 18 '15 at 01:03
  • I guess I am confused by the statement: "If an IEEE 754 single precision is converted to a decimal string with at least 9 significant decimal digits and then converted back to single, then the final number must match the original." – Patrick Sep 18 '15 at 01:29
  • @Patrick When you convert to a decimal string, you round to the nearest decimal value that has 9 significant digits. When converting back to binary, you round to the nearest binary number that has 23 significant bits. The guarantee in the statement you quoted is that the rounding errors will not result in a different value than you started with. – user3386109 Sep 18 '15 at 01:46