
I understand that floating-point numbers are represented in memory in sign, exponent and mantissa form, with a limited number of bits for each part, and that this leads to rounding errors. Essentially, if I have a floating-point number, the limited number of bits means it gets mapped to one of the nearest representable values using the rounding strategy.

Does this mean that two different floating-point numbers can get mapped to the same memory representation? If yes, how can I avoid it programmatically?

I came across std::numeric_limits<T>::max_digits10.

It says the minimum number of digits needed in a floating point number to survive a round trip from float to text to float.

Where does this round trip happen in a C++ program I write? As far as I understand, I have a float f1 which is stored in memory (probably with rounding error) and is read back. I can directly have another float variable f2 in my C++ program and compare it with the original f1. Now my question is: when will I need std::numeric_limits::max_digits10 in this use case? Is there any use case showing that I need std::numeric_limits::max_digits10 to ensure that I don't do things wrong?

Can anyone explain the above scenarios?

Edited by Nicol Bolas; asked by Test
    Please don't indent paragraphs like you did. It makes them code blocks, which puts the whole paragraph on one line and makes people scroll to read it. – Nate Eldredge Dec 12 '21 at 07:01
    "*It says the minimum number of digits needed in a floating point number to survive a round trip from float to text to float.*" Um, what says this? – Nicol Bolas Dec 12 '21 at 07:13
    You need it when you convert your floats to text and back to floats again. I'm not sure exactly what you're not understanding? – Alan Birtles Dec 12 '21 at 07:13
    *"Where does this round trip happen in a C++ program I write?"* -- that depends on the program you write. It doesn't happen in the same place in all programs, and there are programs where it does not happen at all (just like pretty much anything else, with a few exceptions). *Not sure what you meant to ask, but maybe something more like **how** can you make this round trip happen?* – JaMiT Dec 12 '21 at 07:22
    *"when will I need std::numeric_limits::max_digits10 in this use case?"* -- you've constructed a situation with the expectation that if `max_digits10` is ever useful, it must be useful in the situation you came up with. This is not a good way to find a use for something. Maybe you'll luck into a valid use case, but more likely not. You're kind of limiting what kind of responses you can get by throwing these hasty assumptions into your question. – JaMiT Dec 12 '21 at 07:27

4 Answers


Forget about the exact representation for a minute, and pretend you have a two-bit float. Bit 0 is 1/2, and bit 1 is 1/4. Let's say you want to transform this number into a string such that, when the string is parsed, it yields the original number.

Your possible numbers are 0, 1/4, 1/2, 3/4. Clearly you can represent all of them with two digits past the decimal point and get the same number back, since the representation is exact in this case. But can you get away with a single digit?

Assuming half always rounds up, the numbers map to 0, 0.3, 0.5, 0.8. The first and third numbers are exact while the second and fourth are not. So what happens when you try to parse them back?

0.3 - 0.25 < 0.5 - 0.3, and 0.8 - 0.75 < 1 - 0.8. So clearly in both cases the rounding works out. That means you only need one digit past the decimal point to capture the value of our contrived two-bit floats.

You can expand the number of bits from two to 53 (for a double), and add an exponent to alter the scale of the number, but the concept is exactly the same.
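To make the toy argument concrete, here is a minimal sketch that models the hypothetical two-bit format (values 0, 1/4, 1/2 and 3/4, no sign and no exponent), "prints" each value with one digit after the decimal point with halves rounding up, and "parses" by picking the nearest representable value. The helper names are made up for illustration, and the intermediate "text" is modeled as a `double` rather than an actual string.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical two-bit format from the answer: bit 0 is worth 1/2 and bit 1
// is worth 1/4 (no sign, no exponent), so only these four values exist.
const std::vector<double> kToyValues = {0.0, 0.25, 0.5, 0.75};

// Hypothetical helper: "print" v with one digit after the decimal point,
// halves rounding up. The resulting "text" is modeled as a double.
double to_one_digit(double v) {
    return std::floor(v * 10.0 + 0.5) / 10.0;
}

// Hypothetical helper: "parse" by choosing the toy value nearest to the text.
double parse_nearest(double text) {
    double best = kToyValues.front();
    for (double candidate : kToyValues) {
        if (std::fabs(candidate - text) < std::fabs(best - text)) {
            best = candidate;
        }
    }
    return best;
}

// Hypothetical helper: true if every toy value survives
// value -> one-digit text -> value.
bool toy_round_trip_ok() {
    for (double v : kToyValues) {
        if (parse_nearest(to_one_digit(v)) != v) {
            return false;
        }
    }
    return true;
}
```

With this model `toy_round_trip_ok()` returns true: 1/4 goes to 0.3 and back to 1/4, and 3/4 goes to 0.8 and back to 3/4, matching the reasoning above.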

Mad Physicist

Why do we need std::numeric_limits::max_digits10?

To know how many significant decimal digits are needed to convert a floating-point type to text distinctly for all possible values of that type.


Does this mean that two different floating-point numbers can get mapped to the same memory representation? If yes, how can I avoid it programmatically?

No, floating-point objects that differ in value have different encodings.

Yes, floating-point constants in code that differ in text may map to the same memory representation. x1 and x2 below certainly have the same encoding. A 32-bit float can only encode about 2^32 different values, so many different floating-point constants map to the same float.

float x1 = 1.000000000000000001f;
float x2 = 1.000000000000000001000000000000000001f;
assert(x1 == x2);

Where does this round trip happen in a C++ program I write? Now my question is: when will I need std::numeric_limits::max_digits10 in this use case? Is there any use case showing that I need std::numeric_limits::max_digits10 to ensure that I don't do things wrong?

If code converts a floating point x to string s and then back to floating point y, then that is the round trip of concern.

For x == y to hold for all x, s should contain at least max_digits10 significant decimal digits.

With fewer than max_digits10 significant decimal digits, x == y may still be true for some x, but not all.

With more than max_digits10 significant decimal digits, x == y is true for all x, yet s grows unnecessarily long.
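A minimal sketch of that round trip, assuming IEEE-754 single-precision `float` (so `max_digits10` is 9) and correctly rounded `snprintf`/`strtof`, as glibc provides. The helpers `round_trips` and `count_failures` are hypothetical names; `%.*g` is used because its precision counts significant digits. Walking over a range of consecutive floats only samples the behavior; it does not prove the guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Hypothetical helper: write x as text with `digits` significant decimal
// digits ("%.*g"), parse the text back with strtof, and report whether the
// original value was recovered.
bool round_trips(float x, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, static_cast<double>(x));
    const float y = std::strtof(buf, nullptr);
    return x == y;
}

// Hypothetical helper: count how many of `n` consecutive floats, starting at
// `start`, fail the round trip at the given number of significant digits.
int count_failures(float start, int n, int digits) {
    int failures = 0;
    float x = start;
    for (int i = 0; i < n; ++i) {
        if (!round_trips(x, digits)) {
            ++failures;
        }
        x = std::nextafter(x, std::numeric_limits<float>::infinity());
    }
    return failures;
}
```

On such a platform, `count_failures(10.0f, 1000, 9)` should report no failures, while `count_failures(10.0f, 1000, 8)` reports some: near 10, adjacent floats are about 9.5e-7 apart, but 8 significant digits only resolve steps of 1e-6, so some neighbors must share the same text.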


Significant decimal digits

The significant digit count is not the number of digits to the right of the ., but the count starting from the most significant non-zero digit. All of the following, as code or text, have 9 significant decimal digits.

1.23456789
12345.6789
123456789.
123456789f
1.23456789e10
1.23456789e-10
-1.23456789
12345.0000
00012345.6789
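To illustrate why the count starts at the first non-zero digit, this sketch (again assuming IEEE-754 `float` and a correctly rounded C library; the function names are illustrative) formats the same value with `%.*g`, whose precision is the number of significant digits, and with `%.*f`, whose precision is the number of digits after the decimal point.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>

// Hypothetical helper: "%.*g" treats the precision as a count of
// significant digits, which is what max_digits10 is about.
bool round_trips_g(float x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g",
                  std::numeric_limits<float>::max_digits10,
                  static_cast<double>(x));
    return std::strtof(buf, nullptr) == x;
}

// Hypothetical helper: "%.*f" treats the same precision as a count of
// digits after the decimal point, so small values lose their digits.
bool round_trips_f(float x) {
    char buf[128];  // room for up to 39 integer digits plus the fraction
    std::snprintf(buf, sizeof buf, "%.*f",
                  std::numeric_limits<float>::max_digits10,
                  static_cast<double>(x));
    return std::strtof(buf, nullptr) == x;
}
```

For `1.23456789e-10f`, `%.9g` keeps nine significant digits and the value survives, whereas `%.9f` prints `0.000000000` and reads back as zero.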
chux - Reinstate Monica
    One thing I am concerned about in the C++ phrasing is that maybe there is an x and a y that map to different decimal strings, but converting them back yields x for both. For example, suppose consecutive floating-point values are 9.821…, 9.942…, and 10.063… Rounded to two decimal digits, the latter two go to 9.9 and 10. Then converting back yields 9.94 (closest of the three to 9.9) and 9.94 (closest of the three to 10). If so, then C++’s “always differentiated” differs from C’s and IEEE-754’s “round trip works.” … – Eric Postpischil Dec 13 '21 at 17:49
    … With the above example, if 10.184 follows, it breaks the example, because 10.063 and 10.184 would not map to differentiated two-digit numbers (both go to 10), so 2 could not be the `max_digits10` for this format. But maybe this happens just at the edge of the exponent range, so 10.184 is not in the set of representable numbers. So it needs further study. If it can occur, I think it is a defect in the C++ standard; it was likely not the intent merely to produce differentiation but to fully ensure round trips with round-to-nearest work. – Eric Postpischil Dec 13 '21 at 17:50
    @EricPostpischil Are you suggesting `max_digits10 == 2` for these comment examples? (in which case the FP type has, at most, 3 binary digits in the significand.) – chux - Reinstate Monica Dec 13 '21 at 17:58
    Yes, but that was just for illustration. In actual formats, could there be some x and y that convert to different 16-decimal-digit numbers that both convert back to x? (That would not actually make `max_digits10` 16, because there are many other numbers in the `double` format that require 17 digits to be differentiated. So it would not cause C++’s definition for `max_digits10` to have the wrong value. The question would be whether there is any floating-point format and an n such that all its values convert to different n-decimal-digit numbers but some do not convert back to their originals?) – Eric Postpischil Dec 13 '21 at 18:07
    @EricPostpischil I have found no example of "x and y that convert to different 16-decimal-digit numbers that both convert back to x", so at this point assume it is true. Will ponder even smaller `n`. – chux - Reinstate Monica Dec 13 '21 at 23:06
    @EricPostpischil Still came up with no counter-example - assuming I follow the issue right. It is interesting to try to find the minimum decimal digits needed for roundtripping. IIRC, MS VS C++ had a format specifier to do that, but got the function wrong for some values. – chux - Reinstate Monica Dec 16 '21 at 12:57

You seem to be confusing two sources of rounding (and precision loss) with floating point numbers.

Floating point representation

The first one is due to the way floating-point numbers are represented in memory, which uses binary numbers for the mantissa and exponent, as you pointed out. The classic example is:

const float a = 0.1f;
const float b = 0.2f;
const float c = a+b;

printf("%.8f + %.8f = %.8f\n",a,b,c);

which will print

0.10000000 + 0.20000000 = 0.30000001

Here the intended result is 0.3, but 0.3 (like 0.1 and 0.2) is not exactly representable in binary. Instead you get the closest number that can be represented.

Saving to text

The other one, which is where max_digits10 comes into play, concerns the text representation of floating-point numbers, for example when you use printf or write to a file.

When you do this using the %f format specifier you get the number printed out in decimal.

When you print the number in decimal you may decide how many digits get printed out. In some cases you might not get an exact printout of the actual number.

For example, consider

const float x = 10.0000095f;
const float y = 10.0000105f;
printf("x = %f ; y = %f\n", x,y);

this will print

x = 10.000010 ; y = 10.000010

On the other hand, increasing the precision of printf to 8 digits after the decimal point with %.8f gives you

x = 10.00000954 ; y = 10.00001049

So if you wanted to save these two float values as text to a file using fprintf or ofstream with the default number of digits, you may have saved the same value twice where you originally had two different values for x and y.

max_digits10 is the answer to the question "how many decimal digits do I need to write in order to avoid this situation for all possible values?". In other words, if you write your float with max_digits10 significant digits (which happens to be 9 for IEEE-754 single-precision floats) and read it back, you're guaranteed to get the same value you started with.

Note that the decimal value written may differ from the floating-point number's actual value (because of the different representation). But it is still guaranteed that when you read the text of the decimal number back into a float you will get the same value.
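The value 9 can be derived: for a radix-2 format with p significand bits, C specifies `FLT_DECIMAL_DIG`/`DBL_DECIMAL_DIG` as ceil(1 + p * log10(2)), and `max_digits10` corresponds to those macros. A small sketch, assuming a binary format and using `std::numeric_limits<T>::digits` as p (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: ceil(1 + p * log10(2)), the number of significant
// decimal digits needed so that every value of a radix-2 format with p
// significand bits survives binary -> decimal text -> binary.
int required_decimal_digits(int p) {
    return static_cast<int>(std::ceil(1.0 + p * std::log10(2.0)));
}

// Hypothetical helper: compare the formula with what the implementation
// reports for T.
template <typename T>
bool matches_max_digits10() {
    static_assert(std::numeric_limits<T>::radix == 2,
                  "the formula above assumes a binary format");
    return required_decimal_digits(std::numeric_limits<T>::digits) ==
           std::numeric_limits<T>::max_digits10;
}
```

For `float` (p = 24) this gives ceil(8.22...) = 9, and for `double` (p = 53) it gives ceil(16.95...) = 17.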

Edit: an example

See the code run there: https://ideone.com/pRTMZM

Say you have your two floats from earlier,

const float x = 10.0000095f;
const float y = 10.0000105f;

and you want to save them to text (a typical use-case would be saving to a human-readable format like XML or JSON, or even using prints to debug). In my example I'll just write to a string using stringstream.

Let's try first with the default precision:

stringstream def_prec;
def_prec << x <<" "<<y;

// What was written ?
cout <<def_prec.str()<<endl;

The default behaviour in this case was to round each of our numbers to 10 when writing the text. So now if we use that string to read back to two other floats, they will not contain the original values :

float x2, y2;
def_prec>>x2 >>y2;

// Check
printf("%.8f vs %.8f\n", x, x2);
printf("%.8f vs %.8f\n", y, y2);

and this will print

10 10
10.00000954 vs 10.00000000
10.00001049 vs 10.00000000

This round trip from float to text and back has erased a lot of digits, which might be significant. Obviously we need to save our values to text with more precision than this. The documentation guarantees that using max_digits10 will not lose data in the round trip. Let's give it a try using setprecision:

const int digits_max = numeric_limits<float>::max_digits10;
stringstream max_prec;
max_prec << setprecision(digits_max) << x <<" "<<y;
cout <<max_prec.str()<<endl;

This will now print

10.0000095 10.0000105

So our values were saved with more digits this time. Let's try reading back :

float x2, y2;
max_prec>>x2 >>y2;
    
printf("%.8f vs %.8f\n", x, x2);
printf("%.8f vs %.8f\n", y, y2);

Which prints

10.00000954 vs 10.00000954
10.00001049 vs 10.00001049

Aha! We got our values back!

Finally, let's see what happens if we use one digit less than max_digits10.

stringstream some_prec;
some_prec << setprecision(digits_max-1) << x <<" "<<y;
cout <<some_prec.str()<<endl;

Here is what gets saved as text:

10.00001 10.00001

And we read back:

10.00000954 vs 10.00000954
10.00001049 vs 10.00000954

So here, the precision was enough to keep the value of x but not the value of y which was rounded down. This means we need to use max_digits10 if we want to make sure different floats can make the round trip to text and stay different.
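The experiment above can also be automated with the same `stringstream`/`setprecision` approach. This hypothetical helper walks upward one float at a time from 10.0f and returns the first value that does not survive the round trip. It assumes IEEE-754 `float` and a standard library whose stream conversions are correctly rounded (e.g. libstdc++ on glibc).

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>

// Hypothetical helper: write x with `digits` significant digits using the
// default floating-point format, then read it back into a float.
bool stream_round_trips(float x, int digits) {
    std::stringstream ss;
    ss << std::setprecision(digits) << x;
    float y = 0.0f;
    ss >> y;
    return x == y;
}

// Hypothetical helper: walk upward from `start` one float at a time and
// return the first value that does not survive the round trip, or NaN if
// none is found within `max_steps` values.
float first_failure(float start, int digits, int max_steps) {
    float x = start;
    for (int i = 0; i < max_steps; ++i) {
        if (!stream_round_trips(x, digits)) {
            return x;
        }
        x = std::nextafter(x, std::numeric_limits<float>::infinity());
    }
    return std::numeric_limits<float>::quiet_NaN();
}
```

With `max_digits10 - 1` (8) digits, `first_failure(10.0f, 8, 1000)` should return `10.0000105f`, the `y` from the example above, since every float from 10.0f up to 10.0000095f survives. With 9 digits no failure is found in the first 1000 steps, so NaN is returned.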

Louen
    Hm, just a comment. Why are you using printf to answer a C++ question? Maybe iostream functions would be more appropriate. But anyway, you answered the essence of the question very well. Thank you +1 – A M Dec 12 '21 at 09:46
    @Louen - thanks for great explanation. Could you please just elaborate the last paragraph of yours so that I could see it via a test program? – Test Dec 12 '21 at 11:24
    Note that .1 and .2 also cannot be represented perfectly. – Yakk - Adam Nevraumont Dec 12 '21 at 16:30
    @Test here's your example ! @ArminMontigny : in this case `printf` made more sense to me for conciseness by explicitly setting the precision in the format string. In my edited answer I also use `setprecision`. – Louen Dec 13 '21 at 16:53
    To avoid [double rounding](https://stackoverflow.com/q/66631288/2410359) distraction, better to append an `f` as in `float x = 10.0000095;` --> `float x = 10.0000095f;` for initializing a `float`. – chux - Reinstate Monica Dec 13 '21 at 16:56
    `printf("%.8f vs %.8f\n", x, x2);` is a poor example as the number of digits needed relates to _exponential_ notation, not _fixed_ point. Recommend `printf("%.8e vs %.8e\n", x, x2);`. Similar issues apply to various C++ output too - in fact, to most of this answer. – chux - Reinstate Monica Dec 13 '21 at 17:01

Where does this round trip happen in a C++ program I write?

That depends on the code you write, but an obvious place would be... any floating-point literal you put in your code:

float f = 10.34529848505433f;

Will f be exactly that number? No. It will be an approximation of that number because most implementations of float can't store that much precision. If you changed the literal to 10.34529848505432, odds are good f will have the same value.
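A sketch of the decimal-to-float direction described here, assuming IEEE-754 single-precision `float`. The `f` suffix is used so that each literal is rounded directly to `float` rather than first to `double`; the names are illustrative. Note that this only exercises parsing, not a float-to-text-to-float round trip.

```cpp
#include <cassert>

// Two different decimal literals (from the answer) that are rounded to the
// nearest representable float at compile time.
constexpr float kLiteralA = 10.34529848505433f;
constexpr float kLiteralB = 10.34529848505432f;

// Hypothetical helper: true if both literals ended up as the same float.
bool literals_collapse() { return kLiteralA == kLiteralB; }

// Hypothetical helper: true if the stored float equals the decimal written
// in the source (with IEEE-754 float it does not; the precision is too low).
bool stored_exactly() {
    return static_cast<double>(kLiteralA) == 10.34529848505433;
}
```

Both literals round to the same float, about 10.345298767, which is not the value written in the source.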

This is not about round-tripping per se. The standard defines max_digits10 purely in terms of going from decimal to float:

Number of base 10 digits required to ensure that values which differ are always differentiated.
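The standard's wording can be checked on a sample: with `max_digits10` significant digits, adjacent floats should always produce different text. This sketch assumes IEEE-754 `float` and a correctly rounded `snprintf`; `to_text` and `neighbors_differentiated` are hypothetical helpers, and a sampled range is not a proof.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper: text of x with `digits` significant decimal digits.
std::string to_text(float x, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, static_cast<double>(x));
    return buf;
}

// Hypothetical helper: check that each of `n` consecutive floats starting at
// `start` produces text different from the next float, i.e. that values which
// differ are "differentiated" at this number of digits.
bool neighbors_differentiated(float start, int n, int digits) {
    float x = start;
    for (int i = 0; i < n; ++i) {
        const float next = std::nextafter(x, std::numeric_limits<float>::infinity());
        if (to_text(x, digits) == to_text(next, digits)) {
            return false;
        }
        x = next;
    }
    return true;
}
```

Starting at 10.0f, 1000 neighbors stay distinct with 9 digits but not with 8 (10.0000095f and 10.0000105f both print as `10.00001`). As argued in the comments, being differentiated in this sense is what makes the float-to-text-to-float round trip possible.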

Nicol Bolas
    The parsing of a decimal literal to initialize a `float` is not a round trip; it is a single direction. Even if it is converted back to decimal and printed, that is a round trip from decimal to `float` to decimal, which is not the round trip asked about in the question, from `float` to decimal to `float`. And that round trip is relevant to `max_digits10`; `max_digits10` is the number of digits needed for that intermediate decimal to ensure the round trip returns to the original `float`. – Eric Postpischil Dec 12 '21 at 15:23
    @EricPostpischil: "*And that round trip is relevant to max_digits10;*" [Not according to the standard:](https://timsong-cpp.github.io/cppwp/n4861/numeric.limits.members#14) "**Number of base 10 digits required to ensure that values which differ are always differentiated.**" I don't know where people are getting this "round trip" stuff from, but it isn't from the C++ standard. – Nicol Bolas Dec 12 '21 at 15:56
    Consider what it means to “ensure that values which differ are always differentiated.” It means that if `x` and `y` are floating-point values that differ, then converting them to decimal with `max_digits10` significant digits ensures they are differentiated (the result of the conversion produces different results for `x` and `y`), which then means it is possible to determine the original values when converting back. If they were not differentiated, that would not be possible. So that phrasing is saying the same thing as that a round trip reproduces the original value, just in different words. – Eric Postpischil Dec 12 '21 at 16:35
    @EricPostpischil: It may by inference provide some kind of round-trip guarantee, but that's not the *point* of it. Remember: the point of your initial comment was that my example didn't show round trip behavior and that's wrong for some reason. But the point of my example is that `max_digits10` isn't about round-tripping. It has meaning outside of that context. – Nicol Bolas Dec 12 '21 at 16:39
    The “round trip” phrasing is a mathematician’s way of stating that the decimal numeral contains sufficient information to distinguish the original value. The two statements are mathematically equivalent, and it is the point to deliver to the user the minimum number of significant digits needed to guarantee sufficient information is delivered. C++’s `max_digits10` comes from C’s `FLT_DECIMAL_DIG`, `DBL_DECIMAL_DIG`, and `LDBL_DECIMAL_DIG`, which is defined with the round-trip phrasing (C 2018 5.2.4.2.2 12). – Eric Postpischil Dec 12 '21 at 16:43
    And that in turn stems from IEEE 754, which, in the 2008 version, describes the equivalent value, *Pmin* ( *bf* ) (*Pmin* is analogous to `max_digits10` and *bf* is the format, analogous to `float` or `double`), in 5.12.2, which says “Conversions from a supported binary format *bf* to an external character sequence and back again results in a copy of the original number so long as there are at least *Pmin* ( *bf* ) significant digits specified and the rounding-direction attributes in effect during the two conversions are round to nearest rounding-direction attributes.” – Eric Postpischil Dec 12 '21 at 16:44
    Note: `float f = 10.34529848505433;` is a [double rounding](https://stackoverflow.com/q/66631288/2410359). To avoid that distracting issue, code `float f = 10.34529848505433f;` (Append an `f`). – chux - Reinstate Monica Dec 13 '21 at 16:53