
I have a basic understanding of floating-point numbers and was reading an article which says:

0.1 + 0.2: This equals 0.3, but in floating-point: (0.1 + 0.2) == 0.3 is false. This is because 0.1, 0.2 and 0.3 cannot be precisely represented in base 2 floating-point.

Well, that's true given the nature of floating-point numbers, but I wrote a simple program to test it:

float a = 0.1;
float b = 0.2;

if(a+b == 0.3)
{
  printf("true");
} else 
{
  printf("false");
}
// result is true

But the output is actually true. Here are my two questions:

  1. I think this happens because C uses the round-to-even rounding mode, so after rounding the comparison happens to be true. Is my understanding correct?

  2. If my understanding is correct, then there must be some specific floating-point numbers for which the comparison won't be true, because there is still a small chance that rounding might fail. So there must be some combination such as

    float a = ...;
    float b = ...;
    if(a+b == XXX)  // where XXX is the "intuitive" sum of a and b
    {
      printf("true");
    } else 
    {
      printf("false");   
    }
    
    //result is false now
    

Is my understanding correct?

GSerg
  • What compiler are you using? I get `false` for your first code block above with gcc. (Probably because `0.3` is a `double` literal.) – T.J. Crowder Jul 04 '20 at 08:44
  • `printf("%d\n", (int)(sizeof (double) - sizeof (float)));` – pmg Jul 04 '20 at 08:44
  • I think it's useful: https://stackoverflow.com/questions/588004/is-floating-point-math-broken#:~:text=The%20expression%200.1%20%2B%200.2%20%3D%3D%3D,can%20be%20avoided%20by%20scaling. – Ehsan Sepehri Jul 04 '20 at 08:45
  • The rounding mode is [implementation defined, and might be modifiable.](https://en.cppreference.com/w/c/numeric/fenv/FE_round) – DevSolar Jul 04 '20 at 08:54
  • Just to clarify (and in addition to DevSolar’s comment), the rounding mode is mostly irrelevant here, and a red herring; any other rounding mode would show the same behaviour in this case, because the precision of `float` is too small to express the difference between `0.1f + 0.2f` and `0.3f` (even before rounding!). Anyway, there isn’t a “small” chance that rounding will fail: there are **infinitely many** numbers where `float` arithmetic will give you wrong results (same as there are infinitely many cases where limited-precision *decimal* floating point arithmetic will yield wrong results). – Konrad Rudolph Jul 04 '20 at 09:17
  • The naive expectation about 0.x+0.y is true at 84% in double precision and 88% in single precision with default rounding mode. It is around 78% for a fraction made of 2 to 4 (decimal) digits, both in single and double precision. – aka.nice Jul 05 '20 at 07:01
  • @KonradRudolph what you write is correct, but I don't like the term *is too small* because it might suggest that the expectation holds because of the low precision. With 1 bit less precision - 23 bit significand - 0.1+0.2 would not be equal to 0.3. 0.2 is obviously in the binade after that of 0.1, and their sum is in the binade after that. So the least bit of the exact sum is 1/4 ulp. Depending on rounding mode, that can make an error up to 3/4 ulp (1/2 ulp here at most because 0.2 is 2*0.1). The relation to floating point precision concerns the trailing bits of 0.1 in floating point representation, and it's periodic. – aka.nice Jul 05 '20 at 07:23

3 Answers


I get false for your program, not true as you indicate in your question. (0.3 is a double literal, so my guess is that when you tested this locally, you used a float variable instead of a 0.3 literal.) If you actually use float (== 0.3f instead of == 0.3), you get true as the output because it just so happens that with float, 0.1 + 0.2 == 0.3 is true.

But, the fundamental point remains that the IEEE-754 binary floating-point used by float (single-precision) and double (double-precision) is really fast to calculate and useful for lots of things, but inherently imprecise for some values, and so you get that kind of issue. With float, you get true for 0.1 + 0.2 == 0.3, but you get false for 0.1 + 0.6 == 0.7:

#include <stdio.h>

int main() {
    printf("%s\n", 0.1f + 0.6f == 0.7f ? "true" : "false"); // prints false
    //             ^^^^−−−^^^^−−−−^^^^−−− `float` literals
    return 0;
}

The famous 0.1 + 0.2 == 0.3 version of this issue happens with double:

#include <stdio.h>

int main() {
    printf("%s\n", 0.1 + 0.2 == 0.3 ? "true" : "false"); // prints false
    //             ^^^−−−^^^−−−−^^^−−− `double` literals
    return 0;
}
T.J. Crowder
  • @T.J. Crowder Thanks for your great answer. Yes, I was using 0.3 as a float variable locally, now I see. Btw, why are floating-point numbers really fast to calculate? –  Jul 04 '20 at 23:12
  • @slowjams - Because they're designed that way. The [Wikipedia article](https://en.wikipedia.org/wiki/IEEE_754) may have more details. – T.J. Crowder Jul 05 '20 at 08:25

In a Math sense, 0.1 + 0.2 == 0.3 is always true

In a Floating point sense, 0.1 + 0.2 == 0.3 is only true if error in representing 0.1 + error in representing 0.2 == error in representing 0.3

As I'm sure you are aware, the errors depend on the value and its magnitude. As such, there are a few cases where the errors align so that floating-point equality seems to work, but in general such comparisons are faulty because they fail to account for the errors.

To write robust floating-point code, you need to look into measurement theory and how to propagate measurement errors through your formulas. This also means that you'll have to replace the C-style (bit comparison) equality with an "equals within the bounds of error" comparison.

Note that you cannot build a system that auto-handles the error in the program perfectly, because to do so would require an exact storage approach of finite size for any fractional number of possibly infinite repeating digits. As a result, error estimation is typically used, and the result is typically compared within an approximation boundary that is tuned for the values involved.

It doesn't take long to realize that while your program is correct, you can't trust the technique because the technique, in general, won't return the right values.

Edwin Buck
  • The error in representing 0.3 is NOT equal to the sum of errors for representing 0.1 and 0.2. One needs another error, the error for representing the sum of the two approximations. The errors for representing 0.1 0.2 and 0.3 in single precision float are respectively 1/(5*2^27) , 2/(5*2^27) and 1/(5*2^24). The error for representing (0.1f+0.2f) is indeed 1/(2^27)... – aka.nice Jul 05 '20 at 05:56
  • A typical illustration is 0.1+0.4==0.5 in double precision: though both operands have representation errors (both by excess), their sum appears to equal 1/2, which has no representation error at all. – aka.nice Jul 05 '20 at 06:18
  • @aka.nice I was not thinking of error in magnitude terms, but in absolute terms of distance from the ideal number. Thus, if one error is positive and the other is negative, and the end result could be had without any error, the positive and negative errors must be equal to achieve the end result. Hope this clarifies my statement, but in general (a + a.error) + (b + b.error) == (c + c.error) when (a + b == c) means that (a.error + b.error = c.error) Otherwise it's not a linear system. – Edwin Buck Jul 05 '20 at 06:28
  • Beware, here you're dealing with floating point addition which may carry its own rounding at the end. Despite 0.1 and 0.4 being both greater than 1/10 and 4/10 respectively, they sum to 0.5 after rounding to nearest floating point... – aka.nice Jul 05 '20 at 20:42
  • The formulation is thus (a + a.error) + (b + b.error) + sum.error == (c + c.error). Forgetting sum.error will lead to misunderstanding. – aka.nice Jul 06 '20 at 07:30

In C, a floating-point literal without a suffix (e.g. 0.5, 11.332, 8.9, etc.) has type double. Thus, when you write the following statement:

float a = 0.1;
float b = 0.2;

if(a+b == 0.3)
  printf("true");
else
  printf("false");

The left-hand side, a + b, is computed from float values, but the 0.3 on the right-hand side is not what you expect it to be: as already mentioned, it is a double by default.

So, assuming the sum is rounded to float and then converted to double for the comparison, the compiler ends up comparing the values:

0.30000001192092895508 == 0.2999999999999999889

Which is clearly not equal.

To solve this issue, you need to append a suffix that tells the compiler the literal is a float.

Try this instead:

(a + b == 0.3F)

The F or f indicates the value is a float.

Unfortunately, this doesn't help with double: there, 0.1 + 0.2 == 0.3 is false. To see why, you can print the stored values with the following code (C++ here):

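#include <iomanip>
#include <iostream>

using namespace std;

int main() {
    // setprecision(20) prints up to 20 significant digits
    cout << setprecision(20) << 0.1 << endl;
    cout << setprecision(20) << 0.2 << endl;
    cout << setprecision(20) << 0.3 << endl;
    cout << setprecision(20) << (0.1 + 0.2) << endl;
    return 0;
}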

You'll see that none of the displayed values is exactly the decimal value you wrote:

0.10000000000000000555 // 0.1
0.2000000000000000111  // 0.2
0.2999999999999999889  // 0.3 -------- NOTE
0.30000000000000004441 // 0.1 + 0.2 -- NOTE

Now compare the two values marked NOTE. They are not equal.

Hence, the comparison of 0.2999999999999999889 and 0.30000000000000004441 fails and you always get false.

Rohan Bari