
I already know how floating-point numbers are stored in memory, and I understand why the expression `0.1 + 0.2 != 0.3` is true.

But I don't understand why `0.2f + 0.3f == 0.5f` is true. Here is my code:

#include <iomanip>
#include <iostream>

using namespace std;

int main() {
    cout << setprecision(64)
        << "0.3 = " << 0.3 << "\n"
        << "0.2 = " << 0.2 << "\n"
        << "0.2 + 0.3 = " << 0.2 + 0.3 << "\n"
        << "0.3f = " << 0.3f << "\n"
        << "0.2f = " << 0.2f << "\n"
        << "0.2f + 0.3f = " << 0.2f + 0.3f << "\n";
}

I get output:

0.3 = 0.299999999999999988897769753748434595763683319091796875
0.2 = 0.200000000000000011102230246251565404236316680908203125
0.2 + 0.3 = 0.5
0.3f = 0.300000011920928955078125
0.2f = 0.20000000298023223876953125
0.2f + 0.3f = 0.5

I agree that if we sum 0.3 + 0.2 as `double`s the result will be 0.5, because 0.299999999999999988897769753748434595763683319091796875 + 0.200000000000000011102230246251565404236316680908203125 = 0.5 exactly.

But I still don't understand why the sum `0.2f + 0.3f` is 0.5 too. I expected the result to be 0.50000001490116119384765625 (0.300000011920928955078125 + 0.20000000298023223876953125). Could you please help me understand where I'm wrong?

Alim Abaev
    *Could you please help me understand where I'm wrong?* -- Using output statements to determine the actual values of numbers. You should be subtracting the values, and use an `if()` statement or similar to determine if the absolute value of the subtraction yields 0. – PaulMcKenzie Jan 07 '22 at 19:35
  • Probably dumb luck. `float` might actually be able to get the "correct" result for that addition even though `double` can't. – user4581301 Jan 07 '22 at 19:35
  • `0.5` is a binary fraction (`1/2`) so is easy to represent exactly using a binary representation. I don't think `0.3` (`3/10`) is a binary fraction. – François Andrieux Jan 07 '22 at 19:35
  • `if ( 0.2 + 0.3 == 0.5 )` -- Where do you do that in the code? Nowhere that I see. That is the proper way to show that floating point is "broken", and not by using formatted output statements. – PaulMcKenzie Jan 07 '22 at 19:39
  • Computerphile/Tom Scott made a nice video on floating point: https://www.youtube.com/watch?v=PZRI1IfStY0 — it might help you understand things a bit better too – Pepijn Kramer Jan 07 '22 at 20:02
  • Try printing out the binary representation of the numbers (which will be exact) rather than the decimal representation (which may introduce noise). – Eljay Jan 07 '22 at 21:00
  • You're correct that the infinite precision result of `0.2f + 0.3f` would be `0.50000001490116119384765625`. However, that is not exactly representable as a float, and the next float after `0.5` is `0.500000059604644775390625`. The value is 1/4 of the way between 0.5 and the next possible float (so it rounds to 0.5). This is an expected error, since 2**-3 < 0.2 < 2**-2 < 0.3 < 2**-1 < 0.5000000149..., the value of 0.2 has an exponent 2 lower, so it has 4 times the precision. – Artyer Jan 07 '22 at 21:40
  • @eerorika: This question asks a specific question about floating-point arithmetic and is not answered by that question. Please do not promiscuously close floating-point questions as duplicates of that question. – Eric Postpischil Jan 07 '22 at 21:50
  • @Eljay: The decimal representations shown in the question are exact. – Eric Postpischil Jan 07 '22 at 21:58
  • @user4581301: Using IEEE-754 binary64 for `double`, `.2 + .3` does yield .5. Further, I think the fact that .2 and .3 are complementary with respect to a power of two means that `.2 + .3 == .5` will be true in any binary-based floating-point format with round-to-nearest-ties-to-even, regardless of the number of bits of precision, essentially because the rounding of .3 to the format will be complementary to the low bits of the result of rounding .2 to the format. – Eric Postpischil Jan 07 '22 at 22:05
  • I misread the question and thought somehow `double` gave the Asker's expected and the `float` didn't. – user4581301 Jan 07 '22 at 22:15
  • @Eljay: Your comment says the binary representations will be exact. The decimal representations shown in the question are, in fact, exact representations of the `float` values. – Eric Postpischil Jan 07 '22 at 22:32
  • @Eljay: The decimal numbers shown are exact representations. They are not different numbers. 0.20000000298023223876953125 and 13,421,773•2^−26 are the same number. – Eric Postpischil Jan 07 '22 at 23:29
  • @Eljay: Also saying `0.200000000001f` may have the same representation “held by” its `float` but be a different number is nonsense or a misuse of terminology. `0.200000000001f` is source text that is a `float` literal; it is not a number. If we instead consider the number 0.200000000001, then it has no representation in `float` (when that is IEEE-754 binary32). We can convert 0.200000000001 to `float` with rounding, but we cannot represent it in `float`. – Eric Postpischil Jan 07 '22 at 23:30
  • @Eljay: `0.2f` is not a number. It is source code. In source code, it is a `float` literal. Its value is the result of converting the number 0.2 to the `float` representation, usually with rounding-to-nearest, although the C standard permits some latitude on this. The result is a number that is **exact**. It is not an approximation; with IEEE-754 binary32 and round-to-nearest, the number is 0.20000000298023223876953125. It would be a misstatement to say this is a representation of .2; the result of the conversion to `float` is not defined by the IEEE-754 or C standards to represent .2. – Eric Postpischil Jan 08 '22 at 00:21
  • @Eljay: The trailing digits in 0.20000000298023223876953125 are not noise; they display the exact value of the number. This meaning for floating-point representations is well established in both the IEEE-754 standard and the C standard. “Precision” of a floating-point format refers to the number of digits in the significand, not the accuracy with which a number is represented. As defined in the standards, a floating-point datum that is not an infinity or a NaN represents one number **exactly**; it has perfect accuracy in that sense. – Eric Postpischil Jan 08 '22 at 00:23
  • @Eljay: In floating-point, it is the **operations** that have approximations, not the **numbers**. Again, this is well defined in the IEEE-754 and C standards. When an operation is performed, the result is as if the operation were performed with real-number arithmetic and then rounded to the nearest representable value. Thus any operation, such as `a+b` or `a*b`, produces an approximation of the real-number result. However, the result, once obtained, represents a single specific number exactly; that number is not itself an approximation in the floating-point semantics. – Eric Postpischil Jan 08 '22 at 00:25
  • @Eljay: This model, in which the operations approximate real-number arithmetic but the numbers are exact, is crucial to analyzing, designing, debugging, and proving floating-point software. – Eric Postpischil Jan 08 '22 at 00:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/240850/discussion-between-eljay-and-eric-postpischil). – Eljay Jan 08 '22 at 01:23
  • adygha, "I expect the result will be 0.50000001490116119384765625" --> True, that is the sum, yet that value is not representable as a `float`, so the answer rounds to a nearby `float`. The closest `float` is 0.5. – chux - Reinstate Monica Jan 08 '22 at 06:36

1 Answer


The basic reason is that although .2f is a little above .2 and .3f is a little above .3, the sum of the excesses is less than halfway from .5 to the next representable float number.

First, let’s note the scales used for these numbers. Using the IEEE-754 binary32 format, the step between numbers in [1, 2) is 2^−23. Each representable number in this interval is an integer multiple of 2^−23.

.3 is in [¼, ½), where the step is 2^−25.

.2 is in [⅛, ¼), where the step is 2^−26.

The literal 0.2f is .2 converted to float. This produces 13,421,773•2^−26, which equals 0.20000000298023223876953125. For 0.3f, we get 10,066,330•2^−25, which is 0.300000011920928955078125.

Let’s convert those scales to the scale used for numbers in [½, 1), where the step is 2^−24. 13,421,773•2^−26 becomes 3,355,443.25•2^−24, and 10,066,330•2^−25 becomes 5,033,165•2^−24. Adding those produces 8,388,608.25•2^−24. To get a representable result, we round that to the nearest integer. As you can see, the fraction is .25, so we round down, yielding 8,388,608•2^−24, which is .5. The next representable number, 8,388,609•2^−24, which is 0.500000059604644775390625, is further away.

Eric Postpischil