floating point number precision

Question

I watched a Computerphile video (https://www.youtube.com/watch?v=PZRI1IfStY0) and am trying to understand floating-point imprecision. I understand that 0.1 cannot be represented exactly in binary. To experiment on my own, I stored the number 4.2 in a float variable.

This is the code:

#include <stdio.h>

int main(void)
{
    float m = 4.2;
    printf("%f\n", m * 1);
    printf("%f\n", m * 10);
    printf("%f\n", m * 100);
    printf("%f\n", m * 1000);
}

The output is:

4.200000
42.000000
419.999969
4200.000000

Why is the result inaccurate only when 4.2 is multiplied by 100?

Try `printf("%a\n", ...)` to "see" the individual bits of the number — pmg, Sep 13 '20 at 10:07
*The* canonical floating point problem question: [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken/588014). Together with [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). These are basically the two major resources for learning about floating point precision and the problems with it. I also suggest you change the type of `m` to be a `double` and see how that affects the result. — Some programmer dude, Sep 13 '20 at 10:07
Actually _all_ the calculations are inaccurate, or at least imprecise. Only 4.2 * 100 displays a kind of inaccuracy that is capable of being rendered by `printf()` after all the various conversions have been applied. — Kevin Boone, Sep 13 '20 at 10:29
I would only recommend *What every computer scientist ...* to people developing f-p arithmetic libraries and suchlike. For most programmers *using* the f-p capabilities of their platforms *What Every Programmer Should Know About Floating-Point Arithmetic* (https://floating-point-gui.de) is a much better place to start learning. — High Performance Mark, Sep 13 '20 at 12:05
*What Every Programmer Should Know About Floating-Point Arithmetic*: this is a big topic to read and understand. Thank you all, I will definitely check it out. — heihei, Sep 13 '20 at 12:54

Eric Postpischil · Accepted Answer · 2020-09-13T12:02:07.963

First, when converting with %f, printf rounds the number to six digits after the decimal point. To see the full values in this case, you can use %.20f:

#include <stdio.h>


int main(void)
{
    float m = 4.2;
    printf("%.20f\n", m * 1);
    printf("%.20f\n", m * 10);
    printf("%.20f\n", m * 100);
    printf("%.20f\n", m * 1000);
}

Output:

4.19999980926513671875
42.00000000000000000000
419.99996948242187500000
4200.00000000000000000000

To understand what is happening, consider the actual value of m. In your C implementation, float is implemented using 24 bits for the significand (the fraction portion of the floating-point number). This is often described as being 24 binary digits with a radix point (the general version of a decimal point) after the first bit, such as 1.00001100110011001100110₂. With the sign and exponent, the floating-point form would be +1.00001100110011001100110₂•2².

However, we can also scale the significand to be an integer less than 2²⁴, by adjusting the exponent correspondingly. +1.00001100110011001100110₂•2² = +100001100110011001100110₂•2⁻²¹. In decimal, 100001100110011001100110₂ is 8,808,038, and 2⁻²¹ is 1/2,097,152. And 8,808,038 / 2,097,152 = 4.19999980926513671875. This representation using integers less than 2²⁴ is mathematically equivalent to the form with the radix point, but it lets us see some of the rounding effects more easily, as we will see below.

When we multiply by 10 using ordinary real-number arithmetic, the result would be 88,080,380 / 2,097,152 = 88,080,380 / 2²¹. However, that numerator does not fit in the 24 bits of the float format your C implementation uses. We have to bring it under 2²⁴ = 16,777,216. This is done by changing the exponent, which multiplies or divides the significand by a power of two. We can raise the exponent by three and divide the numerator by 2³, which gives 11,010,047.5 / 2¹⁸. But now the numerator is not an integer. To fit it into the format, it is rounded to the nearest integer. 11,010,047 and 11,010,048 are equally far from 11,010,047.5. The rule for ties is to choose the candidate whose low digit is even, so 11,010,048 is used.

So the result of m * 10 is 11,010,048 / 2¹⁸ = 11,010,048 / 262,144 = 42.

Now consider multiplying by 100. The real-number result would be 880,803,800 / 2²¹. To get the numerator under 16,777,216, we adjust the exponent by 6, dividing the numerator by 64. The result is 13,762,559.375 / 2¹⁵. Again we round the numerator to an integer, giving 13,762,559 / 2¹⁵. Observe that this time the fraction landed under ½, so we rounded down instead of up. 13,762,559 / 2¹⁵ = 13,762,559 / 32,768 = 419.999969482421875.

What is happening here is that multiplying by various powers of 10 (1, 10, 100, 1000; in binary 1, 1010₂, 1100100₂, 1111101000₂) leaves different fractional parts in the scaled numerator. Since we started with a number just under 4.2 (4.19999980926513671875), when the numerator is rounded up, the result reaches the exact multiple of 4.2 (42 and 4200 here). When it is rounded down, as with 100, it does not.

So the computer never stores exactly 4.2 in m; it stores +1.00001100110011001100110₂•2² = 4.19999980926513671875. Multiplying this number is not simply a matter of shifting the radix point. It involves scaling the significand to an integer that must be less than 2^24 so that it fits the 24 bits of a float. Adjusting the exponent to make it fit can leave a numerator that is not an integer, and it has to be rounded to the nearest integer; this is where the imprecision comes from. Am I correct? — heihei, Sep 13 '20 at 13:24
@EricPostpischil Thank you so much. You really cleared up my misconception about floating-point imprecision. — heihei, Sep 13 '20 at 13:27
@heihei: Yes, pretty much. The multiplication is not just shifting the radix point because the radix used in the computer is 2, but the multiplication is by a power of 10. So the significand has to be multiplied. And the result always has to fit in 24 bits. (It does not matter if you position those 24 bits to be from 2^23 to 2^0, making them an integer, or from 2^0 to 2^−23, making them 1.something; the math works out the same with the adjusted exponent. It is just they have to start and stop within some 24 consecutive bits.) — Eric Postpischil, Sep 13 '20 at 13:56
