
I have a piece of code that repeatedly decreases a variable that starts at 1. After two iterations, the result is inexact, which came as a surprise.

The effect can be reproduced with the following two lines:

>> (1 - 0.1) - 0.9
0
>> (1 - 0.05 - 0.05) - 0.9
-1.110223024625157e-16

The result in the second case is not zero, in both Matlab and Octave.

Similarly, the following C code shows the problem:

#include <stdio.h>

int main(void) {
  double x1, x2;
  x1 = 1 - 0.05 - 0.05;
  x2 = 1 - 0.1;

  printf("x1 exact:%d, x2 exact:%d\n", x1 == 0.9, x2 == 0.9);
  return 0;
}

Compiled with gcc version 4.7.0 on an Intel Xeon(R) CPU E5-2680, the result is

x1 exact:0, x2 exact:1

showing that the first calculation is inexact while the second is exact.

I can overcome this in C by using long double (the suffix "L" on all literals, and the variables declared as long double). This leaves me somewhat confused: I assumed the inexact result could be explained by the fact that 0.05 does not have an exact representation in base 2; but using long double does not change this fact... so why is the result different?
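For concreteness, here is a sketch of the long double variant I mean (the same program as above with the "L" suffix everywhere; as I understand it, whether the comparisons come out true depends on the platform's long double format):

#include <stdio.h>

int main(void) {
  long double x1, x2;
  x1 = 1.0L - 0.05L - 0.05L;  /* all literals carry the L suffix */
  x2 = 1.0L - 0.1L;

  /* on my setup, x1 now compares equal to 0.9L as well */
  printf("x1 exact:%d, x2 exact:%d\n", x1 == 0.9L, x2 == 0.9L);
  return 0;
}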

What I really want is a way to overcome it in Matlab. Any ideas?

Interestingly, MS Excel's result for this same calculation is exact in both cases.

  • The linked answer above explains how to overcome this: don't assume precision all the way to the last digit, round to fewer digits on output. Using long double literals does exactly this: the result is rounded to a double, so you don't see the imprecision any more. – Cris Luengo Apr 15 '18 at 16:20
  • @camelccc I am not asking why the result is inexact, nor trying to print it. I am asking why it becomes exact with **long double** and whether there is a way to overcome this in Matlab. – laugh salutes Monica C Apr 15 '18 at 16:22
  • It isn't exact in long double, though the extra internal figures will usually cause it to round to what you think is exact when printed. What long double is, is compiler-dependent anyway. – camelccc Apr 15 '18 at 16:25
  • Repeat: I am not printing the result. I am comparing it to literal 0.9 and it is inexact. No printing of the double or long double values - only comparison. Is long double *comparison* compiler dependent? I thought floating point arithmetic is performed in hardware? Editing my question to include compiler information. – laugh salutes Monica C Apr 15 '18 at 16:33
  • Select some different values and the result is no longer exact with long double. But it may be "exact" with floats. It all depends on the digit at which the infinite continuous expansion of the floating-point value is rounded or truncated. – Aki Suihkonen Apr 15 '18 at 16:46
  • @laugh: Funny you chose to comment on your perceived meaning of the duplicate proposed by camelccc, but chose to ignore my further explanation: in the C code with long double constants you are rounding the result of the operation to double precision. This is causing the result to be what you expected. – Cris Luengo Apr 15 '18 at 17:51
  • @CrisLuengo: There is no rounding to double in the code. The result is explained by the (very detailed) accepted answer. – laugh salutes Monica C Apr 16 '18 at 06:59

2 Answers


As you know, decimal numbers like 0.1 and 0.9 do not have an exact representation in binary. So if you do something like this:

#include <stdio.h>

int main(void) {
    float f = 0.1;
    if (f * 9 == 0.9)
        printf("exact\n");
    else
        printf("inexact\n");
}

or if you write code like that in the original question, it might print "exact", or it might print "inexact", depending on... all sorts of things. To my mind, it's not worth spending too much time trying to figure out why or why not. If it prints "inexact", well, we know why: it wasn't exact to begin with. If it prints "exact", we got "lucky": some inexactitudes somewhere canceled each other out, or something. But we can't depend on that, so it's not very interesting.

Since we can't count on it, we have to write code that performs appropriate rounding when printing, or uses "close enough" equality comparisons. Once we've written that code, it works properly regardless of whether the exact comparison would have, by whatever chance, seemed to have worked.
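For instance, here is a minimal sketch of such a "close enough" comparison; the tolerance of 1e-9 is an arbitrary illustration, and a good choice depends on the magnitudes in your computation:

#include <stdio.h>
#include <math.h>

/* "Close enough" equality: true when a and b differ by less than tol. */
static int nearly_equal(double a, double b, double tol) {
    return fabs(a - b) < tol;
}

int main(void) {
    /* Both comparisons from the question now succeed. */
    printf("x1 close:%d, x2 close:%d\n",
           nearly_equal(1 - 0.05 - 0.05, 0.9, 1e-9),
           nearly_equal(1 - 0.1, 0.9, 1e-9));
}

The same idea carries over to Matlab: compare with abs(x - 0.9) < tol rather than x == 0.9.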

Steve Summit
  • Re: “we can’t depend on it”: That depends on circumstances. IEEE-754 arithmetic is reproducible. If one merely wants to use arithmetic to approximate physics, they may have little need to attend to the details. But there are software engineering needs for attending to the details, so this is not good advice in general. Sometimes a formula needs a square root, so one must consider what happens to their computations when the argument of the square root is near zero—will the rounding errors push it into negative territory? Sometimes there are strong reasons for analyzing floating-point behavior. – Eric Postpischil Apr 15 '18 at 18:24

… the result is … showing that the first calculation is inexact while the second is exact.

This inference is incorrect. The fact that 1 - .05 - .05 == .9 is false and 1 - .1 == .9 is true shows that the rounding errors that occur in computing 1 - .05 - .05 do not produce a result that equals .9, but that the rounding errors that occur in computing 1 - .1 do produce a result that equals .9. Since the floating-point result of using .9 in source code is itself not .9, the fact that 1 - .1 equals .9 does not mean that 1 - .1 is exact. It just means it got the same rounding error as .9.

In the expression 1 - .1 == .9, there are three rounding errors:

  • When .1 is converted from a decimal numeral in the source code to a binary floating-point value, the result is not .1 but is 0.1000000000000000055511151231257827021181583404541015625.
  • When .9 is converted from a decimal numeral in the source code to a binary floating-point value, the result is 0.90000000000000002220446049250313080847263336181640625.
  • When the former number, 0.1000000000000000055511151231257827021181583404541015625, is subtracted from 1, we can see that, since it is greater than .1, the result ought to be under .9, something like .8999…. But the result is 0.90000000000000002220446049250313080847263336181640625.

It just so happens that the rounding errors produced the same result on both sides, so the test for equality evaluated to true.
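These exact decimal expansions can be checked directly. Here is a sketch, assuming a printf implementation that produces correctly rounded decimal output at the requested precision (glibc's does):

#include <stdio.h>

int main(void) {
    /* 55 fractional digits is enough to show these doubles exactly. */
    printf(".1     = %.55f\n", .1);
    printf(".9     = %.55f\n", .9);
    printf("1 - .1 = %.55f\n", 1 - .1);
    return 0;
}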

In computing 1 - .05 - .05, the rounding errors are:

  • .05 in the source code is converted to 0.05000000000000000277555756156289135105907917022705078125.
  • 1 - .05 has the result 0.9499999999999999555910790149937383830547332763671875.
  • 1 - .05 - .05 has the result 0.899999999999999911182158029987476766109466552734375.

What has happened here is that, in subtracting 1 - .05, there happened to be a nearby representable value that was slightly less than .95. Thus, when subtracting .05, which we see is slightly greater than .05, we are subtracting something a bit greater than .05 from something a bit less than .95. This combination of errors produces a mathematical result enough under .9 that it is closer to the next lower representable value, 0.899999999999999911182158029987476766109466552734375, than it is to 0.90000000000000002220446049250313080847263336181640625.
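The two candidate results can be seen directly with nextafter from math.h; a sketch:

#include <stdio.h>
#include <math.h>

int main(void) {
    double nine_tenths = .9;  /* really 0.90000000000000002220... */

    /* The representable neighbor just below .9, which is where the
       computed result 1 - .05 - .05 lands. */
    printf("below .9    : %.55f\n", nextafter(nine_tenths, 0));
    printf(".9          : %.55f\n", nine_tenths);
    printf("1 - .05 -.05: %.55f\n", 1 - .05 - .05);
    return 0;
}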

When you do similar calculations with long double, the rounding errors happen to be different. When examining floating-point calculations in these ways, long double will not be better than double in the frequency with which computed results happen to land on the same results you would get with ideal mathematics. It may have happened in your example with the numbers you chose, but using other numbers would produce effectively random results, somewhat as if each computation had a random error up or down one step. If those errors happen to have the same net effect in two different computations, then the two results are equal. If those errors happen to have different net effect, then the two results are not equal.
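To observe this for yourself, here is a sketch that probes a few cases; which of these comparisons come out true is effectively a coin toss, and the long double results in particular will vary between platforms:

#include <stdio.h>

int main(void) {
    /* Each line asks whether the rounding errors on the two sides
       of == happened to have the same net effect. */
    printf("double:      %d\n", 1.0  - .05  - .05  == .9);
    printf("long double: %d\n", 1.0L - .05L - .05L == .9L);
    printf("double:      %d\n", 1.0  - .1  - .1  - .1  == .7);
    printf("long double: %d\n", 1.0L - .1L - .1L - .1L == .7L);
    return 0;
}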

Eric Postpischil