C: imprecision in arithmetic of double

Question

I have the following C code:

#include <stdio.h>

int main()
{
    double a=.1,b=.2,c=.3,d=.4;
    double e=a+b+c;
    if (d- (1-e)){printf("not equal\n");}
    printf("%.20f\n",d-(1-e));
}

And the result I get is:

not equal
0.00000000000000011102

I know this is due to the imprecision introduced by the way a computer stores a double. Is there a way to solve this and make d-(1-e) equal to 0?

I think the best way to do it is to define an `e` such that all values less than `e` are treated as 0 — Rohith R, Sep 14 '16 at 02:43
It's not clear what you mean by *"solve this"*. The only way to "solve" the imprecision of floating point math is not to use `float` or `double` in your code. Which means that you need to find a good math library that doesn't internally convert to a binary representation. — user3386109, Sep 14 '16 at 02:53
Most of the time the answer will be "no". The time and effort required to achieve exact representation of decimal numbers is not worth it, and the code you're trying to write should be using integers instead. On the occasion that the answer is "yes", the solution is to use a fixed-point or arbitrary precision number library. Note that these libraries are very slow compared to using floating point numbers. — David, Sep 14 '16 at 02:56
Possible duplicate of [Is floating point math broken?](http://stackoverflow.com/questions/588004/is-floating-point-math-broken) — too honest for this site, Sep 14 '16 at 03:08
@Olaf Not a duplicate, OP explains how they already know that floating point math behaves this way — M.M, Sep 14 '16 at 03:18
@M.M: Hmm... I still think this is a dup. The answers to the dup cover the problem very well (especially the standard link "what every ..."). The rest is actually a mathematical problem, not a programming one. Anyway, I'll leave the vote, let's see if someone agrees. — too honest for this site, Sep 14 '16 at 03:24
@PRP no need to define it, there are already [`FLT_EPSILON`, `DBL_EPSILON` and `LDBL_EPSILON`](http://stackoverflow.com/q/16063820/995714) — phuclv, Sep 14 '16 at 03:28

score 1 · Accepted Answer · answered Sep 14 '16 at 03:04


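As PRP correctly suggests, you need to choose a small tolerance and treat any difference below it as zero. The standard C library provides macros in float.h for that purpose (Annex F of the C standard covers IEC 60559, i.e. IEEE 754, floating-point arithmetic). You can use them like this, for example: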

#include <stdio.h>
#include <stdlib.h>
#include <float.h>
#include <math.h>

int main()
{
  double a = .1, b = .2, c = .3, d = .4;
  double e = a + b + c;
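  /* exact test: any nonzero difference counts as "not equal" */
  if (d - (1 - e)) {
    printf("not equal\n");
  }
  printf("%.20f\n", d - (1 - e));
  printf("%.20f\n", DBL_EPSILON);
  /* tolerant test: differences up to DBL_EPSILON count as zero
     (an absolute threshold, only sensible for values near 1.0) */
  if (fabs(d - (1 - e)) <= DBL_EPSILON) {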
    printf("equal\n");
  }
  exit(EXIT_SUCCESS);
}

answered by deamentiaemundi

I'm new to the C language. Is this how people handle floating-point equality tests in most cases? And what is DBL_EPSILON, some constant from float.h? – Lii Sep 14 '16 at 03:09
The standard does not require IEC 60559 floating point arithmetic. – too honest for this site Sep 14 '16 at 03:09
Mine says "normative" but it also offers a macro to check for compliance, so: *insert obligatory rant about committees here* and let's go on with our lives – deamentiaemundi Sep 14 '16 at 03:18
The DBL_EPSILON will not work if you are subtracting large numbers, e.g. if a, b, c, and d are multiplied by 1e20. – Rishikesh Raje Sep 14 '16 at 06:17
@RishikeshRaje yes, that's correct, you have to do the obligatory checks. And those include a lot more than just the magnitude and I wanted to keep it simple. It might be a good idea to do an example for the Documentation here, so why don't you do so? – deamentiaemundi Sep 14 '16 at 13:55

Luis Colorado · Answer 2 · 2016-09-15T07:35:28.403

The problem is that .1, .2 and .3 cannot be written with a finite number of digits in base 2: their binary representations are periodic (repeating), so each one is stored rounded.

To illustrate, suppose we use the base 3 numbering system and add .1 (one third) + .1 (another third) + .1 (another third), but carry out the sum in base 10 instead of base 3.

.1 in base 3 converts to base ten as .3333333333333333 (with 16 decimal digits), and if you add it three times you get .9999999999999999, not 1.0.

The only way around this is to reduce the precision of the result. If we have 10 digits and add .3333333333 three times, we get .9999999999 (with ten digits of precision). The fix here is to cut the result to nine digits and round it (giving 1.000000000), but if you subtract the ten-digit sum from the correct value (1.000000000 - 0.9999999999), you still get 0.0000000001.

This is common in floating-point arithmetic: floating-point numbers are discrete by nature, not continuous as the real numbers are.

As has been pointed out, the <float.h> header defines constants to deal with this problem, but you have to use them with care, as they can lead you into new errors:

DBL_EPSILON is the difference between 1.0 and the next representable number greater than 1.0; no floating-point number lies between the two. In our base 10 example with 10 significant digits, it would be 1.000000001 - 1.000000000 = 0.000000001. Because double uses base 2 rather than base 10, DBL_EPSILON is not a round number in decimal; for IEEE 754 double precision it is 2^-52, about 2.22e-16.

This value is relative to 1, so when you subtract two numbers, your epsilon should be scaled relative to the larger of them:

if (fabs(actual - correct) < DBL_EPSILON*fabs(correct)) {
    /* consider both results equal */
}

In real code, however, you normally perform many additions and subtractions, and the rounding errors accumulate, so it is easy to end up outside this tolerance; you may need to use 2.0 or 10.0 times this value instead.

The floating-point number system is described in detail in the IEEE 754 specification. There you will find that DBL_EPSILON is not the only constant to consider when deciding whether two floating-point numbers are equal: representable numbers are packed more densely as you approach 0.0 and become sparser as their magnitude grows.
