sum of double numbers in c++

Question

I want to calculate the sum of three double numbers and I expect to get 1.

double a=0.0132;
double b=0.9581;
double c=0.0287;
cout << "sum= "<< a+b+c <<endl;
if (a+b+c != 1)
cout << "error" << endl;

The sum is equal to 1 but I still get the error! I also tried:

cout<< a+b+c-1

and it gives me -1.11022e-16

I could fix the problem by changing the code to

if (a+b+c-1 > 0.00001) cout << "error" << endl;

and it works (no error). How can a negative number be greater than a positive number, and why don't the numbers add up to 1? Maybe it is something basic with summation and under/overflow, but I would really appreciate your help. Thanks

I'll let you do the research. What is `0.0132` in binary? `0.9581` in binary? etc.? The answer to that is the reason why you do not get the exact answer. Those numbers cannot be represented exactly in binary, and binary is what the computer is using. [See this](http://floating-point-gui.de/) — PaulMcKenzie, May 10 '16 at 17:59
[What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) — Dimitri Podborski, May 10 '16 at 18:02
The issue is "floating point precision" (or, in this case, IMprecision ;)): Look [here](http://stackoverflow.com/questions/7011184/floating-point-comparison) or [here](http://stackoverflow.com/questions/2100490/floating-point-inaccuracy-examples). — paulsm4, May 10 '16 at 18:02
Thank you guys for your quick responses. I read the references and understood what my problem is. — g1368, May 10 '16 at 18:28
While this problem typically gets labelled "floating point precision", it's not limited to floating point. `int i = 1/3; i = 3 * i; std::cout << i << '\n';` will display 0, not 1, and nobody except the newest newbie is surprised by this. The difference is that programmers learn early on how to deal with limited precision in integer types, but rarely learn it for floating-point types. — Pete Becker, May 10 '16 at 18:48

Trevor Hickey · Accepted Answer · 2016-05-10T18:11:27.177

3

Rational numbers are infinitely precise. Computers are finite.
Precision loss is a well known problem in computer programming.
The real question is, how can you remedy it?

Consider using an approximation function when comparing floats for equality.

#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
using namespace std;

// Returns true when dX and dY differ by at most one machine epsilon,
// scaled by the larger of the two magnitudes (a relative tolerance).
template <typename T>
bool ApproximatelyEqual(const T dX, const T dY)
{
    return std::abs(dX - dY) <= std::max(std::abs(dX), std::abs(dY))
        * std::numeric_limits<T>::epsilon();
}

int main() {
    double a=0.0132;
    double b=0.9581;
    double c=0.0287;

    // ApproximatelyEqual evaluates to true, so "error" is not printed.
    if (!ApproximatelyEqual(a+b+c,1.0)) cout << "error" << endl;
}

edited May 10 '16 at 18:11

answered May 10 '16 at 18:04


1

Rational numbers can be implemented without loss of precision as fractions. For real (i.e. including irrational) numbers this isn't true. – 463035818_is_not_an_ai May 10 '16 at 18:16
@tobi303 Of course you can get around precision loss with custom data types. Your fraction type though would actually just be two integer types with some special logic. I was referring to the fundamental data types of C++. – Trevor Hickey May 10 '16 at 20:06
I just had a bit of trouble with "Rational numbers are infinitely precise" because rational numbers are rather boring: there are only countably many of them (infinite but still countable), while if you want "infinite precision" you need real numbers (which cannot be represented precisely on a machine, not even with a custom data type). – 463035818_is_not_an_ai May 10 '16 at 21:58
@tobi303 Oh, I see. Thanks, that makes sense. – Trevor Hickey May 10 '16 at 22:29

Frank Puffer · Answer 2 · 2016-05-10T18:22:05.733

Floating point numbers in C++ have a binary representation. This means that most numbers that can be exactly represented by a decimal fraction with only a few digits cannot be exactly represented by floating point numbers. That's where your error comes from.

One example: 0.1 (decimal) is a periodic fraction in binary:

0.000110011001100110011001100...

Therefore it cannot be exactly represented with any finite number of bits in binary encoding.

In order to avoid this type of error, you can use BCD (binary coded decimal) numbers which are supported by some special libraries. The drawbacks are slower calculation speed (not directly supported by the CPU) and slightly higher memory usage.

Another option is to represent the number as a general fraction and store the numerator and denominator as separate integers.

Another option is to do fixed-point (scaled) arithmetic. – stark May 10 '16 at 19:42
