
I'm a bit of a C newbie, but this problem is really confusing me.

I have a double variable holding 436553940.0000000000 (it was cast from an int) and another double variable holding 0.095832496.

When I add them, my result should be 436553940.0958324*96*, however I get 436553940.0958324*67*.

Why does this happen and how can I prevent it from happening?

user3529398
  • 2
    Floating point numbers cannot hold all rational numbers. See [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – Zeta Apr 13 '14 at 16:37
  • 2
    Amusing that such often asked questions merit multiple upvotes. – devnull Apr 13 '14 at 16:38
  • You are hitting the limits of the IEEE 754 representation. Unfortunately, this is a necessary evil, since we don't have a better way to represent floating-point values with 0/1 bits. The same goes for the `float` type, and you will get the same behaviour in any language and on any platform that adopts the IEEE 754 standard. There is nothing you can do about it, but you can switch to dedicated libraries if you are willing to spend more resources on your computations. – user2485710 Apr 13 '14 at 16:41
  • Ok, I get it. Would it help if I concatenate the two numbers and then store it in the double variable? – user3529398 Apr 13 '14 at 16:50
  • No, concatenating would not change much. The closest double to 436553940.095832496 is 436553940.09583246707916259765625. – Patricia Shanahan Apr 14 '14 at 05:28

2 Answers

0

The number you expect is simply not representable by a double. The value you receive is instead a close approximation based on rounding results:

In [9]: 436553940.095832496
Out[9]: 436553940.09583247

In [18]: 436553940.095832496+2e-8
Out[18]: 436553940.09583247

In [19]: 436553940.095832496+3e-8
Out[19]: 436553940.0958325

In [20]: 436553940.095832496-2e-8
Out[20]: 436553940.09583247

In [21]: 436553940.095832496-3e-8
Out[21]: 436553940.0958324

You've just run out of significand bits.

Yann Vernier

Doubles are not able to represent every number. We can write some C++ code (C++ implements doubles in the same way as C) to show this.

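#include <cstdio>
#include <cmath>

int main() {
    double x = 436553940;
    double y = 0.095832496;

    // The exact sum is rounded to the nearest representable double.
    double sum = x + y;

    // Neighbouring doubles just below and just above sum.
    std::printf("prev: %50.50f\n", std::nextafter(sum, 0.0));
    std::printf("sum:  %50.50f\n", sum);
    std::printf("next: %50.50f\n", std::nextafter(sum, 500000000.0));
}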

This code computes the sum of the two numbers you are talking about and stores it as sum. We then compute the representable doubles immediately before and immediately after that number.

Here's the output:

[11:43am][wlynch@watermelon /tmp] ./foo
prev: 436553940.09583240747451782226562500000000000000000000000000
sum:  436553940.09583246707916259765625000000000000000000000000000
next: 436553940.09583252668380737304687500000000000000000000000000

So the calculation cannot equal 436553940.0958324*96*, because that number is not a representable double. The IEEE 754 standard (and your compiler) defines rules that tell us how the result should be rounded to reach the nearest representable double.

Bill Lynch