Addition of very small Double values (Java)

Question

I was wondering why my simple addition of some double values leads to the following results in Java:

double a = 1 + 1E-10; // 1.0000000001 (as expected)
double b = 1 + 1E-15; // 1.000000000000001 (as expected)
double c = 1 + 1E-20; // 1.0 (why?)

I thought I could at least add a value as small as Double.MIN_VALUE, which seems to be 4.9E-324.

What am I doing wrong here?

"What am I doing wrong here?" - You're forgetting about the precision problems any floating point number representation has. — Thomas, Apr 21 '17 at 11:56
To be a little bit more technical: in [IEEE 754](https://en.wikipedia.org/wiki/IEEE_floating_point), a double has 11 bits of exponent and 53 bits of mantissa (52 stored bits plus an implicit leading bit). The exponent defines the order of magnitude of the number, while the mantissa defines the numeric value. As an example, in `2.63 x 10^3`, `2.63` is the numeric value, while `3` is the magnitude. If you add two values of significantly different magnitude, you may run into precision problems, which is what you observed. In general, `float`s and `double`s work well when (most) operands are roughly of the same order of magnitude. — Turing85, Apr 21 '17 at 12:03
Thanks @Turing85, but I still don't get, why I can describe a very small value using `Double.MIN_VALUE`, but can't use it as expected. :P Is it only because the other value (1 in this case) has a much different magnitude? — Thomas, Apr 21 '17 at 12:16
@Thomas Yes. `Double.MIN_VALUE` is `2^-1074`, while `1` is `2^0`. — Turing85, Apr 21 '17 at 12:21
As an example of how fast you can run into precision problems: 161 * 0.7 should be 112.7. However, the result of `161.0 * 0.7` is `112.69999...` due to those precision problems. — Thomas, Apr 21 '17 at 12:37
@Thomas this is expected. Errors propagate faster when multiplying compared to addition. This is logical since multiplication is in a sense only repeated addition. I do not know the exact error term off the top of my head, but if you look up some numerics 101 lecture, I am sure you will find the error terms quite quickly. — Turing85, Apr 21 '17 at 12:41
Ok, I understood that the number is internally converted to [binary64 double-precision](https://en.wikipedia.org/wiki/Double-precision_floating-point_format). 64 bits: 1 bit for the sign, 11 bits for the exponent and 52 stored bits for the mantissa (53 bits of precision counting the implicit leading bit). So basically I can describe a number by its magnitude and _shift_ the decimal point. The 53 bits correspond to `53 * log10(2)` decimal digits (approx. 15 to 16). This is why I can compute `1+1E-15` but not `1+1E-16`, right? But floating-point numbers are still a bit confusing to me, because `0.3+0.1=0.4` but `0.3-0.1=0.19999999999999998`. — Thomas, Apr 21 '17 at 13:13
@Thomas if you know how to convert a base 10 representation to binary, then try to write `0.3` in binary. You will notice that this representation is periodic (just like 1/3 in base 10), so it has to be rounded and you get a precision error. The same holds true for `0.1` in binary. When calculating `0.3 + 0.1`, the rounding errors cancel out and you get the expected result. When calculating `0.3 - 0.1`, the errors do not cancel out and you get an unexpected result. This result, however, is correct in the given arithmetic. — Turing85, Apr 21 '17 at 13:27
Thank you very much for your patient help @Turing85. Unfortunately you have no **answer** I can mark. :P — Thomas, Apr 21 '17 at 13:37
@Thomas as long as you understood the concept, all is good and I am satisfied :) — Turing85, Apr 21 '17 at 13:47
@Turing85 note that you don't have to mention me (i.e. @Thomas) in order to get the OP notified (he'll be notified anyways) - it's slightly irritating to get comment messages on a discussion I'm not participating in anymore ;) — Thomas, Apr 21 '17 at 14:01
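
The points made in the comments above can be checked directly. The following is a minimal sketch, not part of the original thread: the class name `ExactDoubleValues` and the helper `exact` are made up for illustration. It relies on the `BigDecimal(double)` constructor, which converts the stored binary value exactly, to show what `0.1` and `0.3` really hold, and it shows that `Double.MIN_VALUE` is representable on its own but is lost next to `1.0`.

```java
import java.math.BigDecimal;

// Hypothetical demo class, named here for illustration only.
class ExactDoubleValues {

    // new BigDecimal(double) performs an exact conversion of the binary value,
    // so the result shows what the double actually stores.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        // Neither 0.1 nor 0.3 is exactly representable in binary.
        System.out.println("0.1 is stored as " + exact(0.1));
        System.out.println("0.3 is stored as " + exact(0.3));

        // The rounding errors happen to cancel in the sum, but not in the difference.
        System.out.println("0.3 + 0.1 = " + (0.3 + 0.1)); // 0.4
        System.out.println("0.3 - 0.1 = " + (0.3 - 0.1)); // 0.19999999999999998

        // Double.MIN_VALUE is a valid positive double by itself ...
        System.out.println(Double.MIN_VALUE > 0.0);        // true
        // ... but next to 1.0 it is far below the last mantissa bit.
        System.out.println(1.0 + Double.MIN_VALUE == 1.0); // true
    }
}
```

The stored value of `0.1` is slightly above one tenth and the stored value of `0.3` is slightly below three tenths, which is why the sum and the difference behave differently.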

score 3 · Accepted Answer · answered Apr 21 '17 at 12:19

As @Turing85 points out, a double has 11 bits of exponent and 53 bits of mantissa.

What we are calculating here is 1.0 + 1E-20. To represent that number (more precisely than 1.0) we need at least 21 significant decimal digits, which corresponds to roughly 70 bits (21 × log2(10) ≈ 69.8). That is more precision than the 53-bit mantissa of a double provides.

So, the nearest representable double number to 1.0 + 1E-20 is .... 1.0. And that's the result you get.
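
To make this concrete, here is a small sketch (the class name `UlpDemo` and the helper `absorbedByOne` are illustrative, not from the question). It uses `Math.ulp` to print the gap between `1.0` and the next larger double, checks which addends are lost, and shows `BigDecimal` as one possible way to keep all the digits if they are really needed.

```java
import java.math.BigDecimal;

// Hypothetical demo class, named here for illustration only.
class UlpDemo {

    // True if adding 'tiny' to 1.0 rounds back to exactly 1.0.
    static boolean absorbedByOne(double tiny) {
        return 1.0 + tiny == 1.0;
    }

    // Exact decimal addition; the operands are passed as strings so that
    // no binary rounding happens before the sum is computed.
    static String exactSum(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b)).toPlainString();
    }

    public static void main(String[] args) {
        // Gap between 1.0 and the next representable double: 2^-52, about 2.22E-16.
        System.out.println("ulp(1.0) = " + Math.ulp(1.0));

        System.out.println(absorbedByOne(1e-15)); // false: larger than half an ulp
        System.out.println(absorbedByOne(1e-20)); // true: far below half an ulp

        // With arbitrary-precision decimals the small term survives.
        System.out.println(exactSum("1", "1E-20")); // 1.00000000000000000001
    }
}
```

Anything smaller than half of `Math.ulp(1.0)` is rounded away when added to `1.0`, which is exactly what happens to `1E-20`.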

Welcome to the mysterious world of floating point arithmetic.

score 0 · Answer 2 · answered Apr 21 '17 at 12:01

You are not doing anything wrong here. This is a matter of decimal precision. There is always a possibility of error in floating point numbers. The usual bound on the rounding error is:

`-(1/2) × 10^-n <= error <= (1/2) × 10^-n`

where n is the number of decimal digits you have defined.

More about floating point errors can be found here.
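
As an illustration of the bound above, the sketch below rounds a decimal value to `n` digits after the decimal point and compares the rounding error against `(1/2) × 10^-n`. It is written under assumptions: the class `DecimalRoundingError` and its methods are hypothetical names, and it uses `BigDecimal` with `RoundingMode.HALF_EVEN` to model decimal rounding. A real `double` rounds in base 2, so this only mirrors the idea, not the exact hardware behavior.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class, named here for illustration only.
class DecimalRoundingError {

    // Error introduced by rounding 'value' to n digits after the decimal point.
    static BigDecimal roundingError(String value, int n) {
        BigDecimal exact = new BigDecimal(value);
        BigDecimal rounded = exact.setScale(n, RoundingMode.HALF_EVEN);
        return rounded.subtract(exact);
    }

    // The bound from the answer: (1/2) * 10^-n.
    static BigDecimal bound(int n) {
        return new BigDecimal("0.5").scaleByPowerOfTen(-n);
    }

    // True if the rounding error lies within [-bound, +bound].
    static boolean withinBound(String value, int n) {
        return roundingError(value, n).abs().compareTo(bound(n)) <= 0;
    }

    public static void main(String[] args) {
        // Rounding 1 + 1E-20 to 15 decimal digits discards the small term.
        System.out.println(roundingError("1.00000000000000000001", 15).toPlainString());
        System.out.println(withinBound("1.00000000000000000001", 15)); // true
    }
}
```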
