154

I know that the decimal number 0.1 cannot be represented exactly with a finite binary fraction (explanation), so double n = 0.1 will lose some precision and will not be exactly 0.1. On the other hand, 0.5 can be represented exactly because 0.5 = 1/2 = 0.1 in binary (0.1b).
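
For illustration, a minimal sketch (new BigDecimal(double) shows the exact value a double actually stores; it assumes java.math.BigDecimal is imported):

System.out.println(new BigDecimal(0.1)); // 0.1000000000000000055511151231257827021181583404541015625
System.out.println(new BigDecimal(0.5)); // 0.5 - exactly representable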

Having said that, it is understandable that adding 0.1 three times will not give exactly 0.3, so the following code prints false:

double sum = 0, d = 0.1;
for (int i = 0; i < 3; i++)
    sum += d;
System.out.println(sum == 0.3); // Prints false, OK

But then how is it that adding 0.1 five times will give exactly 0.5? The following code prints true:

double sum = 0, d = 0.1;
for (int i = 0; i < 5; i++)
    sum += d;
System.out.println(sum == 0.5); // Prints true, WHY?

If 0.1 cannot be represented exactly, how is it that adding it 5 times gives exactly 0.5 which can be represented precisely?

Peter Lawrey
icza
  • @MarkoTopolnik it is not obvious, care to explain bit-wise? – EpicPandaForce Sep 30 '14 at 12:05
  • related : http://stackoverflow.com/questions/15575669/adding-1-3-in-java-results-in-1-0-while-it-shouldnt – dinesh707 Sep 30 '14 at 12:05
  • @MarkoTopolnik If we use fewer bits (finite instead of infinite), the representation of `0.1` is less than the actual `0.1`. Adding it 5 times should _cumulate_ the error, not _cancel_ it. Also adding 5 times (5 is odd not even) could not cancel it... – icza Sep 30 '14 at 12:06
  • If you really research it I'm sure you can figure it out, but floating point is loaded with "surprises", and sometimes it's better to just look on in wonder. – Hot Licks Sep 30 '14 at 12:11
  • @icza The error from representing 0.1 cancels out with the error from adding the numbers. This is "obvious" because it follows *post hoc* from common-sense reasoning about the results. – Marko Topolnik Sep 30 '14 at 12:12
  • I don't know if Java compilers are permitted to do the same tricks that C compilers are, but I'd at least consider the possibility that the summing loop has been optimized away. – Russell Borogove Sep 30 '14 at 16:50
  • Also related to http://stackoverflow.com/questions/21690585/is-3xx-always-exact – aka.nice Sep 30 '14 at 19:12
  • Agree with Russell. Try it with compiler optimizations disabled and see what happens. – Jamie Hanrahan Sep 30 '14 at 21:50
  • @RussellBorogove No, this is not the case. If I put other code in the loop which adds `0.1` (e.g. `System.out.println()`), that code gets executed so the loop is **not** optimized out. – icza Oct 01 '14 at 05:37
  • You are thinking about this in a mathy way. Floating point arithmetic is not math in any way. – Jakob Oct 01 '14 at 05:57
  • @HotLicks that is *very* much the wrong attitude to have. – hobbs Oct 02 '14 at 07:12
  • @RussellBorogove even if it was optimized away, it would only be a valid optimization if `sum` had the same final value as if the loop was truly executed. In the C++ standard this is called the "as-if rule" or "same observable behavior". – hobbs Oct 02 '14 at 07:42
  • @Jakob not true at all. Floating-point arithmetic is rigorously defined, with good mathematical treatment of error bounds and such. It's just that many programmers either aren't willing to follow through on the analysis, or they mistakenly believe that "floating-point is inexact" is all there is to know and that analysis isn't worth bothering with. – hobbs Oct 02 '14 at 07:49
  • @hobbs -- **IEEE floating point** is pretty rigorously defined, others not so much. And while the exact results from a series of IEEE float computations can be predicted, that's not a lot of help in many situations, since "prediction" basically implies doing the computation by hand rather than letting the computer do it. It's much simpler and less error prone (and quicker/cheaper) to just understand that floating point is inexact and save the detailed analysis for those very few cases where it's really necessary. – Hot Licks Oct 02 '14 at 11:49

3 Answers

156

The rounding error is not random; the way it is implemented, it attempts to minimise the error. This means that sometimes the error is not visible, or there is no error at all.

For example, 0.1 is not exactly 0.1, i.e. new BigDecimal("0.1") < new BigDecimal(0.1), but 0.5 is exactly 1.0/2.
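
BigDecimal has no < operator in Java, so that comparison is shorthand; spelled out with compareTo (a small sketch, same BigDecimal import as the snippet below) it reads:

System.out.println(new BigDecimal("0.1").compareTo(new BigDecimal(0.1)) < 0);  // true: the double is slightly larger than 0.1
System.out.println(new BigDecimal(0.5).compareTo(new BigDecimal("0.5")) == 0); // true: 0.5 is exact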

This program shows you the true values involved.

BigDecimal _0_1 = new BigDecimal(0.1); // the exact value stored in the double 0.1
BigDecimal x = _0_1;
for (int i = 1; i <= 10; i++) {
    System.out.println(x + ", as double " + x.doubleValue());
    x = x.add(_0_1);
}

prints

0.1000000000000000055511151231257827021181583404541015625, as double 0.1
0.2000000000000000111022302462515654042363166809082031250, as double 0.2
0.3000000000000000166533453693773481063544750213623046875, as double 0.30000000000000004
0.4000000000000000222044604925031308084726333618164062500, as double 0.4
0.5000000000000000277555756156289135105907917022705078125, as double 0.5
0.6000000000000000333066907387546962127089500427246093750, as double 0.6000000000000001
0.7000000000000000388578058618804789148271083831787109375, as double 0.7000000000000001
0.8000000000000000444089209850062616169452667236328125000, as double 0.8
0.9000000000000000499600361081320443190634250640869140625, as double 0.9
1.0000000000000000555111512312578270211815834045410156250, as double 1.0

Note that 0.3 is slightly off, but when you get to 0.4 the bits have to shift down one to fit into the 53-bit limit and the error is discarded. Again, an error creeps back in for 0.6 and 0.7, but for 0.8 to 1.0 the error is discarded.
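
A small sketch makes the discarding visible (Double.toHexString prints the exact bit pattern of a double; i / 10.0 is the correctly rounded, i.e. nearest, double to i/10):

double sum = 0, d = 0.1;
for (int i = 1; i <= 5; i++) {
    sum += d;
    double nearest = i / 10.0; // the double nearest to i/10
    System.out.println(i + ": " + Double.toHexString(sum) + " vs " + Double.toHexString(nearest)
            + " -> " + (sum == nearest)); // false only for i == 3, as in the table above
}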

Adding it 5 times should cumulate the error, not cancel it.

The reason there is an error at all is the limited precision, i.e. 53 bits. This means that as the number uses more bits as it gets larger, bits have to be dropped off the end. This causes rounding, which in this case is in your favour.
You can get the opposite effect when computing a smaller number, e.g. 0.1 - 0.0999 => 1.0000000000000286E-4, where you see more error than before.
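
To see that opposite effect, a minimal sketch (BigDecimal, as in the snippet above, is only used to show the exact stored value):

double diff = 0.1 - 0.0999;
System.out.println(diff);                 // 1.0000000000000286E-4, not 1.0E-4
System.out.println(new BigDecimal(diff)); // the exact value the result actually holds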

An example of this is why, in Java 6, Math.round(0.49999999999999994) returns 1. In that case the loss of a bit in the calculation makes a big difference to the answer.
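
A quick sketch of where that bit goes missing (assuming, as the linked question explains, that Java 6's Math.round(double) was effectively (long) Math.floor(a + 0.5d)):

double a = 0.49999999999999994;                 // the largest double smaller than 0.5
System.out.println(a + 0.5);                    // 1.0 - the true sum 1 - 2^-54 is rounded up to 1.0
System.out.println((long) Math.floor(a + 0.5)); // 1, even though a < 0.5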

Peter Lawrey
  • Where is this implemented? – EpicPandaForce Sep 30 '14 at 12:08
  • @Zhuinden The CPU follows the IEEE-754 standard. Java gives you access to the underlying CPU instructions and doesn't get involved. http://en.wikipedia.org/wiki/IEEE_floating_point – Peter Lawrey Sep 30 '14 at 12:10
  • @PeterLawrey: Not necessarily the CPU. On a machine without floating point in the CPU (and no separate FPU in use), IEEE arithmetic will be performed by software. And if the host CPU has floating point but it does not conform to IEEE requirements, I think a Java implementation for that CPU would be obligated to use soft float too... – R.. GitHub STOP HELPING ICE Sep 30 '14 at 20:29
  • @R.. in which case I don't know what would happen if you used `strictfp`. Time to consider fixed-point integers, I think (or BigDecimal). – Peter Lawrey Sep 30 '14 at 20:32
  • @PeterLawrey so in plain English this is a rounding thing? – Eugene Oct 01 '14 at 06:17
  • @eugene the key problem is the limited set of values floating point can represent. This limitation can result in a loss of information and, as the number grows, a loss of the error. It uses rounding, but in this case it rounds down, so what would have been a number slightly too large (because 0.1 is slightly too large) turns into the correct value: exactly 0.5. – Peter Lawrey Oct 01 '14 at 06:56
  • @PeterLawrey that was my point.. thx for the beautiful answer btw. +1 – Eugene Oct 01 '14 at 07:09
47

Barring overflow, in floating-point, x + x + x is exactly the correctly rounded (i.e. nearest) floating-point number to the real 3*x, x + x + x + x is exactly 4*x, and x + x + x + x + x is again the correctly rounded floating-point approximation for 5*x.

The first result, for x + x + x, derives from the fact that x + x is exact. x + x + x is thus the result of only one rounding.

The second result is more difficult, one demonstration of it is discussed here (and Stephen Canon alludes to another proof by case analysis on the last 3 digits of x). To summarize, either 3*x is in the same binade as 2*x or it is in the same binade as 4*x, and in each case it is possible to deduce that the error on the third addition cancels the error on the second addition (the first addition being exact, as we already said).

The third result, “x + x + x + x + x is correctly rounded”, derives from the second in the same way that the first derives from the exactness of x + x.
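
These statements can be spot-checked from Java with a small sketch (it uses java.math.BigDecimal and assumes BigDecimal.doubleValue() performs a correctly rounded, i.e. round-to-nearest, conversion):

double x = 0.1;
BigDecimal exact = new BigDecimal(x); // the exact value stored in x
System.out.println(x + x == exact.add(exact).doubleValue());                                  // true: x + x is exact
System.out.println(x + x + x == exact.multiply(BigDecimal.valueOf(3)).doubleValue());         // true: nearest double to 3*x
System.out.println(x + x + x + x + x == exact.multiply(BigDecimal.valueOf(5)).doubleValue()); // true: nearest double to 5*x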


The second result explains why 0.1 + 0.1 + 0.1 + 0.1 is exactly the floating-point number 0.4: the rational numbers 1/10 and 4/10 get approximated the same way, with the same relative error, when converted to floating-point. These floating-point numbers have a ratio of exactly 4 between them. The first and third result show that 0.1 + 0.1 + 0.1 and 0.1 + 0.1 + 0.1 + 0.1 + 0.1 can be expected to have less error than might be inferred by naive error analysis, but, in themselves, they only relate the results to respectively 3 * 0.1 and 5 * 0.1, which can be expected to be close but not necessarily identical to 0.3 and 0.5.
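
A sketch that makes this visible: the doubles nearest to 0.1 and 0.4 have identical significands, so their ratio is exactly 4.

System.out.println(Double.toHexString(0.1)); // 0x1.999999999999ap-4
System.out.println(Double.toHexString(0.4)); // 0x1.999999999999ap-2 - same significand, exponent two higher
System.out.println(0.1 * 4 == 0.4);          // true: multiplying by 4 only changes the exponent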

If you keep adding 0.1 after the fourth addition, you will finally observe rounding errors that make “0.1 added to itself n times” diverge from n * 0.1, and diverge even more from n/10. If you were to plot the values of “0.1 added to itself n times” as a function of n, you would observe lines of constant slope by binade (as soon as the result of the nth addition is destined to fall into a particular binade, the properties of the addition can be expected to be similar to previous additions that produced a result in the same binade). Within the same binade, the error will either grow or shrink. If you were to look at the sequence of the slopes from binade to binade, you would recognize the repeating digits of 0.1 in binary for a while. After that, absorption would start to take place and the curve would go flat.
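
A sketch of that divergence, printing the first few n where the repeated sum no longer equals n * 0.1 or the double nearest to n/10:

double sum = 0;
for (int n = 1; n <= 20; n++) {
    sum += 0.1;
    if (sum != n * 0.1 || sum != n / 10.0) {
        System.out.println(n + ": sum=" + sum + ", n*0.1=" + (n * 0.1) + ", n/10=" + (n / 10.0));
    }
}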

Pascal Cuoq
  • On the first line you're saying that x+x+x is exactly correct, but from the example in the question it's not. – Alboz Sep 30 '14 at 12:15
  • @Alboz I say that `x + x + x` is exactly the correctly **rounded** floating-point number to the real 3*`x`. “correctly rounded” means “nearest” in this context. – Pascal Cuoq Sep 30 '14 at 12:22
  • +1 This should be the accepted answer. It actually offers explanations/proof of what's going on rather than just vague generalities. – R.. GitHub STOP HELPING ICE Sep 30 '14 at 20:31
  • @Alboz `x + x + x` is exactly three times `x` to within the limits of the representation. But `x` not being exactly 0.1, three times `x` is not exactly 0.3. – hobbs Oct 02 '14 at 07:35
  • @Alboz (all of which is envisioned by the question). But what this answer explains is how the errors fortuitously cancel rather than adding up in a worst-case fashion. – hobbs Oct 02 '14 at 07:37
  • +1 great answer, that conversation with Stephen Canon you link to is worth the upvote by itself. – Shafik Yaghmour Oct 05 '14 at 01:34
  • Why is x+x exact? – chebus May 01 '16 at 16:48
  • @chebus `x + x` is exact (barring overflow) in binary floating-point because the floating-point value that represents the result exactly exists, so there is no reason for `+` to return any other one. This value has the same significand as `x` and an exponent that's one more than the exponent of `x`. – Pascal Cuoq May 01 '16 at 18:55
  • Assuming x is not exactly representable, say 0.1, how can 0.1+0.1 be exactly 0.2? What happens to the error accumulation; does it happen to round off (round to nearest) to exactly 0.2? – chebus May 01 '16 at 20:14
  • I am not able to wrap my head around this; can you please give me some resources where I can study this? – chebus May 04 '16 at 08:15
  • @chebus 0.1 is 0x1.999999999999999999999…p-4 in hexadecimal (an infinite sequence of digits). It is approximated in double-precision as 0x1.999999999999ap-4. 0.2 is 0x1.999999999999999999999…p-3 in hexadecimal. For the same reason that 0.1 is approximated as 0x1.999999999999ap-4, 0.2 is approximated as 0x1.999999999999ap-3. Meanwhile, 0x1.999999999999ap-3 is also exactly 0x1.999999999999ap-4+0x1.999999999999ap-4. – Pascal Cuoq May 04 '16 at 15:51
  • @chebus Your initial question was about `x+x`. The fact that `x+x` is exact has nothing to do with whether `x` is the closest approximation for some decimal representation, but yes, another property is also that if `x` is the closest floating-point approximation for some number y, then `x+x` is the closest approximation for the number 2y. – Pascal Cuoq May 04 '16 at 15:56
  • Thank you. I misunderstood it: I thought "x+x is exact" meant that approx 0.1 + approx 0.1 is exactly 0.2 after rounding. So "x+x is exact" means the approximation of x + the approximation of x = the approximation of 2*x. Now given this, the value of x+x (with x=0.1) is the approximation 0x1.999999999999ap-3; is this then rounded to exactly 0.2, since 0.1+0.1 in, say, Java gives 0.2 (exact, not the approximation, unlike 0.1+0.1+0.1)? – chebus May 04 '16 at 18:21
0

Floating-point systems do various magic, including having a few extra bits of precision for rounding. Thus the very small error due to the inexact representation of 0.1 ends up getting rounded off, and the sum comes out as exactly 0.5.

Think of floating point as a great but INEXACT way to represent numbers. Not all possible numbers are easily represented in a computer: irrational numbers like PI, or SQRT(2), for instance. (Symbolic math systems can represent them, but I did say "easily".)

The floating point value may be extremely close, but not exact. It may be so close that you could navigate to Pluto and be off by millimeters. But still not exact in a mathematical sense.

Don't use floating point when you need to be exact rather than approximate. For example, accounting applications want to keep exact track of a certain number of pennies in an account. Integers are good for that because they are exact. The primary issue you need to watch for with integers is overflow.

Using BigDecimal for currency works well because the underlying representation is an integer, albeit a big one.
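
A minimal sketch of that approach (the string constructor avoids the double representation issue entirely; it assumes java.math.BigDecimal is imported):

BigDecimal tenCents = new BigDecimal("0.10"); // exactly ten cents
BigDecimal total = BigDecimal.ZERO;
for (int i = 0; i < 5; i++) {
    total = total.add(tenCents);
}
System.out.println(total);                                  // 0.50, exactly
System.out.println(total.compareTo(new BigDecimal("0.5"))); // 0: equal in value (compareTo ignores scale)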

Even though floating-point numbers are inexact, they still have a great many uses: coordinate systems for navigation, coordinates in graphics systems, astronomical values, scientific values. (You probably cannot know the mass of a baseball to within the mass of an electron anyway, so the inexactness doesn't really matter.)

For counting applications (including accounting), use integers. For counting the number of people that pass through a gate, use int or long.

DannyB
  • The question is tagged [java]. The Java language definition has **no** provision for “a few extra bits of precision”, only for a few extra exponent bits (and that only if you do not use `strictfp`). Just because you have given up on understanding something does not mean it is unfathomable, nor that others should give up on understanding it. See http://stackoverflow.com/questions/18496560 as an example of the lengths Java implementations will go to in order to implement the language definition (which does not include any provision for extra precision bits nor, with `strictfp`, for any extra exponent bit). – Pascal Cuoq Oct 05 '14 at 23:24