2

Why does this assertion fail in Java?

    double eps = 0.00000000000001;
    double ten = 10.0;
    double result = (ten - (ten - eps));
    Assert.assertTrue(result <= eps);

If I remove one zero before digit 1 in eps, the assertion passes. I assume that this is related to the floating point implementation, but I'm not sure exactly how.

Also, if I replace the digit 1 with 2 (as in 0.00000000000002), the assertion passes as well. In that case I can even add more zeros before the digit 2 and the test will still pass. I tried with Double.MIN_VALUE (4.9E-324) and the assertion also passed.

Can someone please explain in more detail:

  1. Why the assertion passes with eps = 1.0E-13 but not with eps = 1.0E-14
  2. Why the assertion passes with eps = Double.MIN_VALUE (4.9E-324) and not with eps = 1.0E-14

EDIT: The assertion also fails when I increase the eps to 1.0E-8: double eps = 0.00000001;

Milan
  • The answer to [this](http://stackoverflow.com/questions/3728246/what-should-be-the-epsilon-value-when-performing-double-value-equal-comparison) should help. ("The magnitude of finite-machine precision error can be arbitrarily large. ... if you increase N you can get just about any level of error you desire") – Salem Jan 17 '17 at 11:38
  • That's most likely a problem with storing 1.0E-14 precisely enough. In any case for numerical stability you'll probably not want to use such a low epsilon anyways. – Thomas Jan 17 '17 at 11:40

4 Answers

1

This is because of the way the bits that represent the double type are organized.

A double is a 64-bit structure. The bits [b0 .. b51] form the fraction (mantissa), which is scaled by a power of two given by the exponent stored in bits [b52 .. b62]; bit b63 is the sign.

[Image: representation of the double type]

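The equation that determines which real value each combination of bits represents (for normal numbers, with e the exponent stored in bits b52 .. b62) is:

    value = (-1)^sign × (1.b51 b50 ... b0)_2 × 2^(e - 1023)

With this formula, the smallest value greater than 1 is represented by

3ff0 0000 0000 0001 (base 16)   =>  1.0000000000000002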

For a better explanation, see the Wikipedia page "Double-precision floating-point format".
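The bit layout described above can be inspected directly from Java. The following is only a minimal sketch: the class and method names are made up for illustration, and it relies on the standard `Double.doubleToLongBits` and `Double.longBitsToDouble` methods to move between a double and its 64-bit pattern.

```java
// Hypothetical helper class, not part of the original answer.
class DoubleBits {
    /** Returns the biased 11-bit exponent stored in bits 52..62. */
    static int biasedExponent(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) ((bits >>> 52) & 0x7FFL);
    }

    /** Returns the 52-bit fraction stored in bits 0..51. */
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0x000FFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        // The bit pattern quoted in the answer: exponent field 0x3ff, fraction 1.
        double next = Double.longBitsToDouble(0x3FF0000000000001L);
        System.out.println(next);
        // It is the same value as the next double after 1.0.
        System.out.println(next == Math.nextUp(1.0));
        // Unbiased exponent and fraction bits of 10.0 (10.0 = 1.25 * 2^3).
        System.out.println(biasedExponent(10.0) - 1023);
        System.out.println(Long.toHexString(fraction(10.0)));
    }
}
```

Because only 52 fraction bits are available, the gap between neighbouring doubles grows with the exponent, which is why values near 10.0 are spaced more coarsely than values near 1.0.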

D.Kastier
  • Thanks for the answer. I'd read the article before posting the question, though, but it does not answer my questions. I still don't know why assertion fails for `eps = 0.00000001` but it passes for `eps = 0.0000000000001`. – Milan Jan 17 '17 at 13:28
  • I see. Sorry about that. Why this happens comes down to how the bits are computed in the math processor, but I do not know how to explain it in "good" words – D.Kastier Jan 17 '17 at 13:44
  • Have a read of this article, which explains how floats are added. It will give you the basis for understanding how your situation arises. http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BinMath/addFloat.html – D.Kastier Jan 17 '17 at 13:55
1

In the last assertion you're comparing result (1.0658141036401503E-14) with eps (1.0E-14). Mathematically the comparison should be false, as the failing assertion shows, because result is larger than eps in this case. If you remove one 0 from eps, eps becomes 1.0E-13, which is larger than 1.0658141036401503E-14.

talalUcef
  • The result also changes when I increase eps to 1.0E-13, it's not 1.0658141036401503E-14 anymore. Besides, the point is why the behavior I described happens. Note that assertion passes for `eps = 1.0E-11` but not for `eps = 1.0E-10` and `eps = 1.0E-12`, that's weird isn't it? – Milan Jan 17 '17 at 14:26
  • Because the result is less than eps in the cases of 1.0E-13 and 1.0E-11. Note that in the values below the assertion holds for the odd exponents (-13, -11) and fails for the even ones (-14, -12, -10). `// eps = 1.0E-14 : 1.0658141036401503E-14 <= 1.0E-14 ==> false // eps = 1.0E-13 : 9.947598300641403E-14 <= 1.0E-13 ==> true // eps = 1.0E-12 : 1.000088900582341E-12 <= 1.0E-12 ==> false // eps = 1.0E-11 : 9.99911264898401E-12 <= 1.0E-11 ==> true // eps = 1.0E-10 : 1.000000082740371E-10 <= 1.0E-10 ==> false` – talalUcef Jan 17 '17 at 14:49
  • Assertion fails for `eps = 1.0E-10` as well. – Milan Jan 17 '17 at 15:00
1

The problem is that the assertion code is wrong-ish, in the sense that it does not take into account the rounding in the second subtraction, ten - (ten - eps).

Let's go through this step by step. Let eps = 0.00000001 (1.0E-8). In this case, 10.0 - eps is 9.99999999. So far, so good. However, 10.0 - 9.99999999 is 0.00000001000000082740371, which is close to the expected result of 0.00000001 but slightly larger, because floating-point arithmetic (usually) gives only a good-enough approximation. Therefore, for some eps values the final result is very close to but just below the expected result, and for other values it is very close to but just above it.
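The step-by-step reasoning above can be checked with a small sketch. It rests on an observation that is not stated in the original answers: doubles between 8 and 16 are spaced `Math.ulp(10.0)` (2^-49, about 1.78E-15) apart, so for small eps the difference ten - eps has to be rounded onto that grid. The class and method names are made up for illustration.

```java
// Hypothetical demo class, not part of the original answer.
class RoundTripDemo {
    /** The expression from the question: ten - (ten - eps). */
    static double roundTrip(double eps) {
        double ten = 10.0;
        return ten - (ten - eps);
    }

    /** eps rounded to the nearest multiple of the spacing of doubles near 10.0. */
    static double snapToGrid(double eps) {
        double spacing = Math.ulp(10.0); // 2^-49
        // Dividing and multiplying by a power of two does not round here.
        return Math.rint(eps / spacing) * spacing;
    }

    public static void main(String[] args) {
        double[] samples = {1.0E-14, 1.0E-13, 1.0E-8, Double.MIN_VALUE};
        for (double eps : samples) {
            double result = roundTrip(eps);
            System.out.println("eps=" + eps
                    + " result=" + result
                    + " snapped=" + snapToGrid(eps)
                    + " passes=" + (result <= eps));
        }
    }
}
```

Under that assumption the pattern in the question follows from which way eps is rounded. 1.0E-14 is about 5.63 grid steps, which rounds up to 6 steps, so the result exceeds eps. 1.0E-13 is about 56.3 steps, which rounds down to 56, so the result is below eps. Double.MIN_VALUE rounds to 0 steps, so the result is 0 and the assertion passes.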

The code needs to be fixed in order to take into account that the result of the second subtraction is also just an approximation.

One way to do it is to change the assertion to:

Assert.assertTrue(Math.abs(result - eps) <= eps);
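As a variation on the assertion above, the tolerance could be tied to the magnitude of the operands rather than to eps itself. The sketch below assumes that the error of the round trip is bounded by the spacing of doubles near ten, `Math.ulp(ten)`. The helper name is hypothetical, and a plain boolean stands in for `Assert.assertTrue` so that the example runs without JUnit.

```java
// Hypothetical helper class, not part of the original answer.
class ToleranceCheck {
    /** True if ten - (ten - eps) is within one ulp of ten from eps. */
    static boolean closeEnough(double eps) {
        double ten = 10.0;
        double result = ten - (ten - eps);
        // Assumption: the spacing of doubles near ten bounds the rounding error.
        double tolerance = Math.ulp(ten);
        return Math.abs(result - eps) <= tolerance;
    }

    public static void main(String[] args) {
        double[] samples = {1.0E-14, 1.0E-13, 1.0E-10, 1.0E-8, Double.MIN_VALUE};
        for (double eps : samples) {
            System.out.println("eps=" + eps + " closeEnough=" + closeEnough(eps));
        }
    }
}
```

In a JUnit test this would become `Assert.assertTrue(Math.abs(result - eps) <= Math.ulp(ten));`. The design choice is that the acceptable error depends on the largest value involved in the computation (here, ten), not on the small quantity being recovered.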

To understand floating-point arithmetic better, I found this article quite well written: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

This quote summarizes why errors in floating-point arithmetic happen:

There are two reasons why a real number might not be exactly representable as a floating-point number. The most common situation is illustrated by the decimal number 0.1. Although it has a finite decimal representation, in binary it has an infinite repeating representation. Thus when β = 2, the number 0.1 lies strictly between two floating-point numbers and is exactly representable by neither of them.

Milan
-1

Try the following code:

// requires: import java.math.BigDecimal;
BigDecimal eps1 = new BigDecimal(eps);
BigDecimal ten1 = new BigDecimal(ten);
BigDecimal result1 = ten1.subtract( ten1.subtract(eps1) );

The result should be stable regardless of the value of eps.
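A self-contained version of the snippet above might look like the following sketch. The class and method names are made up for illustration; the only library calls used are the `BigDecimal(double)` constructor, `subtract` and `compareTo`.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of the original answer.
class BigDecimalRoundTrip {
    /** Computes ten - (ten - eps) with exact decimal arithmetic. */
    static BigDecimal roundTrip(double eps) {
        BigDecimal eps1 = new BigDecimal(eps);   // exact value of the double eps
        BigDecimal ten1 = new BigDecimal(10.0);
        return ten1.subtract(ten1.subtract(eps1));
    }

    /** True if the round trip reproduces the stored value of eps exactly. */
    static boolean roundTripIsExact(double eps) {
        // compareTo ignores differences in scale, unlike equals.
        return roundTrip(eps).compareTo(new BigDecimal(eps)) == 0;
    }

    public static void main(String[] args) {
        double[] samples = {1.0E-14, 1.0E-13, 1.0E-8};
        for (double eps : samples) {
            System.out.println("eps=" + eps + " exact=" + roundTripIsExact(eps));
        }
    }
}
```

Note that `new BigDecimal(eps)` captures the exact binary value stored in the double, which is only an approximation of the decimal literal in the source code. The subtraction itself is exact, so the round trip returns that stored value unchanged.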

Alex Radzishevsky