10

I have a problem where a sum of variables representing probabilities in the [0,1] interval evaluates to 0 but should be > 0. The problem is surely related to floating-point representation and precision in R, but I cannot pin down where it goes wrong.

options(digits = 22)  # affects printing only, not internal precision
p1 = 0.8
p2 = 0.9999999999999998

p11 = p1 * p2              # joint probability
p10 = p1 - p11
p01 = p2 - p11
p00 = 1 - p11 - p10 - p01  # mathematically equal to (1 - p1) * (1 - p2) > 0

p11, p10 and p01 are all numeric. p00 is also numeric, but

> p00
[1] 0

and p00 == 0 is TRUE on my machine. However, it should not be zero, since it can be shown mathematically that p00 > 0.

This problem seems to be related to the fact that p01 is small. However, p01 > 0 is still TRUE on my machine. Why does it go wrong when taking the final sum for p00?

Is there a numerical trick to solve this problem, i.e. to get an exact representation of p00? Note again that the p's are probabilities in [0,1]. I considered using log and exp transformations, but without consistent success.

tomka
  • Is simply using a `* 10^20` transformation workable? Alternatively, I have implemented solutions before that stored each digit as an element in an integer vector, but it's a little laborious. – Calum You Mar 13 '18 at 18:27
  • @CalumYou I think it might be. Can you elaborate perhaps? So I take `p1 * 10 ^20` and `p2 * 10 ^20`. How is the reverse transformation executed on `p11, p10, p01, p00`? – tomka Mar 13 '18 at 18:32
  • R uses 64-bit floats internally; `options(digits = 22)` only changes the number of digits used when printing numbers. – gdkrmr Mar 13 '18 at 18:34
  • @gdkrmr yes indeed. I merely used it to illustrate that `p00` prints as `0`, but `p00 == 0` is also `TRUE` here. – tomka Mar 13 '18 at 18:36

2 Answers

9

R itself can only deal with 64-bit floating-point numbers; the Rmpfr package can deal with arbitrary-precision floating-point numbers:

library(Rmpfr)

> p1 = mpfr("0.8", 128)
> p2 = mpfr("0.9999999999999998", 128)

> p11 = p1 * p2
> p10 = p1 - p11
> p01 = p2 - p11
> p00 = 1 - p11 - p10 - p01

> p00
1 'mpfr' number of precision  128   bits 
[1] 4.00000000000000000000000461461738779728e-17

Edit: use strings to define the mpfr numbers.
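
For completeness, a minimal sketch (assuming Rmpfr is installed) that also converts the high-precision result back to an ordinary double with `asNumeric()`, which is what the comments below discuss:

library(Rmpfr)

p1  <- mpfr("0.8", 128)                 # parse the decimal strings at 128-bit precision
p2  <- mpfr("0.9999999999999998", 128)
p11 <- p1 * p2
p00 <- 1 - p11 - (p1 - p11) - (p2 - p11)

asNumeric(p00)   # back-convert to a 64-bit double; roughly 4e-17, not 0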

gdkrmr
  • Good option, thanks. Back-transforming with `asNumeric(p00)` yields a double-precision (numeric) variable which is not `== 0`. I am surprised; shouldn't we get `0` here again? Is this to be expected for all back-transformations of this kind? – tomka Mar 13 '18 at 19:04
  • the precision losses can stack up. You can also try using less precision, then `p10` in your example will also be exactly `0`. – gdkrmr Mar 13 '18 at 19:38
  • to get closer to 4e-17, you should define `p1` and `p2` from strings, i.e. `p1 = mpfr("0.9999999999999998")` – Simon Byrne Mar 13 '18 at 20:33
3

You want to avoid catastrophic cancellation, which occurs when subtracting values of approximately the same size: essentially this amplifies any existing errors.

One trick is to do the subtraction first, i.e.

> p1 = 0.8
> p2 = 0.9999999999999998
> (1-p1)*(1-p2)
[1] 4.440892e-17

The remaining inaccuracy is due to the fact that 0.9999999999999998 isn't actually 0.9999999999999998; it is stored as 0.9999999999999997779553950749686919152736663818359375, which is the closest representable floating-point number.
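
To make the contrast concrete, here is a small sketch in plain base R (no extra packages) comparing the original order of operations with the rearranged one:

p1 <- 0.8
p2 <- 0.9999999999999998

# original order: multiply first, then subtract nearly equal quantities
p11 <- p1 * p2
p10 <- p1 - p11
p01 <- p2 - p11
1 - p11 - p10 - p01    # 0 on a typical IEEE-754 double machine

# rearranged: subtract first, then multiply
(1 - p1) * (1 - p2)    # about 4.440892e-17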

Update: to explain a bit further what is going on.

Since most floating point operations cannot be done exactly, they incur some error: this is the difference between the result you get and the true mathematical result. It is inherently relative, rather than absolute (i.e. significant digits, not digits after the decimal place).

When thinking about error, you need to keep track of 2 things:

  1. the error from the operation
  2. the already accumulated error.

The first is not too difficult: most operations incur a small amount of relative error that is at most about 1e-16 (half a unit in the last place). There are also a couple of operations that incur no error (subtracting 2 numbers that are within 2x of each other, multiplying or dividing by powers of 2).

The second is trickier, but basically you want to avoid anything that would amplify existing error. The most insidious case is subtracting two numbers that are roughly equal: the absolute error stays the same, but the relative error can increase dramatically. Multiplication and division are inherently relative operations, so they don't have any effect here.

In your original code, you are doing a lot of such subtractions with quantities that have already accumulated some error. In my modification, I do the subtractions first, so this error is minimised (though in the case of p2, you can see that the result is still drastically influenced by the error in conversion from the decimal string "0.9999999999999998" to the binary floating point number).
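
If it helps to see where the rounding creeps in, one can print the intermediate values with more significant digits; this is just a diagnostic sketch using base R's `sprintf`:

p1  <- 0.8
p2  <- 0.9999999999999998
p11 <- p1 * p2

# show the intermediates to 20 significant digits:
# p11 already carries rounding error, while 1 - p1 and 1 - p2 are exact
sprintf("%.20g", c(p11, p1 - p11, p2 - p11, 1 - p1, 1 - p2))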

Simon Byrne
  • Fascinating. I do not understand yet why this is working, though. So `1-p1-p2+p1*p2` does not work while the rewritten form `(1-p1)*(1-p2)` does. Where's the trick exactly (what do you mean by "do the subtraction first")? I need to solve this problem also for a slightly more general setting and need to understand what to take into account. I would prefer this solution over increasing precision using `Rmpfr` (because of memory issues). – tomka Mar 13 '18 at 20:59
  • There is no catastrophic cancellation in `1-p1-p2+p1*p2`. All of the error occurs in `p1*p2` and is simply due to the fact that the product is in [½, 1), so its ULP is 2^-53. `1-p1` has no error, due to the subtraction property you note. Because `p1` and `p2` have the same binade, they have the same ULP, and, since all the trailing bits of 1 are zero, the least set bit in `1-p1` is at or above that ULP, so all bits of `1-p1-p2` fit in the significand, and there is no error. `1-p1-p2` is negative and of the same magnitude as `p1*p2`, so adding them is like subtracting and has no error. – Eric Postpischil Mar 13 '18 at 22:20
  • The reason `(1-p1)*(1-p2)` is better is that all of its error is in the multiplication (`1-p1` and `1-p2` are exact) and the product is in [2^-55, 2^-54), so its ULP is 2^-107, so it is able to represent the result much more precisely. – Eric Postpischil Mar 13 '18 at 22:21
  • (There are errors in converting *p1* and *p2* to `p1` and `p2`. If you want to include that, you could say there is catastrophic cancellation in the subtractions. However, they are identical in the two expressions, so it is not a contributing factor to their difference.) – Eric Postpischil Mar 13 '18 at 22:52
  • you can still use `Rmpfr` for debugging, i.e. to see if your floats accumulate error – gdkrmr Mar 13 '18 at 23:06
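
As a rough check of these comments (a sketch, assuming Rmpfr is available), one can redo each step at high precision from the same double inputs and compare with the double results:

library(Rmpfr)

p1 <- 0.8
p2 <- 0.9999999999999998
P1 <- mpfr(p1, 200)   # exact high-precision copies of the double values
P2 <- mpfr(p2, 200)

mpfr(1 - p1 - p2, 200) == (1 - P1 - P2)  # TRUE: these subtractions are exact
mpfr(p1 * p2, 200)     == (P1 * P2)      # FALSE: the product had to be rounded
mpfr((1 - p1) * (1 - p2), 200) == ((1 - P1) * (1 - P2))  # TRUE: here even the product is exact (1 - p2 is a power of two)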