
Let's say I have a function like the following:

testFunction <- function(testInputs){
    print( sum(testInputs)+1 == 2 )
    return( sum(testInputs) == 1 )
}

When I test this on the command line with the input c(0.65, 0.3, 0.05), it prints and returns TRUE, as expected.

However, when I use c(1-0.3-0.05, 0.3, 0.05), I get TRUE printed and FALSE returned. That makes no sense, because it means sum(testInputs)+1 is 2 but sum(testInputs) is not 1.

Here is what I think: somehow the summed value is not exactly 1 but probably 0.9999999..., and it's rounded up when 1 is added or when it is displayed. But this is only a guess. How does this work exactly?

ozgeneral
  • My guess is that this happens because numbers like `0.3` do not have an exact representation in binary form. See [here](http://www.exploringbinary.com/why-0-point-1-does-not-exist-in-floating-point/) – Gregor de Cillia Mar 18 '18 at 22:36
  • I checked testInputs[1]<0.65, testInputs[2]<0.3, testInputs[3]<0.05 and the result was TRUE, FALSE, FALSE. So somehow 0.3 has no problem but 0.65 does. And this only happens when I call the function within Shiny; it doesn't cause any problems on the command line. – ozgeneral Mar 18 '18 at 22:38
  • Yes, you will need to have some arithmetic operations involved. Using for example `x <- 0.65; x == 0.65` will always be `TRUE`. – Gregor de Cillia Mar 18 '18 at 22:41
  • Also, you don't need shiny to reproduce this. `ti <- c(1-0.3-0.05, 0.3, 0.05); sum(ti) == 1` returns `FALSE` in my console. – Gregor de Cillia Mar 18 '18 at 22:45
  • You are right, I just cut out the irrelevant parts. Sorry, apparently I tested with 0.65 on the console and 1-0.3-0.05 in Shiny, which led me to think Shiny was the difference. – ozgeneral Mar 18 '18 at 22:55
  • Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – Gregor de Cillia Mar 18 '18 at 23:16

1 Answer


This is a floating point problem. The interesting thing about it, for me, is that it shows sum() producing this error while chaining the same values with + does not.

See the links about floating point math in the comments. Here is how to deal with it:

sum(1-0.3-0.05, 0.3, 0.05) == 1
# [1] FALSE
dplyr::near(sum(1-0.3-0.05, 0.3, 0.05), 1)
# [1] TRUE

For me, the fascinating thing is:

(1 - 0.3 - 0.05 + 0.3 + 0.05) == 1
# [1] TRUE

Because floating point results depend on the order of operations and on the precision of intermediate values, you can't rely on == for computed doubles; you need to allow for the error. Here, instead of using ==, use dplyr::near(). This problem (floating point math is inexact, and the exact error is hard to predict) is found across languages, and different implementations within a language can produce different floating point errors.
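To see what is actually stored, it helps to print more digits than R shows by default. The sketch below uses the inputs from the question. The value `x` is an illustrative stand-in (the largest double below 1), not necessarily what sum() returns on your machine. One plausible reason sum() and + disagree is that sum() may accumulate in a wider intermediate precision on some platforms and round only once at the end, so the result of sum() can differ between machines.

```r
# Inputs from the question
ti <- c(1 - 0.3 - 0.05, 0.3, 0.05)

# Default printing shows 7 significant digits, so nearby doubles look alike.
# 17 significant digits are enough to tell any two doubles apart.
sprintf("%.17g", ti[1])
sprintf("%.17g", 0.65)
sprintf("%.17g", sum(ti))

# Illustrative stand-in (an assumption): the largest double below 1.
x <- 1 - 2^-53
x == 1        # FALSE: x is a distinct double
x + 1 == 2    # TRUE: 2 - 2^-53 is not representable and rounds to 2
```

This is the pattern from the question: a value just below 1 fails the == 1 test, but adding 1 moves it into a range where doubles are spaced further apart, and the result rounds to exactly 2.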

As I discussed in this answer to another floating point question, dplyr::near(), like all.equal(), has a tolerance argument, here tol. By default it is set to .Machine$double.eps^0.5. .Machine$double.eps is the smallest positive number that your machine can add to 1 and still distinguish the result from 1 (about 2.2e-16 for IEEE 754 doubles). The error in a given calculation won't be exactly that, but it's on that order of magnitude. Taking the square root makes the tolerance much bigger than that (about 1.5e-8), so it covers differences small enough that a failed test for equality is likely due to floating point error, while values that genuinely differ will usually still compare as different.
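The magnitudes involved can be checked directly in base R. This is a minimal sketch assuming IEEE 754 doubles; the variable names `eps` and `tol` are only for illustration.

```r
eps <- .Machine$double.eps
eps                  # about 2.2e-16 (2^-52 for IEEE 754 doubles)

1 + eps == 1         # FALSE: 1 + eps is the next double after 1
1 + eps / 2 == 1     # TRUE: the sum rounds back to 1

# Default tolerance used by dplyr::near()
tol <- eps^0.5
tol                  # about 1.5e-8

# A difference of one rounding step near 1 is far below the tolerance
abs((1 - 2^-53) - 1) < tol   # TRUE
```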

NOTE: yes, near() is in dplyr, which I almost always have loaded, so I forgot it wasn't in base R. You could use all.equal(), but look at the source code of near(). It's exactly what you need, and nothing you don't:

near
# function (x, y, tol = .Machine$double.eps^0.5) 
# {
#     abs(x - y) < tol
# }
# <environment: namespace:dplyr>
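If you would rather not load dplyr, the same check can be sketched in base R. The helper name `near_base` is hypothetical; its body simply mirrors the near() source shown above. The all.equal() variant is wrapped in isTRUE() because all.equal() returns a character description, not FALSE, when the values differ.

```r
# Hypothetical helper name; same body as dplyr::near(), base R only.
near_base <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

s <- sum(c(1 - 0.3 - 0.05, 0.3, 0.05))
near_base(s, 1)             # TRUE

# all.equal() returns TRUE or a message describing the difference,
# so wrap it in isTRUE() before using it in a condition.
isTRUE(all.equal(s, 1))     # TRUE
isTRUE(all.equal(1, 1.1))   # FALSE
```

Note that near_base() compares absolute differences, while all.equal() compares relative differences for numeric input, so the two can disagree for values far from 1.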
De Novo
  • sorry if I missed it, can you please say where the `near()` function is from? It's not in base R ... – Ben Bolker Mar 18 '18 at 23:38
  • Oh bother. I forgot it's not base R. It's `dplyr`. You could theoretically use `all.equal()` as well, but its code is just not as clean. – De Novo Mar 18 '18 at 23:41