3

Consider the following R function

is.sqrt <- function(x, y){
  if(x^2 == y) TRUE
  else FALSE
}

which answers whether x is the square root of y. If y is a perfect square, the function behaves as expected: is.sqrt(2, 4) returns TRUE and is.sqrt(3, 4) returns FALSE. The problem occurs when y is not a perfect square. For example,

is.sqrt(sqrt(2), 2)

returns FALSE. The reason for this can be seen by calculating

sqrt(2)^2 - 2

which returns 4.440892e-16. My first thought on how to solve this would be to round x^2 before comparing it to y, but by how much is appropriate? And is this even a recommended approach? Is there a standard method in R for dealing with floating-point precision?
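For concreteness, that rounding idea would look something like this, with the digit count being exactly the arbitrary choice that bothers me:

is.sqrt.round <- function(x, y, digits = 8){
  # round x^2 before comparing, but the choice of 8 digits is arbitrary
  round(x^2, digits) == y
}

is.sqrt.round(sqrt(2), 2)
# TRUE, but only because 8 digits happened to be enough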

Tim Hargreaves
  • `all.equal()` uses a parameter `tolerance=sqrt(.Machine$double.eps)`. So you can do `is.sqrt <- function(x, y){ all.equal(x^2, y) }; is.sqrt(sqrt(2), 2)` – jogo Feb 20 '18 at 13:28
  • Possible duplicate of [Why are these numbers not equal?](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) – kangaroo_cliff Feb 20 '18 at 13:44
  • The proper solution to a problem like this depends on the specific problem you want to solve. On the face of it, you are attempting to test for exact equality using arithmetic that is necessarily inexact. If you accept “approximately equal” in some sense, as several answers suggest, then you create additional false positives (situations where the test returns true even though the answer would be false if calculated with exact mathematics). So answering the problem **correctly** requires knowing when false positives are acceptable, when they are not, and the same for false negatives. – Eric Postpischil Feb 20 '18 at 14:26
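To make that false-positive point concrete, here is a small sketch (an illustration added for this point, not from the thread): a value that is not an exact square root of 2 still passes all.equal's default relative tolerance of sqrt(.Machine$double.eps), roughly 1.5e-8:

x <- sqrt(2 + 1e-9)        # the square root of 2.000000001, not of 2
isTRUE(all.equal(x^2, 2))  # relative error ~5e-10 is below the tolerance
# TRUE, a false positive in exact-arithmetic terms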

3 Answers

5

You can use all.equal in your function, which "tests if two objects are 'nearly' equal":

is.sqrt <- function(x, y){
    isTRUE(all.equal(x^2, y))
}


is.sqrt(sqrt(2), 2)
# TRUE

is.sqrt(sqrt(2), 3)
# FALSE
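As jogo's comment under the question notes, all.equal exposes its cutoff through the `tolerance` parameter (default `sqrt(.Machine$double.eps)`, roughly 1.5e-8), so a variant with an explicit tolerance is easy to sketch (is.sqrt.tol is just an illustrative name):

is.sqrt.tol <- function(x, y, tol = sqrt(.Machine$double.eps)){
    isTRUE(all.equal(x^2, y, tolerance = tol))
}

is.sqrt.tol(sqrt(2), 2)           # TRUE with the default tolerance
is.sqrt.tol(sqrt(2), 2, tol = 0)  # FALSE: an exact comparison fails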
Daniel
  • `all.equal` is a good idea (+1) but you wouldn't need the extra if else and TRUE FALSE statements. – talat Feb 20 '18 at 13:40
  • It depends on what the desired output is, because not using the extra TRUE/FALSE expression would return the relative mean difference if all.equal(x, y) is not true, and I understand that the OP wants to return either `TRUE` or `FALSE` – Daniel Feb 20 '18 at 13:44
  • Good point. In such a case I would go with `isTRUE(all.equal(x^2, y))` – talat Feb 20 '18 at 13:47
  • You're right; that is indeed much simpler. I will edit my answer – Daniel Feb 20 '18 at 13:52
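To see what the last two comments are about: without the isTRUE() wrapper, a failed comparison returns a character string rather than FALSE:

all.equal(sqrt(3)^2, 2)
# [1] "Mean relative difference: 0.3333333"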
1

You can use the near function from dplyr, which has a built-in tolerance:

library(dplyr)

is.sqrt <- function(x, y) {
  near(x^2, y)
}

is.sqrt(sqrt(2), 2)

# TRUE
  • You don't need the if / else and TRUE / FALSE statements. By using those, you even give up the advantage of `dplyr::near` being vectorized (which all.equal is not). – talat Feb 20 '18 at 14:44
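To illustrate the vectorisation point from the comment above, a small sketch: near works elementwise on vectors, while all.equal only ever returns a single result.

library(dplyr)

near(c(sqrt(2), sqrt(3), 2)^2, c(2, 3, 5))
# TRUE TRUE FALSE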
0

Another option could be to use all.equal.numeric itself.

Option-A)

is.sqrt <- function(x, y){
  isTRUE(all.equal.numeric(x^2, y))
}

#> is.sqrt(sqrt(2),2)
#[1] TRUE

Option-B)

Using a tolerance limit in double precision. Optionally one can use `.Machine$double.eps`, but I preferred to use a fixed value of 1e-8.

is.sqrt_abs_tol <- function(x, y){
  tol <- 1e-8   # OR .Machine$double.eps can be used
  abs(x^2 - y) <= tol
}

#> is.sqrt_abs_tol(sqrt(2), 2)
#[1] TRUE
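One caveat worth flagging here (an observation added in editing, not a claim from the original answer): a fixed absolute tolerance is scale-dependent, so for very small y it accepts values that are not square roots at all, while a relative test rejects them:

#> is.sqrt_abs_tol(2e-10, 1e-20)
#[1] TRUE    # false positive: (2e-10)^2 is 4e-20, not 1e-20,
#            # but the absolute gap 3e-20 is below the 1e-8 cutoff

#> isTRUE(all.equal.numeric((2e-10)^2, 1e-20))
#[1] FALSE   # the relative test catches it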

As agreed with @docendodiscimus, I did some performance analysis of these options.

library(microbenchmark)
library(dplyr)

is.sqrt_Dan <- function(x, y){
  isTRUE(all.equal(x^2,y))
}

is.sqrt_MKR <- function(x, y){
  isTRUE(all.equal.numeric(x^2, y))
}

is.sqrt_Leon <- function(x, y) {
  near(x^2, y)
}

is.sqrt_abs_tol <- function(x, y){
  tol <- 1e-5   # note: looser than the 1e-8 used above
  abs(x^2 - y) <= tol
}

microbenchmark(
  is.sqrt_Leon(sqrt(2), 2),
  is.sqrt_Dan(sqrt(2), 2),
  is.sqrt_MKR(sqrt(2), 2),
  is.sqrt_abs_tol(sqrt(2), 2),
  times=1000L  
)

                        expr   min    lq      mean median    uq     max neval
    is.sqrt_Leon(sqrt(2), 2)  2369  3948  4736.816   4737  5132   60001  1000
     is.sqrt_Dan(sqrt(2), 2) 36711 38291 44590.051  39474 41844 2750542  1000
     is.sqrt_MKR(sqrt(2), 2) 32369 33949 38130.556  35133 37501  211975  1000
 is.sqrt_abs_tol(sqrt(2), 2)   395  1185  4571.833   1579  1580 3107387  1000

A few observations from the analysis above:

  • Surprisingly, the near function from dplyr is faster than the all.equal variants.
  • all.equal.numeric is slightly faster than all.equal.
  • The custom version using abs and a fixed tolerance is super-fast.
MKR
  • What's the advantage of this over using `all.equal`? Also note that in case of a difference between x and y, your code won't return FALSE but something like `"Mean relative difference: 0.3333333"` (for `is.sqrt(sqrt(3),2)`) – talat Feb 20 '18 at 14:46
  • @docendodiscimus There is not much difference as such. `all.equal` internally uses `all.equal.numeric`, so one call could be avoided, but it's very minor. You are right in your observation that the `isTRUE` check should be applied with `all.equal.numeric` as well. – MKR Feb 20 '18 at 15:07
  • @docendodiscimus Let me perform some benchmarking on these methods later today and update my answer. If there is no improvement in time then it will be better to delete my answer. I'm not sure why people are hesitant to use `all.equal.numeric` directly. – MKR Feb 20 '18 at 15:18
  • I doubt that the potential performance difference would have an impact in a real-world scenario. It's like you also don't normally call print.data.frame to print a table, right? – talat Feb 20 '18 at 15:25
  • @docendodiscimus Agreed. Maybe it would be worth checking some other methods for comparing floats in my analysis as well. Another thought is the frequency of calls to such functions: numeric comparisons are expected to be used frequently. – MKR Feb 20 '18 at 15:32
  • @docendodiscimus I did some performance analysis. I'm surprised by some of the output, though. Thanks for prompting me to do this. – MKR Feb 20 '18 at 19:29
  • If you look at the source code you will find that your results are not very surprising, since a) all.equal does more than just checking equality and b) the dplyr near function is almost the same as your last function, except it doesn't need the extra assignment and uses the machine's precision (which seems reasonable) – talat Feb 20 '18 at 19:35
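For reference on that last point: dplyr::near is essentially the following one-liner, which is why it benchmarks so close to the bare abs comparison (sketched here after its source; near_sketch is an illustrative name):

near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  # absolute comparison against the machine's precision, vectorized for free
  abs(x - y) < tol
}

near_sketch(sqrt(2)^2, 2)
# TRUE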