How can I speed up how I test for equality between numeric in R

Question

currently here is how I test for numeric equality, it works if x is a numeric and y a vector.

almostEqual <- function(x, y, tolerance=1e-8) {
  diff <- abs(x - y)
  mag <- pmax( abs(x), abs(y) )
  ifelse( mag > tolerance, diff/mag <= tolerance, diff <= tolerance)
}

Example:

almostEqual(1,c(1,1.00000000000001,1.00002))
[1]  TRUE  TRUE FALSE

Can you make it faster (just with base R) ?

EDIT: I advise this which I find usefull

"%~=%" <- almostEqual;
"%~in%" <- function(x,y){ sapply(x,FUN=function(a,b){any(almostEqual(a,b))},y)};

Have you found `all.equal`? And if so, what is it missing that your function has? — Justin, Sep 17 '13 at 14:01
@Justin: `all.equal` requires objects of the same length and doesn't tell you _which_ elements are `TRUE` and/or `FALSE`. — Joshua Ulrich, Sep 17 '13 at 14:02
@JoshuaUlrich Almost does! and you can always coerce the result to logical and take `NA` as `FALSE`... `veql = function(x) sapply(x, function(x) as.logical(all.equal(1, x)))` — Justin, Sep 17 '13 at 14:04
@Justin No way that's going to be faster than the OP's current solution... — Joshua Ulrich, Sep 17 '13 at 14:05
I didn't say it was! Just curious what was wrong with `all.equal` :) — Justin, Sep 17 '13 at 14:06
You could try Rcpp. [This](http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2010-June/000807.html) should be relevant. — Roland, Sep 17 '13 at 14:11
@Roland I said just base R, and I just got -1 on another post for the exact same answer :) — statquant, Sep 17 '13 at 14:22
Most of the function you use are very much optimized. You can only improve performance if you sacrify `NA` handling or some input checks. You need to clarify, if that is acceptable. — Roland, Sep 17 '13 at 14:25
@Roland I agree with Rcpp, which also handles NA automagically. The package really is amazing. — Simon O'Hanlon, Sep 17 '13 at 14:42

Simon O'Hanlon · Accepted Answer · 2013-09-17T23:08:57.920

3

Cutting out ifelse for a start would save you 57%...

almostEqual2 <- function(x, y, tolerance=1e-8) {
  diff <- abs(x - y)
  mag <- pmax( abs(x), abs(y) )
  out <- logical(length(y))
  out[ mag > tolerance ] <- (diff/mag <= tolerance)[ mag > tolerance]
  out[ ! mag > tolerance ] <- (diff <= tolerance)[! mag > tolerance]
  return( out )
}


require(microbenchmark)

set.seed(1)
x <- 1
y <- rnorm(1e6)

bm <- microbenchmark( almostEqual(x,y,tol=0.5) , almostEqual2(x,y,tol=0.5) , times = 25 )
print( bm , digits = 3 , unit = "relative" , order = "median" )
#Unit: relative
#                          expr  min   lq median   uq  max neval
# almostEqual2(x, y, tol = 0.5) 1.00 1.00   1.00 1.00 1.00    25
#  almostEqual(x, y, tol = 0.5) 2.09 1.76   1.73 1.86 1.82    25

Using Rcpp

I don't see why you wouldn't use the most depended on package in CRAN outside of base, but if you wanted to you could realise a 5 times speed-up over my previous effort (10 times on the OP) and it also handles NA gracefully...

#include <Rcpp.h>

using namespace Rcpp;

//[[Rcpp::export]]


LogicalVector all_equalC( double x , NumericVector y , double tolerance ){
  NumericVector diff = abs( x - y );
  NumericVector mag = pmax( abs(x) , abs(y) );
  LogicalVector res = ifelse( mag > tolerance , diff/mag <= tolerance , diff <= tolerance );
  return( res );
}

Made available using Rcpp::sourceCpp('path/to/file.cpp'). Results...

bm <- microbenchmark( almostEqual(x,y,tol=0.5) , almostEqual2(x,y,tol=0.5) , all_equalC(x,y,tolerance=0.5) , times = 25 )
print( bm , digits = 3 , unit = "relative" , order = "median" )
#Unit: relative
#                              expr  min   lq median   uq   max neval
# all_equalC(x, y, tolerance = 0.5) 1.00 1.00   1.00 1.00  1.00    25
#     almostEqual2(x, y, tol = 0.5) 4.50 4.39   5.39 5.24  7.32    25
#      almostEqual(x, y, tol = 0.5) 8.69 9.34   9.24 9.96 10.91    25

edited Sep 17 '13 at 23:08

answered Sep 17 '13 at 14:16

Simon O'Hanlon

58,647
14
142
184

but you loose `NA` treatment, don`t you? – Roland Sep 17 '13 at 14:18
@Roland yes you do because you can't subscript with NA. I didn't take into account that this might be something you have to deal with. Perhaps the OP can weigh in? – Simon O'Hanlon Sep 17 '13 at 14:21
@Roland, nice but I think I'll keep the NA in – statquant Sep 17 '13 at 14:24
@statquant why not Rcpp? Out of interest? – Simon O'Hanlon Sep 17 '13 at 14:25
1

@SimonO101: just to improve compatibility... additionally the functions shows by Rolland requires compiler(Rtools) + Rcpp this is too much – statquant Sep 17 '13 at 14:27
@statquant Rtools is a really simple install. – Simon O'Hanlon Sep 17 '13 at 14:45
@SimonO101 If you have admin rights, which I don't in my office. :( – Roland Sep 17 '13 at 14:57
@Rolland, NO you don't need admin rights for Rtools ;), the only thing you'll miss is the environment variables and you can set it yourself (I use it at work) – statquant Sep 17 '13 at 15:18
@statquant so maybe Rcpp is the way to go for you if you already have it? The function is essentially a verbatim copy of what you already have thanks to Rcpp sugar. – Simon O'Hanlon Sep 17 '13 at 15:21
The code for `almostEqual2` has a mistake: the right handside of the assignments should also be indexing by `mag > tolerance` and `!mag > tolerance`. Also your example is not very representative because `mag > tol` is TRUE everywhere. – flodel Sep 17 '13 at 21:14
@flodel thanks for the error spot. As for the second point, I don't really understand why that matters? `Rcpp::ifelse` only evaluates *either* the TRUE or the FALSE condition (unlike `base::ifelse` which [**evaluates both**](http://stackoverflow.com/a/16275201/1478381) – Simon O'Hanlon Sep 17 '13 at 22:54
at least from a unit test point of view. Using e.g. `tol = 1.5` would have yielded a warning, and results different from the OP's. – flodel Sep 17 '13 at 23:03
@flodel My `Rcpp` function gives the desired results right? I corrected the error in the almostEqual2 function as well so all should function as expected, even with `tol=1.5`. I get the same results as usig the OP code. Just faster! – Simon O'Hanlon Sep 17 '13 at 23:06

How can I speed up how I test for equality between numeric in R

1 Answers1

Using Rcpp