4

currently here is how I test for numeric equality, it works if x is a numeric and y a vector.

almostEqual <- function(x, y, tolerance=1e-8) {
  diff <- abs(x - y)
  mag <- pmax( abs(x), abs(y) )
  ifelse( mag > tolerance, diff/mag <= tolerance, diff <= tolerance)
}

Example:

almostEqual(1,c(1,1.00000000000001,1.00002))
[1]  TRUE  TRUE FALSE

Can you make it faster (just with base R) ?

EDIT: I advise this which I find usefull

"%~=%" <- almostEqual;
"%~in%" <- function(x,y){ sapply(x,FUN=function(a,b){any(almostEqual(a,b))},y)};
statquant
  • 13,672
  • 21
  • 91
  • 162
  • Have you found `all.equal`? And if so, what is it missing that your function has? – Justin Sep 17 '13 at 14:01
  • 1
    @Justin: `all.equal` requires objects of the same length and doesn't tell you _which_ elements are `TRUE` and/or `FALSE`. – Joshua Ulrich Sep 17 '13 at 14:02
  • @JoshuaUlrich Almost does! and you can always coerce the result to logical and take `NA` as `FALSE`... `veql = function(x) sapply(x, function(x) as.logical(all.equal(1, x)))` – Justin Sep 17 '13 at 14:04
  • @Justin No way that's going to be faster than the OP's current solution... – Joshua Ulrich Sep 17 '13 at 14:05
  • I didn't say it was! Just curious what was wrong with `all.equal` :) – Justin Sep 17 '13 at 14:06
  • You could try Rcpp. [This](http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2010-June/000807.html) should be relevant. – Roland Sep 17 '13 at 14:11
  • @Roland I said just base R, and I just got -1 on another post for the exact same answer :) – statquant Sep 17 '13 at 14:22
  • Most of the function you use are very much optimized. You can only improve performance if you sacrify `NA` handling or some input checks. You need to clarify, if that is acceptable. – Roland Sep 17 '13 at 14:25
  • @Roland I agree with Rcpp, which also handles NA automagically. The package really is amazing. – Simon O'Hanlon Sep 17 '13 at 14:42
  • My self-promotion: `cgwtools::approxeq` . – Carl Witthoft Sep 17 '13 at 17:16

1 Answers1

3

Cutting out ifelse for a start would save you 57%...

almostEqual2 <- function(x, y, tolerance=1e-8) {
  diff <- abs(x - y)
  mag <- pmax( abs(x), abs(y) )
  out <- logical(length(y))
  out[ mag > tolerance ] <- (diff/mag <= tolerance)[ mag > tolerance]
  out[ ! mag > tolerance ] <- (diff <= tolerance)[! mag > tolerance]
  return( out )
}


require(microbenchmark)

set.seed(1)
x <- 1
y <- rnorm(1e6)

bm <- microbenchmark( almostEqual(x,y,tol=0.5) , almostEqual2(x,y,tol=0.5) , times = 25 )
print( bm , digits = 3 , unit = "relative" , order = "median" )
#Unit: relative
#                          expr  min   lq median   uq  max neval
# almostEqual2(x, y, tol = 0.5) 1.00 1.00   1.00 1.00 1.00    25
#  almostEqual(x, y, tol = 0.5) 2.09 1.76   1.73 1.86 1.82    25

Using Rcpp

I don't see why you wouldn't use the most depended on package in CRAN outside of base, but if you wanted to you could realise a 5 times speed-up over my previous effort (10 times on the OP) and it also handles NA gracefully...

#include <Rcpp.h>

using namespace Rcpp;

//[[Rcpp::export]]


LogicalVector all_equalC( double x , NumericVector y , double tolerance ){
  NumericVector diff = abs( x - y );
  NumericVector mag = pmax( abs(x) , abs(y) );
  LogicalVector res = ifelse( mag > tolerance , diff/mag <= tolerance , diff <= tolerance );
  return( res );
}

Made available using Rcpp::sourceCpp('path/to/file.cpp'). Results...

bm <- microbenchmark( almostEqual(x,y,tol=0.5) , almostEqual2(x,y,tol=0.5) , all_equalC(x,y,tolerance=0.5) , times = 25 )
print( bm , digits = 3 , unit = "relative" , order = "median" )
#Unit: relative
#                              expr  min   lq median   uq   max neval
# all_equalC(x, y, tolerance = 0.5) 1.00 1.00   1.00 1.00  1.00    25
#     almostEqual2(x, y, tol = 0.5) 4.50 4.39   5.39 5.24  7.32    25
#      almostEqual(x, y, tol = 0.5) 8.69 9.34   9.24 9.96 10.91    25
Simon O'Hanlon
  • 58,647
  • 14
  • 142
  • 184
  • but you loose `NA` treatment, don`t you? – Roland Sep 17 '13 at 14:18
  • @Roland yes you do because you can't subscript with NA. I didn't take into account that this might be something you have to deal with. Perhaps the OP can weigh in? – Simon O'Hanlon Sep 17 '13 at 14:21
  • @Roland, nice but I think I'll keep the NA in – statquant Sep 17 '13 at 14:24
  • @statquant why not Rcpp? Out of interest? – Simon O'Hanlon Sep 17 '13 at 14:25
  • 1
    @SimonO101: just to improve compatibility... additionally the functions shows by Rolland requires compiler(Rtools) + Rcpp this is too much – statquant Sep 17 '13 at 14:27
  • @statquant Rtools is a really simple install. – Simon O'Hanlon Sep 17 '13 at 14:45
  • @SimonO101 If you have admin rights, which I don't in my office. :( – Roland Sep 17 '13 at 14:57
  • @Rolland, NO you don't need admin rights for Rtools ;), the only thing you'll miss is the environment variables and you can set it yourself (I use it at work) – statquant Sep 17 '13 at 15:18
  • @statquant so maybe Rcpp is the way to go for you if you already have it? The function is essentially a verbatim copy of what you already have thanks to Rcpp sugar. – Simon O'Hanlon Sep 17 '13 at 15:21
  • The code for `almostEqual2` has a mistake: the right handside of the assignments should also be indexing by `mag > tolerance` and `!mag > tolerance`. Also your example is not very representative because `mag > tol` is TRUE everywhere. – flodel Sep 17 '13 at 21:14
  • @flodel thanks for the error spot. As for the second point, I don't really understand why that matters? `Rcpp::ifelse` only evaluates *either* the TRUE or the FALSE condition (unlike `base::ifelse` which [**evaluates both**](http://stackoverflow.com/a/16275201/1478381) – Simon O'Hanlon Sep 17 '13 at 22:54
  • at least from a unit test point of view. Using e.g. `tol = 1.5` would have yielded a warning, and results different from the OP's. – flodel Sep 17 '13 at 23:03
  • @flodel My `Rcpp` function gives the desired results right? I corrected the error in the almostEqual2 function as well so all should function as expected, even with `tol=1.5`. I get the same results as usig the OP code. Just faster! – Simon O'Hanlon Sep 17 '13 at 23:06