Are "self-contained" functions more efficient in R?

Question

I'm writing a function that needs to call a function g passed as a parameter to each element of a list, iteratively.

I'm wondering how to make this the fastest possible. I can achieve an acceptable speed using Rcpp and specific kind of g (writing everything in Cpp), but I can't figure out if I can reach similar speed passing an R function as argument.

Was doing some tests to figure out why R is slower and found some really unexpected results:

minus <- function(x) -x
minus_vec <- Vectorize(minus, "x")

Testing with some simple functions to invert signs.

f0 <- function(x) {
  sapply(x, minus)
}

f1 <- function(x) {
  for(i in seq_along(x)){
    x[i] <- -x[i]
  }
  x
}

f2 <- function(x) {
  for(i in seq_along(x)){
    x[i] <- minus(x[i])
  }
  x
}

I got the following results:

a <- 1:10^5
library(rbenchmark)
benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))[,c(1,4)]

          test relative
1        f0(a)  454.842
2        f1(a)   25.579
3        f2(a)  178.211
4 minus_vec(a)  523.789
5     minus(a)    1.000

I would like some explanation on the following points:

Why don't f1 and f2 have the same speed? Writing the piece of code -x[i] and calling the function minus(x[i]) really should be so different when they do the exact same thing?
Why is f0 slower than f2? I always thought apply functions were more efficient than for loops, but never really understood why and now I even found a counter-example.
Can I make a function as fast as f1 using the function minus ?
Why does vectorizing minus (unnecessary since - is already vectorized, but that might not be the case always) made it so bad?

`apply` are *not* always more efficient than self-written loops, it really depends on the taks and on how well the loop is written. Also, you could consider `vapply` which is the most efficient of the `apply` family. — niko, Jan 14 '19 at 13:05
for f1 and f2 whilst `-x` is a "built-in", `minus` is another function so there is another function call nested within every iteration. — Stephen Henderson, Jan 14 '19 at 13:54
No you probably can't make your edit function as fast as a "built-in" but you might make it a bit faster by using the JIT byte code `compiler` package - http://homepage.divms.uiowa.edu/~luke/R/compiler/compiler.pdf — Stephen Henderson, Jan 14 '19 at 13:59
You are going to be led astray if you try to extrapolate techniques for speeding up something very basic like minus to anything of greater complexity. What does your function actually do? — Hugh, Jan 15 '19 at 07:42

niko · Answer 1 · 2019-01-14T14:22:48.933

Not a full answer, but here are a few notes

1 minus(x) vs -x: Doing nothing is better than doing something

Your function minus calls `-`, so the added step adds computation time. I honestly do not know the who's, what's and when's specifically, in other words I wouldn't know how much more computation time ought to be expected.

Here is an example highlighting it: we have four functions, all squaring numbers

fa <- function (n) n^2
fb <- function (n) fa(n)
fc <- function (n) fb(n)
fd <- function (n) fc(n)
Fa <- function (n) {
  for (i in seq_along(n)) n[i] <- fa(i)
  n
}
Fb <- function (n) {
  for (i in seq_along(n)) n[i] <- fb(i)
  n
}
Fc <- function (n) {
  for (i in seq_along(n)) n[i] <- fc(i)
  n
}
Fd <- function (n) {
  for (i in seq_along(n)) n[i] <- fd(i)
  n
}

And here are the benchmarking results

n <- 1:10^4
b <- benchmark(Fa(n),Fb(n),Fc(n),Fd(n), replications = 1000L)
b
#    test replications elapsed relative user.self sys.self user.child sys.child
# 1 Fa(n)         1000    3.93    1.000      3.85     0.00         NA        NA
# 2 Fb(n)         1000    7.08    1.802      6.94     0.02         NA        NA
# 3 Fc(n)         1000   10.16    2.585      9.94     0.06         NA        NA
# 4 Fd(n)         1000   13.68    3.481     13.56     0.00         NA        NA
# looks rather even
diff(b$elapsed)
# [1] 3.15 3.08 3.52

Now back to your minusfunction

a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a))          
b$elapsed[b$test == 'f2(a)'] - b$elapsed[b$test == 'f1(a)']    
# [1] 3.39

2 apply vs for vs Vectorize:

@NavyCheng provided for some good material on the topic. Now my understanding is, the apply family (just like Vectorize) loops in R (whereas if I'm not mistaking the looping for `-` is done in C).

Again, I do not know about the exact details, but if apply/Vectorize use R loops, then, in theory (and often in practice), it is possible to write a proper for loop that will perform as good or better.

3 A Function as fast as f1:

Ad-hoc, the closes I came up was by cheating using the Rcpp package. (cheating since one writes the function in c++ first)

In C++

#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector minusCpp(NumericVector x) {
  for (int k = 0; k < x.length(); ++k) {
    x[k] = -x[k];
  }
  return x;
}

Now to the bechmarks in R

a <- 1:10^5
b <- benchmark(f0(a), f1(a), f2(a), minus_vec(a), minus(a), minusCpp(a))          
b
#           test replications elapsed relative user.self sys.self user.child sys.child
# 1        f0(a)          100    9.47       NA      9.22     0.01         NA        NA
# 2        f1(a)          100    0.53       NA      0.54     0.00         NA        NA
# 3        f2(a)          100    4.23       NA      4.24     0.00         NA        NA
# 5     minus(a)          100    0.00       NA      0.00     0.00         NA        NA
# 4 minus_vec(a)          100   10.42       NA     10.39     0.02         NA        NA
# 6  minusCpp(a)          100    0.05       NA      0.04     0.00         NA        NA

I think there is no way then. Sounds like looping in R and calling Rcpp function many times is slow, and looping in Rcpp and calling an R function from Rcpp many times also doesn't work. It's either all Rcpp or slow then. — VFreguglia, Jan 14 '19 at 15:36
@Freguglia Calling `R` functions in `Rcpp` is possible, see here http://gallery.rcpp.org/articles/r-function-from-c++/ . — niko, Jan 14 '19 at 15:38
Yes, but as I mentioned and the warning in that link states, calling `R` function from `Rcpp` isn't very efficient. "Calling a function is simple and tempting. It is also slow as there are overheads involved. And calling it repeatedly from inside your C++ code, possibly buried within several loops, is outright silly." — VFreguglia, Jan 14 '19 at 15:46
@Freguglia Yes, that's true. I interpreted your "doesn't work" literally. — niko, Jan 14 '19 at 15:48
@nate *forloop* in R is faster than *apply family* in **sample** situation, and I find that *forloop* with bad performance is because [data is not modified in place](http://adv-r.had.co.nz/memory.html#modification). I hope this will help to complete this post. — Navy Cheng, Jan 16 '19 at 06:01

Navy Cheng · Answer 2 · 2019-01-14T14:05:11.477

Ignore -x[i] and minus(-x[i]), and I summarize the four questions to two:

Why apply family is slower than forloop?
Why Vectorize is slower than apply family?

For the 1st question:

The apply functions are designed to be convenient and clear to read, not necessarily fast.

and apply family will do more things than forloop,

Also the sapply function first uses as.vector(unlist(...)) to convert anything to a vector, and in the end tries to simplify the answer into a suitable form.

You can't read here and here for more detail.

For for 2rd question, it's because Vectorize is a wrapper of mapply and if you type Vectorize in Rstudio, you'll see the detail code. you can read this for more help.

I think it would be a better answer if you would summarise the argument found in those links. And to my best knowledge `Vectorize` is a wrapper around `mapply`. — markus, Jan 14 '19 at 13:51

Are "self-contained" functions more efficient in R?

2 Answers2