Most of the points have been made before, but...

1. sapply() uses lapply() and then pays a one-time cost of formatting the result using simplify2array().

2. lapply() creates a long vector, and then a large number of short (length 1) vectors, whereas the for loop generates a single long vector.

3. The sapply() as written has an extra function call compared to the for loop.

4. Using gcinfo(TRUE) lets us see the garbage collector in action, and each approach results in the garbage collector running several times -- this can be quite expensive, and not completely deterministic.
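A minimal sketch of points 1 and 4 (using a small vector for readability): sapply() is essentially lapply() followed by simplify2array(), and gcinfo(TRUE) makes each garbage collection print a message:

x <- 1:5
res <- lapply(x, exp)                            # list of length-1 numeric vectors
identical(simplify2array(res), sapply(x, exp))   # TRUE: the one-time formatting step
old <- gcinfo(TRUE)                              # report each garbage collection
invisible(sapply(numeric(10^6), exp))            # GC messages typically appear here
gcinfo(old)                                      # restore the previous setting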
Points 1 - 3 need to be interpreted in the artificial context of the example -- exp() is a fast function, exaggerating the relative contribution of memory allocation (2), function evaluation (3), and one-time costs (1). Point 4 emphasizes the need to replicate timings in a systematic way.
I started by loading the compiler and microbenchmark packages, and focused on the largest size only.
library(compiler)
library(microbenchmark)
n <- 10^7
In my first experiment I replaced exp() with simple assignment, and tried different ways of representing the result in the for loop -- a vector of numeric values, or a list of numeric vectors as implied by lapply().
fun0n <- function(n) {            # for loop filling a pre-allocated numeric vector
    Y1 <- numeric(n)
    for (j in seq_len(n)) Y1[j] <- 1
}
fun0nc <- compiler::cmpfun(fun0n) # byte-compiled version

fun0l <- function(n) {            # for loop filling a pre-allocated list
    Y1 <- vector("list", n)
    for (j in seq_len(n)) Y1[[j]] <- 1
}
fun0lc <- compiler::cmpfun(fun0l)
microbenchmark(fun0n(n), fun0nc(n), fun0lc(n), times=5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun0n(n) 5.620521 6.350068 6.487850 6.366029 6.933915 7.168717 5
## fun0nc(n) 1.852048 1.974962 2.028174 1.984000 2.035380 2.294481 5
## fun0lc(n) 1.644120 2.706605 2.743017 2.998258 3.178751 3.187349 5
So it pays to compile the for loop, and there's a fairly substantial cost to generating a list of vectors. Again, this memory cost is amplified by the simplicity of the body of the for loop.
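A rough way to see that cost (sizes are approximate, for a 64-bit build): each element of the list is a length-1 vector carrying its own header, which a single long vector pays only once.

n1 <- 10^5
object.size(numeric(n1))                    # ~0.8 MB: one long vector
object.size(lapply(numeric(n1), identity))  # ~6.4 MB: 10^5 short vectors plus the list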
My next experiment explored the cost of different *apply() functions, again using a trivial function body.
fun2s <- function(n)
    sapply(raw(n), function(i) 1)
fun2l <- function(n)
    lapply(raw(n), function(i) 1)
fun2v <- function(n)
    vapply(raw(n), function(i) 1, numeric(1))
microbenchmark(fun2s(n), fun2l(n), fun2v(n), times=5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun2s(n) 4.847188 4.946076 5.625657 5.863453 6.130287 6.341282 5
## fun2l(n) 1.718875 1.912467 2.024325 2.141173 2.142004 2.207105 5
## fun2v(n) 1.722470 1.829779 1.847945 1.836187 1.845979 2.005312 5
There is a large cost to the simplification step in sapply(); vapply() is more robust than lapply() (I am guaranteed the type of the return) without a performance penalty, so it should be my go-to function in this family.
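A small illustration of that guarantee, with a deliberately type-unstable function (f is a made-up example): sapply() silently coerces the result, while vapply() fails loudly:

f <- function(i) if (i > 2) "oops" else i   # returns character for i > 2
sapply(c(1, 2, 3), f)                       # silently coerces: "1" "2" "oops"
try(vapply(c(1, 2, 3), f, numeric(1)))      # error: FUN(X[[3]]) result is type 'character'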
Finally, I compared the for iteration, with the result stored as a list-of-vectors, to vapply().
fun1 <- function(n) {
    Y1 <- vector("list", n)
    for (j in seq_len(n)) Y1[[j]] <- exp(0)
}
fun1c <- compiler::cmpfun(fun1)

fun3 <- function(n)
    vapply(numeric(n), exp, numeric(1))
fun3fun <- function(n)
    vapply(numeric(n), function(i) exp(i), numeric(1))
microbenchmark(fun1c(n), fun3(n), fun3fun(n), times=5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun1c(n) 2.265282 2.391373 2.610186 2.438147 2.450145 3.505986 5
## fun3(n) 2.303728 2.324519 2.646558 2.380424 2.384169 3.839950 5
## fun3fun(n) 4.782477 4.832025 5.165543 4.893481 4.973234 6.346498 5
microbenchmark(fun1c(10^3), fun1c(10^4), fun1c(10^5),
               fun3(10^3), fun3(10^4), fun3(10^5),
               times=50)
## Unit: microseconds
## expr min lq mean median uq max neval
## fun1c(10^3) 199 215 230 228 241 279 50
## fun1c(10^4) 1956 2016 2226 2296 2342 2693 50
## fun1c(10^5) 19565 20262 21671 20938 23410 24116 50
## fun3(10^3) 227 244 254 254 264 295 50
## fun3(10^4) 2165 2256 2359 2348 2444 2695 50
## fun3(10^5) 22069 22796 23503 23251 24393 25735 50
The compiled for loop and vapply() are neck and neck; the extra function call almost doubles the execution time of vapply() (again, this effect is exaggerated by the simplicity of the example). There does not seem to be much change in relative speed across a range of sizes.
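As a rough check on that last point, dividing the median timings in the table above by n gives an approximately constant per-element cost for both approaches:

c(228, 2296, 20938) / c(10^3, 10^4, 10^5)   # fun1c: ~0.23, 0.23, 0.21 microseconds per element
c(254, 2348, 23251) / c(10^3, 10^4, 10^5)   # fun3:  ~0.25, 0.23, 0.23 microseconds per element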