
...regarding execution time and/or memory.

If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup must come from apply (tapply, sapply, ...) itself.

Jeromy Anglim
steffen

5 Answers


The apply functions in R don't provide improved performance over other looping functions (e.g. for). One exception to this is lapply, which can be a little faster because it does more work in C code than in R (see this question for an example of this).

But in general, the rule is that you should use an apply function for clarity, not for performance.

I would add to this that apply functions have no side effects, which is an important distinction when it comes to functional programming with R. This can be overridden by using assign or <<-, but that can be very dangerous. Side effects also make a program harder to understand, since a variable's state depends on its history.
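As a minimal sketch of that danger: using <<- inside an apply call reaches out of the function's local environment and mutates a variable defined outside it.

counter <- 0
invisible(sapply(1:5, function(i) counter <<- counter + i))
counter
# [1] 15
# the "local" anonymous function has quietly modified the outer counter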

Edit:

Just to emphasize this, here is a trivial example that recursively calculates the Fibonacci sequence; it could be run multiple times to get an accurate measure, but the point is that none of the methods have significantly different performance:

fibo <- function(n) {
  if ( n < 2 ) n
  else fibo(n-1) + fibo(n-2)
}
system.time(for(i in 0:26) fibo(i))
# user  system elapsed 
# 7.48    0.00    7.52 
system.time(sapply(0:26, fibo))
# user  system elapsed 
# 7.50    0.00    7.54 
system.time(lapply(0:26, fibo))
# user  system elapsed 
# 7.48    0.04    7.54 
library(plyr)
system.time(ldply(0:26, fibo))
# user  system elapsed 
# 7.52    0.00    7.58 
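For a tighter measurement than a single system.time() call, the microbenchmark package (assuming it is installed) repeats each expression many times; a minimal sketch:

library(microbenchmark)
microbenchmark(
  for_loop = for(i in 0:26) fibo(i),
  sapply   = sapply(0:26, fibo),
  lapply   = lapply(0:26, fibo),
  times    = 10  # each expression is run 10 times and the timing distribution summarized
)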

Edit 2:

Regarding the use of parallel packages for R (e.g. rpvm, Rmpi, snow), these generally do provide apply family functions (even the foreach package is essentially equivalent, despite the name). Here's a simple example of the sapply function in snow:

library(snow)
cl <- makeSOCKcluster(c("localhost","localhost"))
parSapply(cl, 1:20, get("+"), 3)

This example uses a socket cluster, for which no additional software needs to be installed; otherwise you will need something like PVM or MPI (see Tierney's clustering page). snow has the following apply functions:

parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)

It makes sense that apply functions should be used for parallel execution since they have no side effects. When you change a variable's value within a for loop, the change is made in the enclosing (typically global) environment. On the other hand, all apply functions can safely be used in parallel because changes are local to the function call (unless you try to use assign or <<-, in which case you can introduce side effects). Needless to say, it's critical to be careful about local vs. global variables, especially when dealing with parallel execution.
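A minimal sketch of the same idea with the base parallel package (which, as a later comment notes, took over snow's API from R 2.14.0 onward); the function names below mirror the snow ones:

library(parallel)
cl <- makeCluster(2)                     # two local worker processes
parSapply(cl, 1:20, function(x) x + 3)   # same call pattern as snow's parSapply
stopCluster(cl)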

Edit:

Here's a trivial example to demonstrate the difference between for and *apply as far as side effects are concerned:

df <- 1:10
# *apply example
lapply(2:3, function(i) df <- df * i)
df
# [1]  1  2  3  4  5  6  7  8  9 10
# for loop example
for(i in 2:3) df <- df * i
df
# [1]  6 12 18 24 30 36 42 48 54 60

Note how the df in the parent environment is altered by for but not *apply.

John
Shane
  • Most multi core packages for R also implement parallelization through the `apply` family of functions. Therefore structuring programs so they use apply allows them to be parallelized at a very small marginal cost. – Sharpie Feb 16 '10 at 21:38
  • Sharpie - thank you for that! Any idea for an example showing that (on Windows XP)? – Tal Galili Feb 17 '10 at 11:39
  • I would suggest looking at the `snowfall` package and trying the examples in their vignette. `snowfall` builds on top of the `snow` package and abstracts the details of parallelization even further making it dead simple to execute parallelized `apply` functions. – Sharpie Feb 19 '10 at 03:31
  • @Sharpie but note that `foreach` has since become available and seems to be much inquired about on SO. – Ari B. Friedman Aug 01 '11 at 09:16
  • @gsk3 In R 2.14.0 there will be a new core package called `parallel` that is basically a re-factored version of `snow`---so Snow-like semantics will be available by default. – Sharpie Sep 11 '11 at 05:15
  • as you are aware of side-effectfulness, I still wonder about [that one](http://stackoverflow.com/questions/17530725/promises-in-lapply-r) – nicolas Jul 10 '13 at 21:31
  • The first sentence of this answer is not in line with my experience with for-loops in R. As a _general_ rule, [x]apply functions are at least comparable and frequently significantly faster than for-loops. See e.g. http://stackoverflow.com/questions/13676878/fastest-way-to-get-min-from-every-column-in-a-matrix. – c.gutierrez Jul 13 '14 at 17:33
  • When I run `lapply(2:3, function(i) df <- df * i)` I get a different output than the one in the post (i.e., a list of vectors). Could you double check? – Dambo May 18 '16 at 18:02
  • @Shane, at the very top of your answer, you link to another question as an example of a case where `lapply` is "a little faster" than a `for` loop. However, there, I am not seeing anything suggesting so. You only mention that `lapply` is faster than `sapply`, which is a well known fact for other reasons (`sapply` tries to simplify the output and hence has to do a lot of data size checking and potential conversions). Nothing related to `for`. Am I missing something? – flodel Dec 01 '16 at 04:15
  • I would not call a recursive fibonacci function trivial in this case since the number of recursive calls is on the order of 2^n. So for 26, you get 26 top-level calls into the fib function, but those then generate a horrific number of calls back into fib() without involving one of the apply functions. – Chris Jan 11 '21 at 14:41

Sometimes the speedup can be substantial, such as when you have to nest for-loops to get averages based on a grouping by more than one factor. Here are two approaches that give exactly the same result:

set.seed(1)  # for reproducibility of the results

# The data
X <- rnorm(100000)
Y <- as.factor(sample(letters[1:5],100000,replace=T))
Z <- as.factor(sample(letters[1:10],100000,replace=T))

# the function forloop that averages X over every combination of Y and Z
forloop <- function(x,y,z){
  # These ones are for optimization, so the functions
  # levels() and length() don't have to be called more than once.
  ylev <- levels(y)
  zlev <- levels(z)
  n <- length(ylev)
  p <- length(zlev)

  out <- matrix(NA,ncol=p,nrow=n)
  for(i in 1:n){
      for(j in 1:p){
          out[i,j] <- (mean(x[y==ylev[i] & z==zlev[j]]))
      }
  }
  rownames(out) <- ylev
  colnames(out) <- zlev
  return(out)
}

# Used on the generated data
forloop(X,Y,Z)

# The same using tapply
tapply(X,list(Y,Z),mean)

Both give exactly the same result: a 5 x 10 matrix of the averages with named rows and columns. But:

> system.time(forloop(X,Y,Z))
   user  system elapsed 
   0.94    0.02    0.95 

> system.time(tapply(X,list(Y,Z),mean))
   user  system elapsed 
   0.06    0.00    0.06 

There you go. What did I win? ;-)
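For completeness, a sketch of the same aggregation with data.table (assuming the package is installed); it returns the group means in long format rather than as a matrix:

library(data.table)
dt <- data.table(X = X, Y = Y, Z = Z)
dt[, .(mean_X = mean(X)), by = .(Y, Z)]   # one row per Y/Z combination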

Joris Meys
  • aah, so sweet :-) I was actually wondering if anybody would ever come across my rather late answer. – Joris Meys Aug 27 '10 at 15:20
  • I always sort by "active". :) Not sure how to generalize your answer; sometimes `*apply` is faster. But I think that the more important point is the *side effects* (updated my answer with an example). – Shane Aug 30 '10 at 18:25
  • I think that apply is especially faster when you want to apply a function over different subsets. If there is a smart apply solution for a nested loop, I guess the apply solution will be faster too. In most cases apply doesn't gain much speed I guess, but I definitely agree on the side effects. – Joris Meys Aug 30 '10 at 21:57
  • This is a little off topic, but for this specific example, `data.table` is even faster and I think "easier". `library(data.table)` `dt<-data.table(X,Y,Z,key=c("Y,Z"))` `system.time(dt[,list(X_mean=mean(X)),by=c("Y,Z")])` – dnlbrky Feb 22 '13 at 04:01
  • Can someone explain why tapply is faster than the for-loop? – zhanxw Jun 23 '15 at 04:17
  • This comparison is absurd. `tapply` is a specialized function for a specific task, **that's** why it's faster than a for loop. It can't do what a for loop can do (while regular `apply` can). You're comparing apples with oranges. – eddi May 19 '16 at 19:25
  • @eddi it appears you (and 12 others) completely missed the entire point of the example. You're just stating what I showed there 6 years earlier: `tapply` is suited for a task that might otherwise be done with nested for loops, and hence performs it a lot faster than a naive approach that does not use the tools made for the task. – Joris Meys Apr 07 '20 at 14:55
  • `tapply` is basically `split` followed by an `lapply`. `lapply` is not any faster than a for loop, so assuming your for loop code is optimal (I didn't check) any speedup you get from `tapply` is from `split`. So all this answer does then is just show that `split` has a fast C implementation, and is not something I would categorize as an "apply family speedup". This is why this is apple vs oranges - at best you're comparing `split` vs for loop. – eddi Apr 08 '20 at 18:08
  • @eddi if people would generally use split() before running through the resulting list in a single `for` loop, I agree. But people don't. They either use two nested for loops, or use `tapply` (or more modern approaches) if they're more familiar with R. So I check performance at user level, not internally. In any case, I thought the "tongue in cheek" was obvious with the last line of my very old answer. – Joris Meys Apr 09 '20 at 14:23
  • To back up @JorisMeys comment here, this is really what the argument is often about. "I don't want to have to think differently in R, I just want to use loops like I always did." This example demonstrates effectively that that is not a good way to use R. The specific looping function is pretty irrelevant, as most of this thread of answers demonstrates. (potential side effects of `for` notwithstanding) – John Mar 05 '23 at 15:25

...and as I just wrote elsewhere, vapply is your friend! ...it's like sapply, but you also specify the return value type, which makes it much faster.

foo <- function(x) x+1
y <- numeric(1e6)

system.time({z <- numeric(1e6); for(i in y) z[i] <- foo(i)})
#   user  system elapsed 
#   3.54    0.00    3.53 
system.time(z <- lapply(y, foo))
#   user  system elapsed 
#   2.89    0.00    2.91 
system.time(z <- vapply(y, foo, numeric(1)))
#   user  system elapsed 
#   1.35    0.00    1.36 

Jan. 1, 2020 update:

system.time({z1 <- numeric(1e6); for(i in seq_along(y)) z1[i] <- foo(y[i])})
#   user  system elapsed 
#   0.52    0.00    0.53 
system.time(z <- lapply(y, foo))
#   user  system elapsed 
#   0.72    0.00    0.72 
system.time(z3 <- vapply(y, foo, numeric(1)))
#   user  system elapsed 
#    0.7     0.0     0.7 
identical(z1, z3)
# [1] TRUE
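Beyond speed, the declared return type is also a safety net: where sapply silently changes its output shape, vapply fails fast. A small sketch:

x <- list(c(1, 1), c(1, 2))
sapply(x, unique)                  # result lengths differ, so this silently falls back to a list
# vapply(x, unique, numeric(1))    # this would instead stop with a "values must be length 1" error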
Cole
Tommy
  • The original findings no longer appear to be true. `for` loops are faster on my Windows 10, 2-core computer. I did this with `5e6` elements - a loop was 2.9 seconds vs. 3.1 seconds for `vapply`. – Cole Jan 01 '20 at 16:05

I've written elsewhere that an example like Shane's doesn't really stress the difference in performance among the various kinds of looping syntax because the time is all spent within the function rather than actually stressing the loop. Furthermore, the code unfairly compares a for loop that stores nothing with apply family functions that return a value. Here's a slightly different example that emphasizes the point.

foo <- function(x) {
   x <- x+1
 }
y <- numeric(1e6)
system.time({z <- numeric(1e6); for(i in y) z[i] <- foo(i)})
#   user  system elapsed 
#  4.967   0.049   7.293 
system.time(z <- sapply(y, foo))
#   user  system elapsed 
#  5.256   0.134   7.965 
system.time(z <- lapply(y, foo))
#   user  system elapsed 
#  2.179   0.126   3.301 

If you plan to save the result then apply family functions can be much more than syntactic sugar.

(The simple unlist of z takes only 0.2 s, so the lapply approach is still much faster. Initializing z in the for loop is quite fast; I'm reporting the average of the last 5 of 6 runs, so moving that initialization outside the system.time call would hardly affect things.)

One more thing to note is that there is another reason to use apply family functions, independent of their performance, clarity, or lack of side effects. A for loop typically encourages putting as much as possible within the loop. This is because each loop requires setting up variables to store information (among other possible operations). Apply statements tend to be biased the other way. Often you want to perform multiple operations on your data, several of which can be vectorized but some of which cannot. In R, unlike other languages, it is best to separate those operations out and run the ones that are not vectorized in an apply statement (or a vectorized version of the function) and the ones that are vectorized as true vector operations. This often speeds up performance tremendously.
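A hypothetical sketch of that separation, using the built-in mtcars data: the per-group model fit below cannot be vectorized, so it lives in a vapply, while the final rescaling is left as a plain vectorized operation outside the loop:

groups <- split(mtcars, mtcars$cyl)       # one data frame per cylinder count
slopes <- vapply(groups,
                 function(d) coef(lm(mpg ~ wt, data = d))[["wt"]],
                 numeric(1))              # non-vectorizable step: one model fit per group
slopes / 10                               # vectorized step applied to all groups at once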

Taking Joris Meys' example, where he replaces a traditional for loop with a handy R function, we can use it to show the efficiency of writing code in a more R-friendly manner to get a similar speedup without the specialized function.

set.seed(1)  # for reproducibility of the results

# The data - copied from Joris Meys answer
X <- rnorm(100000)
Y <- as.factor(sample(letters[1:5],100000,replace=T))
Z <- as.factor(sample(letters[1:10],100000,replace=T))

# an R way to generate tapply functionality that is fast and 
# shows more general principles about fast R coding
YZ <- interaction(Y, Z)
XS <- split(X, YZ)
m <- vapply(XS, mean, numeric(1))
m <- matrix(m, nrow = length(levels(Y)))
rownames(m) <- levels(Y)
colnames(m) <- levels(Z)
m

This winds up being much faster than the for loop and just a little slower than the built-in, optimized tapply function. It's not because vapply is so much faster than for, but because it is only performing one operation in each iteration of the loop. In this code everything else is vectorized. In Joris Meys' traditional for loop many (7?) operations are occurring in each iteration, and there's quite a bit of setup just for it to execute. Note also how much more compact this is than the for version.

Frank
John
  • But Shane's example is realistic in that most of the time __is__ usually spent in the function, not in the loop. – hadley Feb 03 '11 at 14:29
  • speak for yourself... :)... Maybe Shane's is realistic in a certain sense, but in that same sense the analysis is utterly useless. People will care about the speed of the iteration mechanism when they have to do a lot of iterations, otherwise their problems are elsewhere anyway. It's true of any function. If I write a `sin` that takes 0.001s and someone else writes one that takes 0.002s, who cares?? Well, as soon as you have to do a bunch of them you care. – John Feb 03 '11 at 15:29
  • on a 12-core 3GHz Intel Xeon, 64-bit, I get quite different numbers to you - the for loop improves considerably: for your three tests, I get `2.798 0.003 2.803; 4.908 0.020 4.934; 1.498 0.025 1.528`, and vapply is even better: `1.19 0.00 1.19` – naught101 Jun 26 '12 at 07:11
  • It does vary with OS and R version... and in an absolute sense CPU. I just ran with 2.15.2 on Mac and got `sapply` 50% slower than `for` and `lapply` twice as fast. – John Jan 13 '13 at 00:08
  • The relative findings from 2.15.2 seem to hold in 3.1.1. – John Jul 13 '14 at 23:42
  • In your example, you mean to set `y` to `1:1e6`, not `numeric(1e6)` (a vector of zeroes). Trying to allocate `foo(0)` to `z[0]` over and over does not illustrate well a typical `for` loop usage. The message is otherwise spot on. – flodel Dec 01 '16 at 04:33
  • @flodel I'm not trying to illustrate typical for loop usage. I'm trying to isolate the overhead of a call in a loop in the simplest case scenario. I don't want any other possible intervening alternative explanations for the result. – John Aug 16 '17 at 15:45
  • R has had substantial performance increases since this was written, most notably for this a JIT compiler. Running on R 3.5.4, I get the `for` loop as the fastest, `sapply` 40% slower, and `lapply` 20% slower. – Gregor Thomas Apr 23 '19 at 02:35
  • @Gregor, interesting, I just ran the first example today on 3.5.3 and everything was 10x faster and all pretty similar but lapply was still fastest. This is on a MacBook Pro last gen. (0.7, 0.6, 0.5) – John Apr 24 '19 at 20:02
  • Hmm, I'm surprised. I made some modifications so that the outputs are the same: `for_loop = {z <- integer(n); for(i in 1:n) z[i] = foo(y[i])}` (I think flodel has a good point above about running `z[0] <- foo(0)` n times in the for loop != `z <- sapply(y, foo)`), and I wrapped the `lapply` in `unlist()` so that the result is the same. Doing that, and then using microbenchmark I show the `for` loop as more than 2x faster than `lapply`, with `sapply` a little bit slower than `lapply`. I added `vapply` too, it's about 30% slower than the loop. – Gregor Thomas Apr 25 '19 at 14:26
  • Under 3.6.0 I now replicate that `lapply` is slowest and about half the speed of `for`. Unfortunately, it's not because `for` is so much faster (about 0.5) but `lapply` got slower (about 0.9). I don't think that's an improvement overall. My 3.5.3 results from Apr. 24 were on average the best. – John May 14 '19 at 02:39

When applying functions over subsets of a vector, tapply can be considerably faster than a for loop. Example:

df <- data.frame(id = rep(letters[1:10], 100000),
                 value = rnorm(1000000))

f1 <- function(x)
  tapply(x$value, x$id, sum)

f2 <- function(x){
  res <- 0
  for(i in seq_along(l <- unique(x$id)))
    res[i] <- sum(x$value[x$id == l[i]])
  names(res) <- l
  res
}            

library(microbenchmark)

> microbenchmark(f1(df), f2(df), times=100)
Unit: milliseconds
   expr      min       lq   median       uq      max neval
 f1(df) 28.02612 28.28589 28.46822 29.20458 32.54656   100
 f2(df) 38.02241 41.42277 41.80008 42.05954 45.94273   100
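For grouped sums specifically, base R also ships rowsum(), which does the aggregation in compiled code; a sketch of a third contender for the benchmark above (timings omitted):

f1b <- function(x)
  rowsum(x$value, x$id)[, 1]   # compiled grouped sum, dropped to a named vector like f1's output

microbenchmark(f1(df), f1b(df), times=100)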

apply, however, in most situations doesn't provide any speed increase, and in some cases can even be a lot slower:

mat <- matrix(rnorm(1000000), nrow=1000)

f3 <- function(x)
  apply(x, 2, sum)

f4 <- function(x){
  res <- 0
  for(i in 1:ncol(x))
    res[i] <- sum(x[,i])
  res
}

> microbenchmark(f3(mat), f4(mat), times=100)
Unit: milliseconds
    expr      min       lq   median       uq      max neval
 f3(mat) 14.87594 15.44183 15.87897 17.93040 19.14975   100
 f4(mat) 12.01614 12.19718 12.40003 15.00919 40.59100   100

But for these situations we've got colSums and rowSums:

f5 <- function(x)
  colSums(x) 

> microbenchmark(f5(mat), times=100)
Unit: milliseconds
    expr      min       lq   median       uq      max neval
 f5(mat) 1.362388 1.405203 1.413702 1.434388 1.992909   100
Richard Border
Michele
  • It is important to notice that (for small pieces of code) `microbenchmark` is much more precise than `system.time`. If you try to compare `system.time(f3(mat))` and `system.time(f4(mat))` you'll get different results almost every time. Sometimes only a proper benchmark test is able to show the fastest function. – Michele Apr 10 '13 at 17:57