
I recently started coding in R, and I read that the apply family of functions is faster than a for loop.

Let's say I want to extract numbers from a vector and insert them into a list. Using a for loop, this is not a problem. However, I'm curious whether this is also possible with an apply function, and whether that makes sense in any way. I had something like this in mind (which is not working):

some.list <- list()
some.vector <- 1:10
sapply(1:10,function(i){some.list[[i]] <- some.vector[i]})
Peanut
  • possible? yes. You can use `<<-` in place of `<-`. Faster? probably not. – Rorschach Oct 12 '15 at 23:17
  • If you want people to answer faster you should make a minimal reproducible example. To learn more check out [MRE](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Jeff Tilton Oct 12 '15 at 23:21
  • Where were you reading that exactly? That's really a bad myth. You should use vectorized functions when available, but often a well-written for loop is faster than apply() (the most important thing is to pre-allocate). – MrFlick Oct 12 '15 at 23:26
  • Of course `as.list(some.vector)` would work just fine here. Perhaps a more accurate/descriptive example of what you are really trying to do would be helpful. – MrFlick Oct 12 '15 at 23:28
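
For reference, here is a minimal sketch of what the commenters suggest, using the names from the question (invisible() just keeps sapply's return value from printing):

some.vector <- 1:10
some.list <- vector("list", length(some.vector))  # pre-allocate, as suggested above

# <<- assigns into some.list in the enclosing environment rather than into a
# throwaway local copy, so the list actually gets filled
invisible(sapply(seq_along(some.vector), function(i) some.list[[i]] <<- some.vector[i]))

# The vectorized one-liner from the comments produces the same result:
identical(some.list, as.list(some.vector))
# [1] TRUE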

1 Answer

There are all sorts of different ways to create a list containing the elements of a vector (the one I would always use is as.list). You can use the microbenchmark package to test for yourself which is fastest:

fun1 <- function(v) as.list(v)
fun2 <- function(v) {
  l <- vector("list", length(v))  # Thanks to @MrFlick for pre-allocation tip
  for (i in seq_along(v)) {
    l[[i]] <- v[i]
  }
  l
}
fun2a <- function(v) {
  l <- vector("list", length(v))  # Thanks to @MrFlick for pre-allocation tip
  sapply(seq_along(v), function(i) l[[i]] <<- v[i])
  l
}
fun3 <- function(v) lapply(v, identity)
fun3a <- function(v) sapply(v, identity, simplify=FALSE)
fun4 <- function(v) unname(split(v, seq_along(v)))

v <- 1:10000
# Check if all return same thing (see http://stackoverflow.com/a/30850654/3093387)
all(sapply(list(fun2(v), fun2a(v), fun3(v), fun3a(v), fun4(v)), identical, fun1(v)))
# [1] TRUE

library(microbenchmark)
microbenchmark(fun1(v), fun2(v), fun2a(v), fun3(v), fun3a(v), fun4(v))
# Unit: microseconds
#      expr       min         lq       mean    median         uq       max neval
#   fun1(v)   139.543   178.5015   283.7498   218.720   288.1555  3730.439   100
#   fun2(v)  6809.344  7465.1110  9326.7799  7912.763 10881.0305 16963.567   100
#  fun2a(v) 10790.471 13786.2335 15912.5338 15089.547 15787.3085 71504.328   100
#   fun3(v)  4132.854  4545.2085  6612.3504  4768.798  7947.0820 63608.519   100
#  fun3a(v)  4147.731  4537.0010  5887.4457  4805.952  7604.4250 13613.517   100
#   fun4(v)  3341.360  3508.2995  3798.4246  3599.220  3797.1200  7565.591   100

For a list of length 10000, as.list is about 15x faster than lapply, sapply with simplify=FALSE, or split. In turn, these three options are 2-3x faster than a for loop or sapply with <<- (using pre-allocated output lists; the loop is about 75x slower if we don't pre-allocate). In short, sapply and for had similar runtimes (sapply actually appeared a bit slower), and both are much slower than vectorized alternatives for this operation.
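
For reference, a minimal sketch of the non-pre-allocated loop that the ~75x figure refers to (fun2_grow is just an illustrative name here; exact timings will vary by machine):

# Grows the list one element at a time, so R has to keep enlarging it,
# which is where most of the extra cost comes from
fun2_grow <- function(v) {
  l <- list()
  for (i in seq_along(v)) {
    l[[i]] <- v[i]
  }
  l
}

microbenchmark(fun2(v), fun2_grow(v))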

josliber
  • you can add preallocation in the loops to help: `l <- vector("list",length(v))` – MrFlick Oct 13 '15 at 00:16
  • @MrFlick thanks -- I've updated to pre-allocate and thanked you in the comments! – josliber Oct 13 '15 at 00:21
  • @RichardScriven Slower compared to what? To the non-preallocated version? – MrFlick Oct 13 '15 at 00:24
  • Yeah, I guess the non-preallocated appears to be what the OP is using, but it's a more fair comparison to pre-allocate for the `for` and `sapply` with `<<-` options, since all the other approaches are pre-allocating. – josliber Oct 13 '15 at 00:24
  • @RichardScriven Did you compare in the same microbenchmark call (otherwise the units might be different)? It should be orders of magnitude slower: `funA <- function(v) {l <- list(); for (i in seq_along(v)) {l[[i]] <- v[i]}; l}; funB <- function(v) {l <- vector("list",length(v)); for (i in seq_along(v)) {l[[i]] <- v[i]}; l}; microbenchmark(funA(v), funB(v))` – MrFlick Oct 13 '15 at 00:28