R for loop vs lapply (performance)

Question

Which one is faster, and why? If there is a difference, maybe it depends on the data and the functions being used. If so, how? I've checked a few examples:

lista <- list(a = 1:100, b = -20:500, c = 300:1000, rep(1000, 1000))
for (i in 1:10) { lista <- c(lista, lista) } # length == 4096

To compare, I wrote these two functions:

loopfor <- function(x, fun){
    ret <- vector("list",length(x))
    for (i in seq_along(x)) { 
        ret[[i]] <- fun(x[[i]]) 
    }
    return(ret)
}

lapplyfun <- function(x, fun){
    ret <- lapply(x, fun)
    return(ret)
}
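
Before timing the two wrappers, it may be worth checking that they return the same thing. The following is a minimal sketch, using a small stand-in list `x` (an assumption: the four original elements, not the full 4096-element `lista`). The values match, but `lapply` keeps the names of its input while the `for` version as written drops them:

```r
# Small stand-in for `lista` (assumption: same four elements, not repeated)
x <- list(a = 1:100, b = -20:500, c = 300:1000, rep(1000, 1000))

loopfor <- function(x, fun) {
    ret <- vector("list", length(x))
    for (i in seq_along(x)) {
        ret[[i]] <- fun(x[[i]])
    }
    ret
}

lapplyfun <- function(x, fun) lapply(x, fun)

res_for    <- loopfor(x, sum)
res_lapply <- lapplyfun(x, sum)

names(res_for)     # NULL: the preallocated list carries no names
names(res_lapply)  # "a" "b" "c" "": lapply copies the names from x

# The values are the same once the names are ignored
same_values <- identical(res_for, unname(res_lapply))
same_values        # TRUE
```

If the names matter, adding `names(ret) <- names(x)` inside `loopfor` would be one way to make the two results fully identical.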

Calling `loopfor` vs `lapplyfun`: for the `sum` function, `lapply` is the winner:

    require(microbenchmark)    
    microbenchmark(loopfor(lista,sum), lapplyfun(lista,sum),times=100)
Unit: milliseconds
                  expr       min        lq    median        uq      max neval
   loopfor(lista, sum) 20.496391 21.058436 21.423077 22.309260 50.80541   100
 lapplyfun(lista, sum)  8.745445  9.007782  9.342844  9.777506 15.15932   100

But for a more complex function like `summary`, the difference is really small (note that the unit here is seconds, not milliseconds):

    microbenchmark(loopfor(lista,summary), lapplyfun(lista,summary),times=10)
Unit: seconds
                      expr      min       lq   median       uq      max neval
   loopfor(lista, summary) 2.147071 2.164275 2.186433 2.228169 2.342094    10
 lapplyfun(lista, summary) 2.024157 2.099712 2.198469 2.314902 2.550751    10

Any explanation or ideas? Maybe `loopfor` should be written differently to improve performance? :)

I'm confused. Your function takes an argument x, and then appears to operate directly on lista. And then you're comparing summary to sum? — joran, Apr 10 '14 at 21:44
About the first one you are right. About second - not :) In example with summary I changed the implementation of `loopfor` :) — bartektartanus, Apr 10 '14 at 21:49
You should include the modified version of the function then. Your for loop version is spending time creating the results list. If you wrap the lapply version in a function and actually assign the result, which forces R to create the new object, I get a much smaller difference between the two. — joran, Apr 10 '14 at 22:01
The first one's in `ms` and the second is in seconds. The median difference in both cases are: `12.35 ms` and `9.34 ms` respectively. — Arun, Apr 10 '14 at 22:20
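
One way to probe the point made in the comments, that the gap between the two versions looks like a roughly fixed cost rather than something that grows with the function being applied, is to compare the absolute gap for a cheap and an expensive function. The sketch below is only a rough check under stated assumptions: it uses base R's `system.time` instead of `microbenchmark`, a shorter stand-in list `x`, and a hypothetical helper `time_ms`. The numbers will vary with machine and R version.

```r
loopfor <- function(x, fun) {
    ret <- vector("list", length(x))
    for (i in seq_along(x)) {
        ret[[i]] <- fun(x[[i]])
    }
    ret
}

# Stand-in for `lista` (assumption: 1024 elements instead of 4096, to keep it quick)
x <- rep(list(1:100, -20:500, 300:1000, rep(1000, 1000)), 256)

# Hypothetical helper: mean elapsed time of f() over `reps` runs, in milliseconds
time_ms <- function(f, reps) {
    elapsed <- system.time(for (k in seq_len(reps)) f())[["elapsed"]]
    1000 * elapsed / reps
}

# Absolute gap (for loop minus lapply) for a cheap and an expensive function
gap_sum     <- time_ms(function() loopfor(x, sum), 50) -
               time_ms(function() lapply(x, sum), 50)
gap_summary <- time_ms(function() loopfor(x, summary), 3) -
               time_ms(function() lapply(x, summary), 3)

c(sum = gap_sum, summary = gap_summary)
```

If the two gaps come out at a similar size in milliseconds, that would suggest a per-element overhead that simply gets lost in the noise when `fun` itself is slow.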
