> system.time(sapply(rnorm(1000000,0,1), function (x) round(x,2)))
   user  system elapsed 
   2.78    0.11    2.89 
> system.time(round(rnorm(1000000,0,1),2))
   user  system elapsed 
   0.29    0.00    0.30 

I was trying this out after reading the answers to the R tips question. I did not expect sapply to be an order of magnitude slower than the equivalent composite function call in the case above. Does anyone know why this is the case? If I understand correctly, sapply should vectorize the call and be nearly optimally fast.

Prasad Chalasani
mcheema
  • Kohske is right. sapply merely creates an illusion or a poor substitute for real vectorization. When you can, you should try to build all of your transformations with inherently vectorized functions. – IRTFM Jan 05 '11 at 03:17
  • The main purpose of `sapply` is to make loops easier to read (and save typing), not to speed things up. – Richie Cotton Jan 05 '11 at 14:35
  • See also: http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar – Joris Meys May 19 '11 at 08:47

2 Answers


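sapply is a simple wrapper around lapply, so it is probably not vectorized in the way you expect: it calls the function once per element. Try this code, which prints the length of the argument on every call: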

system.time(sapply(rnorm(10), function (x) {print(length(x)); round(x,2)}))

and see the implementation here: https://svn.r-project.org/R/trunk/src/main/apply.c
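
Another way to see the element-wise behaviour is to count how often the function is called. The sketch below is only an illustration, assuming nothing beyond base R; the names `x`, `calls`, `res_sapply` and `res_vec` are made up for the example.

```r
# Illustrative sketch: count how many times sapply invokes the function.
set.seed(1)                 # arbitrary seed, only for reproducibility
x <- rnorm(1000)

calls <- 0
res_sapply <- sapply(x, function(v) {
  calls <<- calls + 1       # incremented once per invocation
  round(v, 2)
})

res_vec <- round(x, 2)      # a single call on the whole vector

calls == length(x)              # one R-level call per element
identical(res_sapply, res_vec)  # both approaches give the same values
```

The results match, so the difference in timing comes from the number of R-level function calls, not from the work done by round itself.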

kohske

There's nothing here to sapply to - you only give it a single vector - not a list of vectors, and sapply converts the result to a (single column) matrix.

sapply is simplifying the result for you, but in doing so has to generate an array.

Compare if you give it a list:

> system.time(sapply(list(rnorm(1000000,0,1)), function (x) round(x,2)))
   user  system elapsed 
   0.22    0.00    0.22 

> system.time(sapply(rnorm(1000000,0,1), function (x) round(x,2)))
   user  system elapsed 
   4.21    0.00    4.21 
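
The shape of the two results also differs, which may help when reading these timings. A small sketch, assuming only base R (the variable names `from_list` and `from_vector` are illustrative): with a one-element list the function is called once on the whole vector and the result is simplified to a one-column matrix, while with a plain numeric vector the function is called once per element and the result stays a vector.

```r
# Illustrative sketch: compare the result shapes of the two sapply calls.
x <- rnorm(5)

from_list   <- sapply(list(x), function(v) round(v, 2))  # one call, whole vector
from_vector <- sapply(x, function(v) round(v, 2))        # one call per element

dim(from_list)       # 5 rows, 1 column
dim(from_vector)     # NULL, i.e. a plain vector
identical(as.vector(from_list), from_vector)  # same values either way
```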
mdsumner
  • `sapply` calls `lapply` which operates element-wise on vectors. – hadley Jan 05 '11 at 04:43
  • Thanks Hadley, I wasn't sure exactly what happens here. Clearly it's the s(apply)implification that takes the extra time, though? – mdsumner Jan 05 '11 at 08:13
  • Hi, the first (list) case is like `for (i in 1) round(x[[i]])`, where x is `list(rnorm(100000))`. It is very similar to calling `round(rnorm(100000))`. The second case is like `for (i in 1:100000) round(x[i])`, where x is `rnorm(100000)`. The extra time is not due to the simplification, but to the loop. – kohske Jan 05 '11 at 08:34
  • Thanks to all the commenters and answerers. I clearly did not understand sapply and will have to look into vectorization more deeply. – mcheema Jan 05 '11 at 19:42
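
The two cases kohske describes can be written out as explicit loops. This is a rough sketch of the idea, not the actual implementation in apply.c; `x_vec`, `x_list`, `out_list` and `out_vec` are names invented here.

```r
# Illustrative sketch of the two loops described in the comments.
x_vec  <- rnorm(100000)
x_list <- list(x_vec)

# List case: the loop body runs once, and round() sees the whole vector.
out_list <- vector("list", length(x_list))
for (i in seq_along(x_list)) {
  out_list[[i]] <- round(x_list[[i]], 2)
}

# Vector case: the loop body runs once per element.
out_vec <- numeric(length(x_vec))
for (i in seq_along(x_vec)) {
  out_vec[i] <- round(x_vec[i], 2)
}

length(x_list)  # number of iterations in the list case
length(x_vec)   # number of iterations in the vector case
identical(out_list[[1]], out_vec)  # same rounded values
```

Wrapping each loop in system.time should show the same kind of gap as the sapply timings, although the exact numbers depend on the machine and R version.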