The post "Best way to allocate matrix in R, NULL vs NA?" shows that writing your own matrix allocation function in R can be 8 to 10 times faster (in user CPU time) than using R's built-in matrix() function to pre-allocate a large matrix.
Does anyone know why the hand-crafted function is so much faster? What is R doing inside matrix() that makes it so slow? Thanks.
Here's the code and the timings on my system:
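create.matrix <- function( nrow, ncol ) {
  x <- matrix()            # 1x1 logical matrix holding NA
  length(x) <- nrow*ncol   # grow to nrow*ncol elements, padding with NA
  dim(x) <- c(nrow,ncol)   # reshape into an nrow x ncol matrix
  x
}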
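As a quick sanity check (my own addition, not taken from the linked post), the sketch below compares create.matrix() with matrix() on a few small shapes using identical(). The particular shapes are arbitrary choices for illustration; the point is only that both approaches should yield the same logical NA matrix with the same integer dim attribute.

```r
create.matrix <- function(nrow, ncol) {
  x <- matrix()            # 1x1 logical matrix holding NA
  length(x) <- nrow * ncol # grow to nrow*ncol elements, padded with NA
  dim(x) <- c(nrow, ncol)  # reshape into an nrow x ncol matrix
  x
}

# Arbitrary small shapes for the comparison (an assumption, not from the post)
shapes <- list(c(1, 1), c(3, 4), c(10, 7))

# For each shape, check that both constructions give identical objects
same <- sapply(shapes, function(s) {
  identical(matrix(nrow = s[1], ncol = s[2]), create.matrix(s[1], s[2]))
})
print(same)
```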
system.time( x <- matrix(nrow=10000, ncol=9999) )
user system elapsed
1.989 0.136 2.127
system.time( y <- create.matrix( 10000, 9999 ) )
user system elapsed
0.192 0.141 0.332
identical(x,y)
[1] TRUE
I apologize to those who commented thinking that the user-defined function was slower; the timings posted in the answer at the above link are inconsistent. I was comparing user time, where the user-defined function is about 8 times faster than the built-in in the above link and about 10 times faster on my system.
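Since the disagreement came down to which column of system.time() was being read, here is a small sketch of a harness that pulls out only the user CPU time (the "user.self" element of the proc_time object) and takes the median over a few repetitions. The helper name user.cpu.time, the number of repetitions and the 2000 x 2000 size are hypothetical choices made so the example runs quickly; they are not from the original timings.

```r
create.matrix <- function(nrow, ncol) {
  x <- matrix()            # 1x1 logical matrix holding NA
  length(x) <- nrow * ncol # grow to nrow*ncol elements, padded with NA
  dim(x) <- c(nrow, ncol)  # reshape into an nrow x ncol matrix
  x
}

# Hypothetical helper: median user CPU time of calling f() 'reps' times.
# system.time() runs gc() first by default (gcFirst = TRUE).
user.cpu.time <- function(f, reps = 3) {
  times <- replicate(reps, system.time(f())[["user.self"]])
  median(times)
}

n.row <- 2000; n.col <- 2000   # scaled down from 10000 x 9999 (assumption)

t.builtin <- user.cpu.time(function() matrix(nrow = n.row, ncol = n.col))
t.custom  <- user.cpu.time(function() create.matrix(n.row, n.col))

cat(sprintf("matrix(): %.3f s user   create.matrix(): %.3f s user\n",
            t.builtin, t.custom))
# Guard against a zero denominator when the fast version is below timer resolution
if (t.custom > 0) cat(sprintf("ratio: %.1f\n", t.builtin / t.custom))
```

Whether the ratio comes out near the 8 to 10 times reported here will depend on the R version and machine.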
In response to Joshua's request for session info:
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.12.1
I also tried running Simon's three examples, and the third one, which Simon gives as the fastest, turns out to be the slowest for me:
> system.time(matrix(NA, nrow=10000, ncol=9999))
user system elapsed
2.011 0.159 2.171
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x})
user system elapsed
0.194 0.137 0.330
> system.time(matrix(logical(0), nrow=10000, ncol=9999))
user system elapsed
4.180 0.200 4.385
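For anyone who wants to repeat this comparison, the sketch below wraps Simon's three variants in functions, checks that they all produce the same object, and then collects their user times. The matrix size is scaled down (an assumption, to keep the run short), and the list names are labels I made up. Per the ?matrix documentation, zero-length data such as logical(0) is filled with NA, so the third variant should be identical to the other two.

```r
n.row <- 1000; n.col <- 999   # scaled down from 10000 x 9999 (assumption)

# The three allocation variants from Simon's examples, as functions
variants <- list(
  matrix.NA    = function() matrix(NA, nrow = n.row, ncol = n.col),
  length.dim   = function() {
    x <- NA
    length(x) <- n.row * n.col   # pad with NA
    dim(x) <- c(n.row, n.col)    # reshape
    x
  },
  matrix.empty = function() matrix(logical(0), nrow = n.row, ncol = n.col)
)

# Build each result once and compare them against the first
results  <- lapply(variants, function(f) f())
all.same <- all(sapply(results[-1], identical, results[[1]]))
print(all.same)

# User CPU time for each variant (single run each, so expect noise)
timings <- sapply(variants, function(f) system.time(f())[["user.self"]])
print(timings)
```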
I still think, however, that Simon may be on the right track with the idea that matrix() initially allocates a 1x1 matrix and then copies it. Does anyone know of any good documentation on R internals? Thanks.
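One place to start looking (a suggestion, not an answer) is the R-level definition of matrix() itself: printing it shows a short wrapper that hands the real work to compiled code via .Internal(matrix(...)), so the allocation and filling happen in C rather than in R code. The sketch below prints the wrapper and checks for that call; the exact argument list passed to .Internal differs between R versions, so only the call prefix is tested. The "R Internals" manual that ships with R describes how .Internal functions are dispatched.

```r
# Show the R-level wrapper around the built-in matrix()
print(matrix)

# Deparse the function and look for the hand-off to C code
src <- paste(deparse(matrix), collapse = "\n")
calls.internal <- grepl(".Internal(matrix(", src, fixed = TRUE)
print(calls.internal)
```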