reason for faster matrix allocation in R

Question

Posting Best way to allocate matrix in R, NULL vs NA? shows that writing your own matrix allocation function in R can be 8 to 10 times faster than using R's built-in matrix() function to pre-allocate a large matrix.

Does anyone know why the hand crafted function is so much faster? What is R doing inside matrix() that is so slow? Thanks.

Here's the code on my system:

create.matrix <- function( nrow, ncol ) {
x<-matrix()
length(x) <- nrow*ncol
dim(x) <- c(nrow,ncol)
x
}

system.time( x <- matrix(nrow=10000, ncol=9999) )
user  system elapsed 
1.989   0.136   2.127 

system.time( y <- create.matrix( 10000, 9999 ) )
user  system elapsed 
0.192   0.141   0.332 
identical(x,y)
[1] TRUE

I appologize to those who commented thinking that the user-defined function was slower, since what is posted in the answer in the above link is inconsistent. I was looking at the user time, which is about 8 times faster in the above link, and on my system about 10 times faster for the user-defined vs built-in.

In response to Joshua's request for session info:

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.12.1

Also, I tried running Simon's three examples, and the third example that Simon gives as the fastest, turns out for me the slowest:

> system.time(matrix(NA, nrow=10000, ncol=9999)) 
   user  system elapsed 
  2.011   0.159   2.171 
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x}) 
   user  system elapsed 
  0.194   0.137   0.330 
> system.time(matrix(logical(0), nrow=10000, ncol=9999)) 
   user  system elapsed 
  4.180   0.200   4.385

I still think however that Simon may be on the right track with the idea that matrix() initially allocates a 1x1 matrix and then copies it. Anyone know of any good documentation on R internals? Thanks.

I'm not sure I follow. Isn't the custom function _slower_ in that question (and even then, by a factor of ~3)? And as noted in a comment there, it will always be far faster to use vectorization. — joran, Aug 31 '12 at 17:44
In the accepted answer I see that the user-defined function is 3x slower than just using `matrix`. Can you provide an example of a custom function that is 8-10x faster than `matrix`? — Joshua Ulrich, Aug 31 '12 at 17:45
Also note in the link that the 2 objects have different storage modes (vector(probably logical) vs. list), so while they are both matricies they differ greatly in what can be stored in them and the overhead that goes with that. — Greg Snow, Aug 31 '12 at 18:09
What is your `sessionInfo`? The `create.matrix` function is only marginally faster (~15%) with R-2.15.1 on WinXP 32-bit, but I get similar timings with R-2.15.1 on Ubuntu 64-bit. — Joshua Ulrich, Aug 31 '12 at 20:19
I don't know why you still think Simon is on the right track. I've explained in my comments on his answer where and why I think it is off track. But waiting for Simon to confirm. — Matt Dowle, Sep 05 '12 at 14:39

Simon Urbanek · Accepted Answer · 2012-09-03T03:28:31.747

8

The problem is that your matrix call is a bit more complicated than you think it is. Compare the following versions:

# copy NA matrix
> system.time(matrix(NA, nrow=10000, ncol=9999))
   user  system elapsed 
  1.272   0.224   1.496 

# replicate NA vector (faster version of what you used)
> system.time({x=NA; length(x)=99990000; dim(x)=c(10000,9999); x})
   user  system elapsed 
  0.292   0.260   0.552 

# fastest - just allocate a matrix filled with NAs 
> system.time(matrix(logical(0), nrow=10000, ncol=9999))
   user  system elapsed 
  0.184   0.308   0.495

So in your example you were essentially creating a 1 x 1 NA matrix which was replicated to the size you specified - the slowest approach. Doing the same with a vector is faster (because it doesn't need to use modulo for columns) - you did it in a bit complicated way (by creating a matrix, converting it into a vector and then back to a matrix), but the idea is the same. Finally, if you just use an empty vector, so the matrix will be simply filled with NAs what you wanted anyway and thus no extra work is needed (fastest).

EDIT An important note, though: Matthew's suggestion was correct although it was not involved (since the code he cited is the logical(0) case, not in the NA case). Inadvertently I was running R-devel in the above timings so timings in released R will vary.

edited Sep 03 '12 at 03:28

answered Sep 03 '12 at 03:06

Simon Urbanek

13,842
45
45

Thanks for changing do_matrix so fast. But I think you're using the fixed version in the 3rd case above, because `data` passed to `matrix()` is length 0, making `lendat` 0 in C do_matrix, so then the new more efficient code is then running. To check, running your timings in R 2.15.1 shows the 1st and 3rd running the same time, for me. – Matt Dowle Sep 03 '12 at 08:58
Or in other words, I agree my suggestion wasn't involved, but now you've involved it by setting `data=logical(0)` :-) – Matt Dowle Sep 03 '12 at 09:02
OK. Now we're getting somewhere. I still don't quite understand what's going on inside. Sounds like `matrix(nrow=10000,ncol=9999)` is the same as `matrix(data=NA,nrow=10000,ncol=9999)` and both are slow because they first create a single element matrix and then copy it (10000*9999) times. How is `matrix(logical(0),nrow=10000,ncol=9999)` different from those other two? What does `logical(0)` indicate that NA does not? Thanks. – Daniel Goldfarb Sep 03 '12 at 14:48
Hmmm. I just got back to my system today, and the one that you say is fastest (i.e. with the logical(0)) is slowest for me. Takes 4 seconds! – Daniel Goldfarb Sep 04 '12 at 17:42
@Daniel I've already explained that in my comments above. Waiting for Simon to confirm. – Matt Dowle Sep 05 '12 at 14:37
@SimonUrbanek Have added proposal for follow up fix to my answer. Would that work? – Matt Dowle Sep 06 '12 at 21:03

IRTFM · Answer 2 · 2012-08-31T19:58:45.043

5

I'm going to take issue with the comments, although I do understand most of them. The problem is that the referenced posting has an answer that has internal contradictions that the commenters have been relying on without checking. The timing for user and system do not add up correctly to elapsed as they should.

 create.matrix <- function(size) {
  x <- matrix()
  length(x) <- size^2
  dim(x) <- c(size,size)
  x
  }
  system.time(x <- matrix(data=NA,nrow=10000,ncol=10000))
#   user  system elapsed 
#  0.464   0.226   0.688 
 system.time(y <- create.matrix(size=10000))
#   user  system elapsed 
#  0.177   0.239   0.414

I suspect that the efficiency is actually being achieved by the fact that the user-defined function is only capable of creating a square matrix and 'matrix' needs to do checks on validity of arguments for a more general situation.

EDIT: I see you have disproven one of my hypotheses (regarding the square matrix limitation) and I will also note that my other hypothesis that this was somehow due to lazy evaluation also failed my tests. The discrepancy really does not make sense because the user-code uses the matrix function.

edited Aug 31 '12 at 19:58

answered Aug 31 '12 at 19:00

IRTFM

258,963
21
364
487

I also take exception with the comments, especially where they say matrix is faster than the user defined function. However I appologize for not looking at the timings in the posting more carefully (and I'm talking about the posing answer, not the posting itself). You're correct they don't add up. Here's what it looks like on my system, and with a NON-square matrix: (next comment). – Daniel Goldfarb Aug 31 '12 at 19:17
'create.matrix <- function( nrow, ncol ) { x<-matrix() length(x) <- nrow*ncol dim(x) <- c(nrow,ncol) x } > system.time( x <- matrix(nrow=10000, ncol=9999) ) user system elapsed 1.989 0.136 2.127 > system.time( y <- create.matrix( 10000, 9999 ) ) user system elapsed 0.192 0.141 0.332 > identical(x,y) [1] TRUE ' – Daniel Goldfarb Aug 31 '12 at 19:19
@user1639359: it would be better if you added those examples to your question (like I requested in my comment). – Joshua Ulrich Aug 31 '12 at 19:27
sorry, i thought i could format the code in the comment. i've added it to the question. thanks. --d. – Daniel Goldfarb Aug 31 '12 at 19:44
@user1639359 Curious. I will defend my comment only in the sense that it was phrased as expressing confusion, not certainty. (I also only get a ~15% difference in speed.) – joran Aug 31 '12 at 20:26
@joran - understood. my bad. as i said, i was focusing on the user time (since it was clearly the major component) and the system time (and i didn't even look at or realize that the "elapsed" time in the link was wacked and confusing). thanks. – Daniel Goldfarb Aug 31 '12 at 23:05

Matt Dowle · Answer 3 · 2012-09-06T21:06:22.443

Not sure this is the cause (could be a different inefficiency), but in do_matrix in src/array.c there is a type switch containing :

case LGLSXP :
    for (i = 0; i < nr; i++)
    for (j = 0; j < nc; j++)
        LOGICAL(ans)[i + j * NR] = NA_LOGICAL;

That looks to be page inefficient. Think it should be :

case LGLSXP :
    for (j = 0; j < nc; j++)
    for (i = 0; i < nr; i++)
        LOGICAL(ans)[i + j * NR] = NA_LOGICAL;

or more simply :

case LGLSXP :
    for (i = 0; i < nc*nr; i++)
        LOGICAL(ans)[i] = NA_LOGICAL;

( with some fine tuning required since NR is type R_xlen_t whilst i, nc and nr are type int ).

UPDATE:

After posting to r-devel :

Possible page inefficiency in do_matrix in array.c

Simon Urbanek has now committed a change to R. It now uses the single index approach above :

Latest live version of array.c

But as Simon says, and I covered myself for above, this doesn't appear to fix the particular issue raised by the question. A second, different, inefficiency needs to be found and fixed, too.

Here is perhaps what that follow up fix could be. This combines the page efficiency of the new code (now in R), but, switches to use that when matrix(data=NA) (R's default). This avoids the modulo in copyMatrix that Simon mentions in his answer, by avoiding copyMatrix in the NA case.

Currently, do_matrix in array.c has :

if(lendat) {
    if (isVector(vals))
        copyMatrix(ans, vals, byrow);
    else
        copyListMatrix(ans, vals, byrow);
} else if (isVector(vals)) { 
    // fill with NAs in the new page efficient way that Simon already committed.
}

which could be as follows. I'm not aware of an ISNA() function at C level that takes SEXP input, so have coded that long hand (Simon, is there a better way?) :

if(lendat && // but not a single NA, basically :
             !(lendat==1 &&
                  ((isLogical(vals) && LOGICAL(vals)[0] == NA_LOGICAL) ||
                   (isReal(vals) && ISNA(REAL(vals)[0])) ||
                   (isInteger(vals) && INTEGER(vals)[0] == NA_INTEGER)))) {
    if (isVector(vals))
        copyMatrix(ans, vals, byrow);
    else
        copyListMatrix(ans, vals, byrow);
} else if (isVector(vals)) { 
    // fill with NAs in the new page efficient way that Simon already committed.
    // this branch will now run when dat is a single NA, too
}

probably a lost cause, but have you thought about sending this to r-devel? (Or Radford Neal?) — Ben Bolker, Sep 03 '12 at 01:53

score 1 · Answer 4 · answered Aug 31 '12 at 21:17

Hmm. Yeah, it's weird. ...and this is slightly faster still - even though it's more like matrix() in that it allows for a single data argument (but it must be scalar):

create.matrix2 <- function(data=NA, nrow, ncol) {
  x <- rep.int(data[[1]], nrow*ncol)
  dim(x) <- c(nrow, ncol)
  x
}
system.time( x <- matrix(nrow=10000, ncol=9999) ) # 0.387 secs
system.time( y <- create.matrix(nrow=10000, ncol=9999) )  # 0.199 secs
system.time( z <- create.matrix2(nrow=10000, ncol=9999) ) # 0.173 secs
identical(x,z) # TRUE

...I guess the internal code for creating the matrix does something wasteful (or possibly useful, but I can't think of what that would be)...

Oh, since it handles data of any length, it might end up with something similar to rep(data, length.out=nrow*ncol) which is rather slow:

system.time( rep(NA, length.out=10000*9999) ) # 1.5 secs!

Anyway, definitely room for improvement!

reason for faster matrix allocation in R

4 Answers4