I have 100,000 sparse matrices ("dgCMatrix") stored in a list object. Every matrix has the same number of rows (8,000,000), and the list is approximately 25 GB in size. Now when I do:

do.call(cbind, theListofMatrices)

to combine all matrices into one big sparse matrix, I get "node stack overflow". In fact, I can't even do this with only 500 elements of that list, which should produce a sparse matrix of only about 100 MB.

My guess is that cbind() converts the sparse matrices to ordinary dense matrices and thereby causes the stack overflow?

Actually, I have tried something like this:

tmp = do.call(cbind, theListofMatrices[1:400])

This works fine, and tmp is still a sparse matrix with a size of 95 MB. Then I tried:

> tmp = do.call(cbind, theListofMatrices[1:410])
Error in stopifnot(0 <= deparse.level, deparse.level <= 2) : 
  node stack overflow

and the error occurred. However, I have no trouble doing something like:

cbind(tmp, tmp, tmp, tmp)

Thus, I believe it has something to do with do.call().

Reduce() seems to solve my problem, though I still don't know why do.call() crashes.

Eric He
  • This reminded me of this question http://stackoverflow.com/questions/23035982/directly-creating-dummy-variable-set-in-a-sparse-matrix-in-r . Perhaps there is something in there that can help – user20650 Jun 02 '16 at 02:13
  • @user20650 Did you figure out the issue that gives you the error in that post? – Eric He Jun 02 '16 at 04:58
  • Well, I actually never reported the issue. But following [Ben's comment](http://stackoverflow.com/questions/23035982/directly-creating-dummy-variable-set-in-a-sparse-matrix-in-r#comment35209562_23044219) you could try `Reduce`, or you could try building the matrix from its `i`, `j`, `x` components, as in Flodel's answer – user20650 Jun 02 '16 at 10:36
  • Some additional info may help people suggest alternatives: How many columns does each matrix have? Approximately how sparse are they? How much RAM are you working with? – user20650 Jun 02 '16 at 10:38
  • you should add the `Reduce` solution as an answer, so it is easier for others to find. – user20650 Jun 03 '16 at 11:27
  • Oh, and from looking at R-devel (https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=6325&group_id=61 and https://stat.ethz.ch/pipermail/r-devel/2016-May/072682.html), it is a known issue – user20650 Jun 03 '16 at 11:35

2 Answers

The problem is not in do.call() itself but in the way cbind from the Matrix package is implemented. It uses recursion to bind the individual arguments together. For instance, Matrix::cbind(mat1, mat2, mat3) is translated to something along the lines of Matrix::cbind(mat1, Matrix::cbind(mat2, mat3)). Since do.call(cbind, theListofMatrices) is basically cbind(theListofMatrices[[1]], theListofMatrices[[2]], ...), passing hundreds of matrices produces a recursion that is nested one level deeper for each argument, and at some point it becomes too deep and fails with "node stack overflow". This is also why cbind(tmp, tmp, tmp, tmp) works: it has only four arguments, no matter how large each one is.

Thus, Ben's comment to use Reduce() is a good way to work around that issue since it avoids the recursion and replaces it with an iteration:

tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])
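
One thing to keep in mind with Reduce() is that every step builds a new, larger matrix, so it may become slow for a list of 100,000 elements. A possible alternative, given that the question reports do.call() working for 400 matrices, is to bind the list in small chunks and then bind the chunk results. The following is only a sketch: cbind_chunked is a hypothetical helper name, and the default chunk size of 100 is an assumption, not a tested limit.

```r
library(Matrix)

# Hypothetical helper: bind a list of matrices in chunks small enough
# that each do.call(cbind, ...) stays well below the recursion limit.
cbind_chunked <- function(mats, chunk_size = 100) {
  stopifnot(length(mats) >= 1, chunk_size >= 2)
  while (length(mats) > 1) {
    # Assign consecutive elements to groups of at most chunk_size.
    groups <- split(mats, ceiling(seq_along(mats) / chunk_size))
    # Bind each group; unname() keeps list names from being passed
    # to cbind() as argument names.
    mats <- unname(lapply(groups, function(g) do.call(cbind, unname(g))))
  }
  mats[[1]]
}

# Small usage example with random sparse matrices.
set.seed(1)
mats <- replicate(10, rsparsematrix(5, 2, density = 0.3), simplify = FALSE)
combined <- cbind_chunked(mats, chunk_size = 3)
dim(combined)
```

The loop repeats until a single matrix is left, so the number of arguments in any one cbind() call never exceeds chunk_size.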
David K

In R, a 2-column matrix can have up to 2^30 - 1 = 1,073,741,823 rows. So I would check the number of rows and the available RAM to make sure they can accommodate the size of the combined matrix.

Noha Elprince
  • Those are the limitations of a _physical matrix_; they should not apply to a sparse matrix unless it is converted to a dense one somewhere along the line (though _printing_ the object may induce an error) – MichaelChirico Feb 27 '18 at 07:50