16

I have a square matrix with a few tens of thousands rows and columns with only a few 1 beside tons of 0, so I use the Matrix package to store that in R in an efficient way. The base::matrix object cannot handle that amount of cells, as I run out of memory.

My problem is that I need the inverse and also the Moore-Penrose generalized inverse of such matrices, that I cannot compute at the time being.

What I have tried:

  • solve yields an Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory) error
  • MASS::ginv is not compatible with the Matrix class
  • there is no direct way to convert a sparse Matrix to e.g. bigmemory::big.matrix, and that latter neither does work with MASS::ginv anyway
  • if I try to compute the Choleski factorization of the matrix to later call Matrix::chol2inv on that, I get the following error message:

    Error in .local(x, ...) : 
      internal_chm_factor: Cholesky factorization failed
    In addition: Warning message:
    In .local(x, ...) :
      Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 431
    
  • based on a related question, I also gave a try to the pbdDMAT package on a single node, but pbdDMAT::chol yielded an Cholmod error 'out of memory' at file ../Core/cholmod_memory.c, line 147 error message

Question in a nutshell: is there any way to compute the inverse and Moore-Penrose generalized inverse of such sparse matrix, beside falling back to using matrix class on a computer with tons of RAM?


Quick reproducible example:

library(Matrix)
n <- 1e5
m <- sparseMatrix(1:n, 1:n, x = 1)
m <- do.call(rBind, lapply(1:10, function(i) {
    set.seed(i)
    m[-sample(1:n, n/3), ]
}))
m <- t(m) %*% m

Some descriptives (thanks to Gabor Grothendieck):

> dim(m)
[1] 100000 100000
> sum(m) / prod(dim(m))
[1] 6.6667e-05
> table(rowSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 
> table(colSums(m))

    0     1     2     3     4     5     6     7     8     9    10 
    5    28   320  1622  5721 13563 22779 26011 19574  8676  1701 

And some error messages:

> Matrix::solve(m)
Error in LU.dgC(a) : cs_lu(A) failed: near-singular A (or out of memory)
> base::solve(m)
Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
> MASS::ginv(m)
Error in MASS::ginv(m) : 'X' must be a numeric or complex matrix

The goal of the bounty is to find a way to compute the Moore-Penrose generalized inverse of m with less than 8Gb of RAM (speed and performance is not important).

Community
  • 1
  • 1
daroczig
  • 28,004
  • 7
  • 90
  • 124
  • 1
    What do you need to do with the inverse once you compute it? http://mathoverflow.net/questions/72616/solving-for-moore-penrose-pseudo-inverse – Ben Bolker May 25 '14 at 14:03
  • Thank you @BenBolker, I try to implement Formula (22) on page 14 at http://www.scribd.com/doc/226039409#page=14 with large matrices. I would also appreciate any help on how I could save that computation, as computing `D_1` and `D^a_2` was extremely quick with `Matrix::sparseMatrix`, but I got stuck with the (pseduo) inverse. – daroczig May 25 '14 at 19:37
  • Scribd says it's a private document ... – Ben Bolker May 25 '14 at 19:38
  • @BenBolker I am terribly sorry for this lame issue, please try again, I've just updated the privacy settings. – daroczig May 25 '14 at 22:17
  • For some reason, your example `m` isn't creating square matricies. – James May 26 '14 at 11:18
  • Thank you @James, I have just fixed the example. The problem was that the number of rows/columns defaulted to the `max` of the generated sample. – daroczig May 26 '14 at 15:24
  • The example is also not reproducible unless `set.seed` is added. – G. Grothendieck May 26 '14 at 16:09

1 Answers1

8

If you have very few 1's then your matrix will likely have no more than one 1 in any column and in any row in which case the generalized inverse equals the transpose:

library(Matrix)
set.seed(123)
n <- 1e5
m <- sparseMatrix(sample(1:n, n/10), sample(1:n, n/10), x = 1, dims = c(n, n))

##############################################################################
# confirm that m has no more than one 1 in each row and column
##############################################################################
table(rowSums(m))  # here 90,000 rows are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

table(colSums(m))  # here 90,000 cols are all zero, 10,000 have a single one
##     0     1 
## 90000 10000 

##############################################################################
# calculate generalized inverse
##############################################################################
minv <- t(m)

##############################################################################
# check that when multiplied by minv it gives a diagonal matrix of 0's and 1's
##############################################################################
mm <- m %*% minv

table(diag(mm)) # diagonal has 90,000 zeros and 10,000 ones
##     0     1 
## 90000 10000 

diag(mm) <- 0
range(mm) # off diagonals are all zero
## [1] 0 0

REVISED Improved presentation.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Awesome, thank you very much @G-Grothendieck! Unfortunately `m` has several `1`s in some columns/rows, despite the fact that it's really sparse: `sum(m) / prod(dim(m))` returns `0.0007337982` on my matrix. I try to come up with a better example. – daroczig May 26 '14 at 19:49
  • What is `table(rowSums(m))` and `table(colSums(m))` and `dim(m)`? – G. Grothendieck May 26 '14 at 19:52
  • @G-Grothendieck pls find the details at https://gist.github.com/daroczig/9bc956ce8e1cb210234a until I come up with a real minimal reproducible example. – daroczig May 26 '14 at 20:00
  • 1
    This seems denser than I thought but you could reduce the problem size like this: ix <- union(which(rowSums(m) > 0), which(colSums(m) > 0)); minv <- m; minv[ix, ix] <- f(m[ix, ix])` where f is a generalized inverse function. Seems `ginv` won't work but if you can find another one maybe that will reduce it enough to help. – G. Grothendieck May 26 '14 at 21:19
  • 1
    Awesome, thank you very much, @G-Grothendieck! I will leave the bounty open for a few further days, but I am pretty sure that the offered reputation points will end up on your account. Thanks again! – daroczig May 27 '14 at 13:15