
I have a data frame `df` with dimensions 10,000 x 40,000 (this matrix contains a lot of zeros):

value1 <- c(1, 0, 3, 0, 0, 2)
value2 <- c(0.8, 0.1, 9, 0, 0, 5)
value3 <- c(8, 3, 0, 0, 0, 0)
df <- data.frame(value1, value2, value3)  # data_frame() is deprecated; data.frame() needs no extra package

I want to calculate the covariance matrix of df. I have tried to use bigcor(), and I have also tried to calculate the covariance matrix of a sparse matrix (as in Running cor() (or any variant) over a sparse matrix in R).

However, the R session aborts.

Any help?
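
For reference, a common way to get the covariance of a wide sparse matrix without densifying the data is the identity cov(X) = (X'X - n * mu mu') / (n - 1), where crossprod() can exploit the sparsity. Below is a minimal sketch with the Matrix package, using the toy data above; note that for the real data the 40,000 x 40,000 result itself is dense (roughly 13 GB of doubles), so this only removes the cost of the intermediate computation, not of the result:

```r
library(Matrix)

# Sketch only: on the real data, build X as a sparse Matrix so that
# crossprod(X) exploits the zeros; the p x p result is still dense.
X <- Matrix(cbind(value1 = c(1, 0, 3, 0, 0, 2),
                  value2 = c(0.8, 0.1, 9, 0, 0, 5),
                  value3 = c(8, 3, 0, 0, 0, 0)), sparse = TRUE)

n  <- nrow(X)
mu <- colMeans(X)                                  # column means, length p
# cov(X) = (X'X - n * mu mu') / (n - 1)
C <- (as.matrix(crossprod(X)) - n * tcrossprod(mu)) / (n - 1)

all.equal(C, cov(as.matrix(X)))                    # TRUE
```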

vog
  • What sort of efforts at sensible memory error management have you done? – IRTFM May 18 '22 at 05:47
  • Why do you need to store the covariance? Can't you calculate the covariance on the fly, i.e. whenever needed? – Onyambu May 18 '22 at 06:22
  • @onyambu I need to run ```glasso()``` on the covariance matrix afterwards. Do you think there is a way to calculate the covariance on the fly and then immediately run ```glasso()```? I only want the results from ```glasso()```, actually. – vog May 18 '22 at 20:33
  • @IRTFM what do you mean? – vog May 18 '22 at 20:34
  • I do not know what `glasso()` means. But I am quite confident that whatever you are doing is approaching a big problem from an unsolvable angle. Whenever you need the variance/covariance of two variables, just select the two from the main df, compute their covariance, then use that in whatever way you want. You cannot store a vector of 1.6 billion entries. Note that the entries will not be zero, so technically it won't be sparse – Onyambu May 18 '22 at 20:38
  • @onyambu ```glasso()``` selects only those covariances that are relevant enough. It is 1.6 billion entries, but note that a covariance matrix is symmetric, so you "only" need the upper triangle – vog May 18 '22 at 21:28
  • There is no way to keep 800 million entries in memory. Sorry, I am unable to help – Onyambu May 18 '22 at 21:30
  • @onyambu Is there a way to save the matrix as text, a table, or some other efficient format? – vog May 19 '22 at 04:12
  • But if you are going to save it out of memory, are you at any point going to read the whole matrix into memory? If so, then it's of no help. I believe you can compute the correlations and store them in a file using a for loop – Onyambu May 19 '22 at 06:39
  • 40k x 40k is only 13GB of memory. If it's possible, just provision more memory. Otherwise, perhaps chunk the input matrix and do 16 10k x 10k matrices - assuming whatever you're doing can be built piecewise (I don't recall what the graphical lasso is doing under the hood). – CJR May 19 '22 at 20:15
  • @CJR I tried to run ```bigcor()``` to process it in chunks. However, R still aborts. Do you have any idea how to split the process into 16 10k x 10k matrices? – vog May 19 '22 at 23:12
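
Following up on the chunking idea in the comments: cov() accepts two matrix arguments and returns their cross-covariance, so the 40k x 40k result can be computed one pair of column blocks at a time and written to disk, visiting only the upper triangle of blocks. A hypothetical sketch (block_cov and the file layout are illustrative, not a standard API):

```r
library(Matrix)  # optional: lets X be a sparse Matrix; each block is densified

# Compute cov(X) block by block and save each block to disk instead of
# holding the full p x p matrix in memory. Only block pairs (i, j) with
# j >= i are computed, since the covariance matrix is symmetric.
block_cov <- function(X, block_size = 10000, out_dir = "cov_blocks") {
  dir.create(out_dir, showWarnings = FALSE)
  p <- ncol(X)
  starts <- seq(1, p, by = block_size)
  for (i in starts) {
    ii <- i:min(i + block_size - 1, p)
    for (j in starts[starts >= i]) {           # upper triangle of blocks only
      jj <- j:min(j + block_size - 1, p)
      cij <- cov(as.matrix(X[, ii]), as.matrix(X[, jj]))
      saveRDS(cij, file.path(out_dir, sprintf("cov_%d_%d.rds", i, j)))
    }
  }
}
```

Whether this helps depends on whether the downstream step (here, ```glasso()```) can itself work piecewise; if it needs the full matrix in memory at once, writing blocks to disk only delays the problem.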

0 Answers