
I am trying to create a 300k x 300k sparse matrix in R, but I keep running into memory problems. I am using a Windows 10 laptop with 8 GB of RAM, and getting a computer with more RAM is going to be difficult at this time. What can I do to create this sparse matrix efficiently without a memory allocation error?

I have looked at the bigmemory package, but I couldn't get it to work because it can't handle sparse matrices.

This is a smaller dataset. I don't have any problem computing the matrix for it; the problem arises when I try to compute the 300k x 300k matrix.

fam <- structure(list(ID  = c(1L, 2L, 3L, 4L, 6L, 5L, 7L),
                      dad = c(0L, 0L, 1L, 1L, 1L, 3L, 5L),
                      mum = c(0L, 0L, 0L, 2L, 4L, 4L, 6L),
                      GEN = c(1L, 1L, 2L, 2L, 3L, 3L, 4L)),
                 class = "data.frame", row.names = c(NA, -7L))
library(Matrix)

hom <- function(fam) {
  # first row that has at least one known parent
  t1 <- min(which.max(fam$dad > 0), which.max(fam$mum > 0))
  t2 <- max(fam[["ID"]])

  # start from a sparse matrix with only the diagonal filled in
  A <- Matrix(0, nrow = t2, ncol = t2, sparse = TRUE)
  diag(A) <- 2 - 0.5^(fam[["GEN"]] - 1)

  for (t in t1:t2) {
    A[t, t] <- sum(c(A[t, t], 0.5^(fam[t, "GEN"]) * A[fam[t, "dad"], fam[t, "mum"]]))
    for (j in 1:(t - 1)) {
      A[t, j] <- 0.5 * sum(c(A[j, fam[t, "dad"]], A[j, fam[t, "mum"]]))
      # copy the lower triangle into the upper triangle
      A[upper.tri(A)] <- t(A)[upper.tri(A)]
    }
  }
  A
}

I want to create this sparse matrix efficiently, without consuming a lot of memory, but I get this error:

Error: cannot allocate vector of size 300Gb 

What can I do, please?

Victor
  • You haven't shown us the code to create the big matrix. The 7x7 matrix you did create isn't sparse. – user2554330 Aug 11 '19 at 16:32
  • I'd get rid of the ```upper.tri``` line and make it your old one. ```A[j, t] <- A[t,j]``` is less intensive than the ```upper.tri``` version. Not that it helps your 300 GB allocation issue. – Cole Aug 11 '19 at 18:19
  • @user2554330 I agree that the resulting matrix isn't sparse. Here's the original post https://stackoverflow.com/questions/57301390/how-to-make-r-foreach-loops-efficient/57335242?noredirect=1#comment101371145_57335242 – Cole Aug 11 '19 at 18:23

1 Answer


Calling upper.tri(A) builds a dense logical matrix with the same dimensions as A, so the index is no longer sparse even though A is. You can do the same calculation as

A[upper.tri(A)] <- t(A)[upper.tri(A)]

while keeping everything sparse by using the expression

A <- tril(A) + t(tril(A, -1))
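
As a quick check that the two forms agree, here is a minimal sketch on a made-up 4 x 4 matrix. The matrix and the names B and C are illustrative assumptions, not taken from the question; it only assumes that the lower triangle of A has been filled in, as happens inside the loop.

```r
library(Matrix)

# Hypothetical example: a sparse matrix with entries only on the
# diagonal and in the lower triangle.
A <- sparseMatrix(i = c(1, 2, 3, 3, 4, 4),
                  j = c(1, 2, 1, 3, 2, 4),
                  x = c(1, 1, 0.5, 1, 0.25, 1.5),
                  dims = c(4, 4))

# Dense-index version: upper.tri() returns a full 4 x 4 logical matrix.
B <- A
B[upper.tri(B)] <- t(B)[upper.tri(B)]

# Sparse version: tril(A) keeps the lower triangle with the diagonal,
# tril(A, -1) keeps the strictly lower triangle, and t() mirrors it.
C <- tril(A) + t(tril(A, -1))

# Both give the same symmetric matrix, and C is still a sparse object.
stopifnot(isTRUE(all.equal(as.matrix(B), as.matrix(C))))
stopifnot(is(C, "sparseMatrix"))
stopifnot(isSymmetric(C))
```

This only shows that the symmetrization step itself can stay sparse; whether the final 300k x 300k matrix fits in 8 GB still depends on how many nonzero entries it ends up with.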
user2554330