I am working with a large shared-memory matrix (1.3e6 x 1.3e6) in a foreach loop. I create the matrix with the FBM function of the bigstatsr package. I need the results of the loop to be stored in the FBM object so that I do not run out of RAM. This is what I want to do, shown here without an FBM object:

library(doParallel)
library(foreach)
library(doFuture)

cl=makeCluster(2)
registerDoParallel(cl)
registerDoFuture()
plan(multicore)

results=foreach(a=1:4,.combine='cbind') %dopar% {
  a=a-1
  foreach(b=1:2,.combine='c') %dopar% {
    return(10*a + b)
  }
}

And this is how I tried it with an FBM:

library(bigstatsr)

 results=FBM(4,4,init=0)
 saveinFBM=function(x,j){results[,j]=x}

 foreach(a=1:4,.combine='savinFBM') %dopar% {
   a=a-1
   foreach(b=1:2,.combine='c') %dopar% {
     return(10*a + b)
   }
 } 
Error in get(as.character(FUN), mode = "function", envir = envir) : 
  object 'savinFBM' of mode 'function' was not found

PS: Could anybody add the tag "dofuture"?

LauC
  • Basically, you want to fill the matrix with values based on the row index and column index? You should use `big_apply()`. Do you really have a 1.3e6x1.3e6 matrix? That's a lot of data, and it will take a long time to process (even just to access it). – F. Privé Oct 05 '18 at 19:57

1 Answer

If I understand correctly what you want to do, a faster alternative is using outer(1:2, 1:4, function(b, a) 10 * (a - 1) + b).
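
As a quick sanity check of that claim, here is a minimal base-R sketch comparing the outer() call with the same values built column by column, as the nested loops in the question do. The names m and ref are only illustrative, and sapply() stands in for the foreach loops so that no parallel backend is needed.

```r
# Plain base R, no parallel backend needed.
# Rows correspond to b = 1:2, columns to a = 1:4.
m <- outer(1:2, 1:4, function(b, a) 10 * (a - 1) + b)

# Same values built one column at a time, mirroring the nested loops
# in the question (sapply is used here instead of foreach).
ref <- sapply(1:4, function(a) 10 * (a - 1) + 1:2)

# Both are 2 x 4 matrices holding the same numbers.
stopifnot(identical(dim(m), c(2L, 4L)), all(m == ref))
print(m)
```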

If you want to fill an FBM with this function, you can do:

library(bigstatsr)
X <- FBM(200, 400)
big_apply(X, a.FUN = function(X, ind) {
  X[, ind] <- outer(rows_along(X), ind, function(b, a) 10 * (a - 1) + b)
  NULL
})

Usually, using parallelism won't help when you write data to disk (which is what you do when you fill X[, ind]), but if you really want to try, you can pass ncores = nb_cores() as an additional argument to big_apply().
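
Since the question is built around foreach, the same idea can be sketched there as well. This is only a sketch under a few assumptions: a small 2 x 4 FBM instead of the real matrix, a PSOCK cluster of 2 workers, and the toy formula from the question standing in for the expensive computation. Each worker writes its column directly into the file-backed matrix and returns NULL, so nothing large goes through .combine. Note that a .combine function receives two accumulated results, not a value and a column index, so a two-argument saver like the one in the question would not get the column index even with the name spelled correctly.

```r
library(bigstatsr)
library(foreach)
library(doParallel)

# Small illustrative FBM (2 rows, 4 columns); the real one would be far larger.
results <- FBM(2, 4, init = 0)

cl <- parallel::makeCluster(2)   # assumption: 2 workers
doParallel::registerDoParallel(cl)

# Each iteration fills one column of the shared, file-backed matrix.
# The workers return NULL, so no large object is sent back or combined.
tmp <- foreach(a = 1:4, .combine = 'c', .packages = "bigstatsr") %dopar% {
  results[, a] <- 10 * (a - 1) + 1:2   # stand-in for the expensive computation
  NULL
}

parallel::stopCluster(cl)

# Read the (small) matrix back into memory to inspect it.
print(results[])
```

Because the workers write to the backing file, the values are visible in the main session once the loop finishes. Whether this is actually faster than a sequential fill depends on how costly the per-column computation is compared with the disk writes.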

F. Privé
  • The thing is that I want to use parallelism to calculate "something pretty large" instead of "10*(a-1)+b", and I want to write the result to an FBM. I need parallelization to calculate it and I need the FBM to store the results. Thank you very much. I will try it. – LauC Oct 06 '18 at 14:49