I'm working with a large matrix (1.5 GB and growing). One particular function takes a lot of time and looks like a good candidate for parallelization with the foreach package. I'm running this with registerDoParallel(cores = 4) on Ubuntu with 4 cores and 8 GB of RAM. My understanding is that when I use foreach, a copy of the big matrix is made for each of the 4 processes, and indeed memory usage quickly reaches 100% (a rough sketch of that plain foreach version is included below, after the timings). I read another post suggesting bigmemory and attach.big.matrix() so the processes can share the same matrix; I definitely have enough RAM to hold one copy in memory. But when I do this, the execution time actually increases:
    user  system  elapsed
9889.944 185.590 2670.001  - doParallel with 4 cores
8931.887  92.214 4526.306  - doParallel with 2 cores
9320.523 150.122 9473.165  - doParallel with 1 core
1314.037   6.236 1320.290  - serial execution, without foreach and without big.matrix
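For reference, the plain foreach version (no bigmemory), which is the one that pushed memory use to 100%, looked roughly like this. The real per-pair computation is elided throughout this post, so sum(abs(x - y)) below is only a stand-in for it:

library(doParallel)   # attaches foreach as well
registerDoParallel(cores = 4)

## rough sketch of the plain-matrix version: every worker reads 'mat' directly
calcQIDiffPlain <- function(mat) {
  foreach(i = seq_len(nrow(mat)), .combine = c) %dopar% {
    x <- mat[i, ]
    val <- 0
    for (j in seq_len(i - 1)) {
      y <- mat[j, ]
      val <- val + sum(abs(x - y))   # stand-in for the real per-pair computation
    }
    val
  }
}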
I have not been able to come up with an explanation for this. The big.matrix version of my code is below. I have also tried a few other things, such as sending a block of rows to each process (foreach seems to chunk the iterations by default anyway); a rough sketch of that block variant is at the end of the post. Nothing gets execution faster than the serial version. I do see all 4 cores at 100% when I set cores to 4, and there is an improvement over 1 core when using big.matrix, but no improvement over the serial execution, which never used big.matrix at all.
library(doParallel)   # also attaches foreach
library(bigmemory)

calcQIDiffForRow <- function(row, Desc) {
  mat <- attach.big.matrix(Desc)   # re-attach the shared big.matrix inside the worker
  x <- mat[row, ]
  for (j in seq_len(row - 1)) {    # seq_len() so that row 1 skips the loop entirely
    y <- mat[j, ]
    ...                            # actual per-pair computation (elided); it produces val
  }
  return(val)
}
calcQIDiff <- function(mat) {
  registerDoParallel(cores = 4)
  desc <- describe(mat)            # descriptor the workers use to attach the matrix
  ret <- foreach(i = 1:nrow(mat), .combine = rbind, .multicombine = TRUE,
                 .noexport = c("mat")) %dopar%
    calcQIDiffForRow(i, desc)
  return(ret)
}
system.time(QIdiff.parallel <- calcQIDiff(as.big.matrix(bigmatrix)))
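The block variant mentioned above was roughly along these lines; the split()/cut() chunking here is a reconstruction rather than the exact code, and it reuses calcQIDiffForRow from above for each row in a chunk:

library(doParallel)
library(bigmemory)

## rough sketch of the "one block of rows per worker" variant
calcQIDiffBlocks <- function(mat, ncores = 4) {
  registerDoParallel(cores = ncores)
  desc <- describe(mat)
  ## split the row indices into ncores contiguous chunks, one per worker
  blocks <- split(seq_len(nrow(mat)),
                  cut(seq_len(nrow(mat)), ncores, labels = FALSE))
  foreach(rows = blocks, .combine = rbind, .multicombine = TRUE,
          .packages = "bigmemory", .noexport = "mat") %dopar% {
    do.call(rbind, lapply(rows, function(i) calcQIDiffForRow(i, desc)))
  }
}

system.time(QIdiff.blocks <- calcQIDiffBlocks(as.big.matrix(bigmatrix)))

This behaved the same way for me: all cores busy, but still no improvement over the serial run.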