I am using R with Rcpp to perform computationally expensive calculations that also require a lot of RAM. Since I often run them for different parameters, I want to calculate them in parallel using the packages foreach and doParallel. My problem is that once a worker has finished its task, it does not seem to release its RAM. For example, if I use 7 cores to scan 9 parameters, I get approximately this behavior:
You can see that the jump in memory for the two tasks 8 and 9 is roughly the same as for the seven tasks 1-7 combined, even though workers 1-7 have already finished. Only after tasks 8 and 9 complete does the memory seem to be released.
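For reference, per-worker memory can also be logged to the monitor file with something like the sketch below. logWorkerMemory is just an ad-hoc helper written for illustration, not from any package:

# Ad-hoc helper (illustration only): log this worker's R memory use.
# gc() both triggers a collection and reports usage; column 2 is Mb used.
logWorkerMemory <- function(tag) {
  g <- gc()
  cat(sprintf("%s: pid %d uses about %.0f Mb\n", tag, Sys.getpid(), sum(g[, 2])))
}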
My minimal working example in R:
library(minpack.lm)
library(Rcpp)
library(myRcppPackage)
library(foreach)
library(doParallel)
myParameters <- 1:9

# set up the parallel backend
cores <- detectCores()
close(file("./monitorfile.txt", open = "w"))  # flush the monitor file
cl <- makeCluster(cores - 1, outfile = "./monitorfile.txt")
registerDoParallel(cl)
clusterExport(cl, list("performCppLoop"), envir = environment())
myResult <- foreach(i = seq_along(myParameters), .combine = rbind) %dopar% {
  # perform the C++ loop for this parameter
  myData <- performCppLoop(myParameters[i])
  # do some stuff with myData; keep only a small summary row
  result <- cbind(mean(myData[, 1]), mean(myData[, 2]), mean(myData[, 3]))
  rm(myData)
  result  # the block's last expression is what foreach collects via rbind
}
stopCluster(cl)
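One obvious variation would be to force a garbage collection inside each task after discarding myData (sketch only; my understanding is that this should not even be necessary):

myResult <- foreach(i = seq_along(myParameters), .combine = rbind) %dopar% {
  myData <- performCppLoop(myParameters[i])
  result <- cbind(mean(myData[, 1]), mean(myData[, 2]), mean(myData[, 3]))
  rm(myData)
  gc()     # explicitly ask this worker's R process to collect
  result
}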
MWE C++ code:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix performCppLoop(double myParameter) {
    const unsigned long int number_of_steps = 20000000;
    NumericMatrix data(number_of_steps, 3);
    for (unsigned long int i = 0; i < number_of_steps; i++) {
        // just some dummy calculations
        data(i, 0) = sqrt(myParameter);
        data(i, 1) = myParameter * 2.0;
        data(i, 2) = myParameter / 2.0;
    }
    return data;
}
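For scale, each call returns a 20,000,000 x 3 double matrix, i.e. roughly 458 Mb per worker, which can be checked without the package:

# Size of one worker's result matrix (same shape as in performCppLoop):
print(object.size(matrix(0, nrow = 20000000, ncol = 3)), units = "Mb")
# approximately 457.8 Mb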
What am I doing wrong?