
I'm using parallel computing for the randomForest method in R. Unfortunately, RAM is my bottleneck here. It seems that R duplicates the input data frame my_data for each worker. Can I stop the duplication and make my_data a shared object?

library(doParallel)    # also attaches foreach and parallel
library(randomForest)

cl <- makeCluster(11)
registerDoParallel(cl)

rf_res <- foreach(ntree = rep(90, 11), .combine=randomForest::combine, .multicombine=TRUE, .packages = "randomForest") %dopar% 
    randomForest(F_BIN~., data=my_data, ntree=ntree, keep.forest=FALSE, importance=TRUE)

rf_im <- importance(rf_res)

stopCluster(cl)
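
Edit: one workaround I'm considering (untested) is a fork-based cluster instead of the default PSOCK one. On Linux, forked workers inherit the parent's memory via copy-on-write, so my_data should not have to be serialized to each worker; whether foreach still forces a copy in my setup is exactly what I'm unsure about. A minimal sketch of that idea:

library(doParallel)
library(randomForest)

# FORK workers (Linux/Mac only) inherit the parent's memory via copy-on-write,
# so my_data should not need to be sent to each worker.
cl <- makeCluster(11, type = "FORK")
registerDoParallel(cl)

# .noexport is a precaution so foreach does not serialize my_data to the
# workers anyway; with a fork cluster they already see it.
# .packages is not needed because forked workers inherit loaded packages.
rf_res <- foreach(ntree = rep(90, 11), .combine = randomForest::combine,
                  .multicombine = TRUE, .noexport = "my_data") %dopar%
    randomForest(F_BIN ~ ., data = my_data, ntree = ntree,
                 keep.forest = FALSE, importance = TRUE)

stopCluster(cl)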
stolikp
  • If you are on Linux/Mac, the second answer from here might help: https://stackoverflow.com/questions/31575585/shared-memory-in-parallel-foreach-in-r – JonasV Aug 04 '20 at 08:57
  • I'm on Linux. Notice that my_data is a data frame, not a big matrix. Moreover, it's a different case: "the problem is that in your code your big matrix c is referenced in the assignment c<-m[1,1]. Just try xyz <- m[1,1] instead and see what happens"; I can't use that hint in my case. – stolikp Aug 04 '20 at 09:44

0 Answers