I'm running randomForest in parallel in R. Unfortunately, RAM is my bottleneck here: it seems that R duplicates the input data frame my_data for each worker. Can I prevent this duplication and make my_data a shared object?
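library(doParallel)   # also attaches foreach and parallel
library(randomForest)

cl <- makeCluster(11)
registerDoParallel(cl)
# Grow 11 forests of 90 trees each in parallel, then merge them into one
rf_res <- foreach(ntree = rep(90, 11), .combine = randomForest::combine, .multicombine = TRUE, .packages = "randomForest") %dopar%
  randomForest(F_BIN ~ ., data = my_data, ntree = ntree, keep.forest = FALSE, importance = TRUE)
rf_im <- importance(rf_res)
stopCluster(cl)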
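One possible way to reduce the copying is sketched below. It is not a guaranteed fix and rests on two assumptions. First, on a Unix-like system, registering doParallel with `cores =` instead of a cluster object makes it use forked workers, which inherit the parent's memory copy-on-write rather than receiving a serialized copy of my_data. On Windows it falls back to a socket cluster, so the data would still be copied. Second, passing predictors and response through `x =` and `y =` avoids the model frame that the formula interface builds inside every task. The small `my_data` here is a hypothetical stand-in with made-up columns `x1` and `x2`, and the worker and tree counts are reduced so the example runs quickly.

```r
library(doParallel)   # also attaches foreach and parallel
library(randomForest)

# Hypothetical stand-in for my_data: a factor response F_BIN plus two predictors
set.seed(1)
my_data <- data.frame(
  F_BIN = factor(sample(c("a", "b"), 200, replace = TRUE)),
  x1 = rnorm(200),
  x2 = rnorm(200)
)

# Split predictors and response once, so each task can skip the formula interface
x <- my_data[, setdiff(names(my_data), "F_BIN"), drop = FALSE]
y <- my_data$F_BIN

# cores = (no cluster object): forked workers on Unix-like systems,
# an implicit socket cluster (which copies the data) on Windows
registerDoParallel(cores = 2)

rf_res <- foreach(ntree = rep(10, 2), .combine = randomForest::combine,
                  .multicombine = TRUE, .packages = "randomForest") %dopar%
  randomForest(x = x, y = y, ntree = ntree, keep.forest = FALSE, importance = TRUE)

rf_im <- importance(rf_res)
```

Even with forked workers, memory use can still grow: randomForest allocates its own working copies inside each task, and pages that any process modifies stop being shared. Peak RAM should therefore be measured rather than assumed.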