I am interested in running a Random Forest model on a very large dataset. I have been reading about parallel computing in an effort to make the code run faster. I came across a post ("parallel execution of random forest in R") that had some suggestions:

library(randomForest)
library(doMC)

registerDoMC()
x <- matrix(runif(500), 100)
y <- gl(2, 50)

rf <- foreach(ntree=rep(25000, 6), .combine=randomForest::combine,
              .multicombine=TRUE, .packages='randomForest') %dopar% {
    randomForest(x, y, ntree=ntree)
}

I am trying to understand what is happening in the above code. My guess is that 6 Random Forest models (each with 25000 trees) are being fit to the dataset and then combined into a single model?

I started looking into the combine() function in the randomForest package (https://cran.r-project.org/web/packages/randomForest/randomForest.pdf). It seems that combine() merges several Random Forest models into a single model (here, I think 3 Random Forest models are being combined into one):

data(iris)
rf1 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf2 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf3 <- randomForest(Species ~ ., iris, ntree=50, norm.votes=FALSE)
rf.all <- combine(rf1, rf2, rf3)
print(rf.all)

My Question: Can someone please confirm if I have understood this correctly? In the above code, are 6 Random Forest models being trained in parallel and then combined into a single model - is this correct?

stats_noob
  • You might easily check this yourself by taking a look at `body(randomForest::combine)`. – jay.sf Jun 18 '22 at 06:57
  • @jay.sf: Thank you for your reply! I had never heard of the body() function until now. Running body(randomForest::combine) printed a lot of code, and it was difficult for me to follow what was happening. – stats_noob Jun 19 '22 at 01:54

1 Answer

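Yes, I would say so. foreach's .combine argument takes a function and applies it to the results of the individual iterations to merge them into one value. In your code each of the 6 iterations fits its own forest of 25000 trees, and randomForest::combine then merges those 6 forests into a single randomForest object. Because .multicombine=TRUE, the combine function can receive several results in one call rather than two at a time.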
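
One way to check this yourself is to run a scaled-down version of the loop and inspect the result. The sketch below is only illustrative: it assumes 50 trees per forest instead of 25000 so that it finishes quickly, and it uses %do% (sequential) instead of %dopar% so that no parallel backend needs to be registered. The seed is arbitrary.

```r
library(randomForest)
library(foreach)

set.seed(1)                      # arbitrary seed, only for reproducibility
x <- matrix(runif(500), 100)     # 100 rows, 5 predictors
y <- gl(2, 50)                   # two-level factor response

# Fit six forests of 50 trees each and merge them with combine().
# %do% runs the iterations one after another; with a registered
# backend, %dopar% would send the same six tasks to parallel workers.
rf <- foreach(ntree = rep(50, 6), .combine = randomForest::combine,
              .multicombine = TRUE) %do% {
  randomForest(x, y, ntree = ntree)
}

class(rf)    # a single randomForest object, not a list of six
rf$ntree     # total number of trees across the six sub-forests
```

If the merge works as described, rf$ntree should equal 6 times the per-forest ntree. Note that, according to the randomForest documentation, the combined object no longer carries components such as err.rate and confusion, so print(rf) will not report an out-of-bag error estimate.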

Gwang-Jin Kim