3

I have a randomForest model that I want to fit on multiple cores. How can I tell the model to run in parallel?

This is not a duplicate of "parallel execution of random forest in R": I don't need to run multiple models in parallel; I want a single model to run in parallel.

steves
  • @Florian no problem, but please confirm, to the best of your knowledge, whether combining gives the same result as fitting one full model. That is, if I use foreach to run 5 "iterations" of 1000 trees each and combine them, is that the same as fitting one randomForest with ntree = 5000? – steves Apr 24 '18 at 08:17
  • 1
    To the best of my knowledge, they are equivalent. A random forest just grows independent trees with some randomness, so it does not matter whether those trees were grown in separate forests or not. It would be a different case if we were growing a boosted forest, for example, where the trees are no longer grown independently but sequentially. – Florian Apr 24 '18 at 08:24
  • 1
    Using `ranger` or `Rborist` will help. Both are faster and offer out-of-the-box parallelization. – phiver Apr 24 '18 at 08:30
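
For reference, the `ranger` route mentioned in the last comment might look like the sketch below. The `iris` data set and the thread count of 4 are placeholders chosen for illustration, not values from this thread.

```r
library(ranger)

# Assumption: 4 threads; adjust to the number of cores available.
fit <- ranger(Species ~ ., data = iris,
              num.trees = 5000,   # one forest of 5000 trees
              num.threads = 4)    # trees are grown in parallel internally

fit$num.trees
```

Here a single call grows the whole forest across threads, so no manual splitting and recombining is needed.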

1 Answer

2

I use the doMC package and its registerDoMC function. Works really well.
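
A minimal sketch of how this could look, assuming the foreach-and-combine approach discussed in the comments on the question. The `iris` data set, the core count of 4, and the 5 x 1000 split are placeholders, not part of the original answer.

```r
library(randomForest)
library(foreach)
library(doMC)

# Assumption: 4 cores. doMC relies on forking, so it is not available on Windows.
registerDoMC(cores = 4)

# Grow 5 sub-forests of 1000 trees each in parallel, then merge them
# into a single forest of 5000 trees with randomForest::combine().
rf <- foreach(ntree = rep(1000, 5),
              .combine = randomForest::combine,
              .multicombine = TRUE) %dopar% {
  randomForest(Species ~ ., data = iris, ntree = ntree)
}

rf$ntree
```

The combined object predicts like a single forest, but `combine()` drops some fitted summaries (for example `err.rate` and `confusion`), so out-of-bag error has to be computed separately if it is needed.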

nycrefugee