I'm constrained by the memory footprint/size of my Random Forest model, so I would prefer the number of trees to be as low as possible, and the trees as shallow as possible, while minimizing any impact on performance. Rather than setting up hyperparameter tuning to optimize for this, I am wondering whether I can just build one large Random Forest composed of many deep trees. From this, can I then estimate the performance of hypothetical smaller models enclosed within it, and save myself the time of hyperparameter tuning? (Again, I'm only looking to tune those parameters that generally just need to be "big enough" for the data/problem.)
For example, if I build a model with 1500 trees, could I extract 500 of them and predict from just that subset to estimate the performance of using only 500 trees? If I do this repeatedly, each time evaluating on a holdout set, I figure this should give an estimate of the performance of building a model with 500 trees, unless I'm missing something. Something like the sketch below is what I have in mind. I should be able to do this similarly with maximum tree depth or minimum node size, correct?
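Concretely, this minimal sketch is what I'm picturing (simulated data just to show the mechanics; if I read the docs right, `predict.ranger()` has a `num.trees` argument that restricts prediction to the first `num.trees` trees of the fitted forest):

```r
library(ranger)

## Simulated stand-in data; 'y' is numeric, so this is regression.
set.seed(1)
dat     <- data.frame(y = rnorm(800), x1 = rnorm(800), x2 = rnorm(800))
train   <- dat[1:600, ]
holdout <- dat[601:800, ]

## Fit one large forest once.
rf_big <- ranger(y ~ ., data = train, num.trees = 1500)

## Predict on the holdout set using only the first 500 trees;
## since the trees are grown independently, the first 500 should
## behave like a forest that was fit with num.trees = 500.
p500 <- predict(rf_big, data = holdout, num.trees = 500)$predictions
sqrt(mean((p500 - holdout$y)^2))  # holdout RMSE of the "500-tree" model
```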
How would I do this in R on a ranger model?
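For the repeated-subsets version, the closest thing I've found is `predict.all = TRUE`, which for regression appears to return an n-by-num.trees matrix of per-tree predictions that I can average over random column subsets myself. A sketch under that assumption, reusing `rf_big` and `holdout` from above:

```r
## Per-tree holdout predictions: one column per tree (1500 here).
all_preds <- predict(rf_big, data = holdout, predict.all = TRUE)$predictions

## Average random 500-tree subsets to mimic refitting many
## independent 500-tree forests, scoring each on the holdout set.
rmse_500 <- replicate(25, {
  trees <- sample(ncol(all_preds), 500)
  sqrt(mean((rowMeans(all_preds[, trees]) - holdout$y)^2))
})
mean(rmse_500)  # estimated holdout RMSE of a 500-tree model
```

(I don't immediately see how the same post-hoc trick would work for max.depth or min.node.size, since those change how the trees are grown in the first place, so that part especially needs verification.)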
(Would appreciate any examples; parsnip examples would be a bonus. Guidance/verification that this is a reasonable way to avoid hyperparameter tuning of Random Forest models, for those hyperparameters that simply need to be "big"/"deep" enough, would also be helpful.)
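In case it helps, here is as far as I've gotten on the parsnip side. I'm assuming `extract_fit_engine()` hands back the raw ranger object, so the same `predict()` trick would apply:

```r
library(parsnip)

## Same idea via parsnip: fit once with many trees ...
spec <- rand_forest(trees = 1500) |>
  set_engine("ranger") |>
  set_mode("regression")

pfit <- fit(spec, y ~ ., data = train)

## ... then drop down to the underlying ranger object and
## subset trees at prediction time as before.
rf_engine <- extract_fit_engine(pfit)
p500 <- predict(rf_engine, data = holdout, num.trees = 500)$predictions
```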