Can anyone clarify the best way to use set.seed() before running machine learning algorithms? I have built a random forest model, a GBM model, and a BART model. Does each of them require a seed for reproducible results? I have not split my dataset into training and test sets. I have seen many examples for random forest, but I am not sure whether this is also required for BART and GBM. An example of my models:

set.seed(500)
mod_BART <- bart(x.train = dataset[ , preds_selected], y.train = dataset[ , 1], keeptrees = TRUE)
summary(mod_BART)

set.seed(500)
formula_GBM <- as.formula(paste("presence ~", paste(preds_selected, collapse = "+")))
mod_GBM <- gbm(formula_GBM, data = dataset, distribution="bernoulli") 

Also, how many times should I set the seed? If the models are in the same script, is it enough to set a single seed before the first model? Thanks a lot, Angela

Anjeline

1 Answer

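For consistent results, set the seed before each process that has some element of randomness. set.seed() fixes the state of the random number generator. If you set the seed and then run your random forest, the model draws random numbers and advances that state, so the seed is "consumed", for lack of a better word, and the next process does not start from the seeded state.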

It would be best for reproducibility to set the seed before each of these models.
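
A minimal sketch of this behaviour in base R, with runif() standing in for a model fit (an assumption for illustration; any function that draws from R's random number generator should behave the same way):

```r
# Seed once, then draw twice: the second draw continues from the
# advanced generator state, not from the seeded state.
set.seed(500)
first_draw  <- runif(3)
second_draw <- runif(3)

# Re-seed and draw again: this reproduces the first draw only.
set.seed(500)
repeat_draw <- runif(3)

print(identical(first_draw, repeat_draw))   # same seed, same starting state
print(identical(first_draw, second_draw))   # second draw used an advanced state
```

The first comparison is TRUE because both draws start from the same seeded state. The second draw started from wherever the first one left the generator.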

You could also add something like the line below to the top of your script. It registers a callback that resets the seed each time a top-level call completes, so every subsequent top-level call starts from the same seed.

addTaskCallback(function(...) { set.seed(500); TRUE })
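
A more explicit alternative is to wrap each fit in a small helper that seeds and then runs it. This is only a sketch: run_seeded is a hypothetical helper name, not part of any package, and rnorm() is a stand-in for the model call.

```r
# Hypothetical helper: set the seed, then run the supplied fitting function.
run_seeded <- function(seed, fit_fun) {
  set.seed(seed)
  fit_fun()
}

# Stand-in "models"; in the question's script fit_fun would wrap the
# bart() or gbm() call instead, e.g.
#   run_seeded(500, function() gbm(formula_GBM, data = dataset,
#                                  distribution = "bernoulli"))
fit_a <- run_seeded(500, function() rnorm(5))
fit_b <- run_seeded(500, function() rnorm(5))

print(identical(fit_a, fit_b))  # both fits started from the same seeded state
```

Each model then carries its own seed at the call site, so models can be reordered or rerun individually without changing their results.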
Rudy S