I am using the caret package in R for some supervised multivariate analysis. I am trying to add some functionality to my script so that the outcomes are reproducible whenever the script is run.
I have this setup for using 2 classification models (each model is run separately, not as an ensemble):
library(caret)
load.data = ....
cleaned.data = cleaning(load.data)
mycontrol = trainControl(...)
train, test = createDataPartition(...) #pseudocode: split into train and test sets
model1 = train(...,
               data=train, ...,
               trControl=mycontrol,
               preProcess=c('center'))
model2 = train(...,
               data=train, ...,
               trControl=mycontrol,
               preProcess=c('pca'))
feature.importances = ...
summary(resamples(list(m1=model1,m2=model2)))
learning_curve_dat(...) #see link 1. below.
predict()
Evaluate(....) #see link 2. below
Where in this pipeline should I use set.seed(#), and what should # be in order to get reproducible outcomes each time the script is run? Or do I just pick any value for # randomly?
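To make the question concrete, here is a minimal sketch of what I understand set.seed() to do, using base R only. The seed value 42, the built-in iris data, and the variable names are my own placeholders, not part of my real script. I am assuming, without having confirmed it, that the random steps in my pipeline (createDataPartition and the resampling inside train) draw from the same random number stream in the same way.

```r
# Assumption: any fixed integer can serve as the seed; 42 is an arbitrary choice.
seed <- 42

# Draw a random subset of row indices, standing in for a train/test split.
set.seed(seed)
first_split <- sample(nrow(iris), size = 100)

# Reset the generator to the same state and draw again.
set.seed(seed)
second_split <- sample(nrow(iris), size = 100)

# The two draws are the same because the generator started from the same state.
print(identical(first_split, second_split))
```

If this carries over to caret, calling set.seed() immediately before each random step would fix that step's outcome, but I am not sure whether a single call at the top of the script is enough.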
Links: