
I have a question about the mlr package.

After tuning random forest hyperparameters with cross-validation, `getLearnerModel(rforest)` does not use the CV folds; instead, the final model is fit on the entire training set as a whole. Is that correct?

# training task
trainTask <- makeClassifTask(data = trainsample, target = "DIED30", positive = "1")

# random forest learner; note that importance is set in the same par.vals list,
# since assigning rf$par.vals a second time would overwrite the first list
rf <- makeLearner("classif.randomForest", predict.type = "prob",
                  par.vals = list(ntree = 1000, mtry = 3, importance = TRUE))

# hyperparameter search space
rf_param <- makeParamSet(
  makeDiscreteParam("ntree", values = c(500, 750, 1000, 2000)),
  makeIntegerParam("mtry", lower = 1, upper = 15),
  makeDiscreteParam("nodesize", values = 1:20)
)

# grid search with 10-fold CV, optimizing AUC
rancontrol <- makeTuneControlGrid()
set_cv <- makeResampleDesc("CV", iters = 10L)
rf_tune <- tuneParams(learner = rf, resampling = set_cv, task = trainTask,
                      par.set = rf_param, control = rancontrol, measures = auc)
rf_tune$x
rf.tree <- setHyperPars(rf, par.vals = rf_tune$x)

# train best model
rforest <- train(rf.tree, trainTask)
getLearnerModel(rforest)

# predict
pforest <- predict(rforest, trainTask)

So `rforest` is ultimately trained on the entire dataset with the tuned hyperparameters, rather than with cross-validation. Is there any way to perform the final training with CV as well in mlr?

I'm planning to validate the result on an external dataset. Should I train the model with 10-fold CV before running it on the external dataset (I don't know how), or just use the parameters found in the 10-fold CV hyperparameter search?
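For context, here is a sketch (not from the original post) of how mlr's `makeTuneWrapper()` combines tuning and training into one learner, so that `resample()` performs a nested CV: tuning runs inside each outer fold, giving a generalization estimate of the whole pipeline. The object names (`rf`, `rf_param`, `rancontrol`, `set_cv`, `trainTask`) assume the code above; the outer fold count is an arbitrary choice.

```r
library(mlr)

# fuse the learner with its tuning step
rf_wrapped <- makeTuneWrapper(rf, resampling = set_cv, par.set = rf_param,
                              control = rancontrol, measures = auc)

# outer CV estimates the performance of the tune-then-train pipeline
outer <- makeResampleDesc("CV", iters = 5L)
res <- resample(rf_wrapped, trainTask, resampling = outer, measures = auc)

# the deployable model is still trained once on all training data;
# tuning happens internally via the wrapper
final_model <- train(rf_wrapped, trainTask)
```

This does not change the fact that there is one final model fit on all the training data; the nested CV only estimates how well that procedure generalizes.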

thanks in advance for your time,

XPeriment
  • I don't understand the question. Please edit it and add a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). `getLearnerModel()` only returns the fitted model of a wrapped model, in this case of the Tuning Wrapper. – pat-s Dec 07 '19 at 12:30
  • Thank you, I edited the question and added the code. – XPeriment Dec 07 '19 at 14:51
  • Not sure what your question is. You get a different model for every fold of the CV -- you could put those into an ensemble (although I'm not sure why you would), but there's no single final model in this process. The point of doing a CV is to get a better generalization estimate for the model trained on the entire data. – Lars Kotthoff Dec 07 '19 at 15:56

0 Answers