The code for hyperparameter tuning using scikit-learn looks like this:
from sklearn.model_selection import GridSearchCV

# pipe_svc (the SVC pipeline) and param_grid are defined earlier in my code
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=10,
                  n_jobs=-1)
gs = gs.fit(X_train, y_train)
clf = gs.best_estimator_
clf.fit(X_train, y_train)
where, for each combination of hyperparameters, K-fold cross-validation is performed; the combination that gives the best score is then used to fit the model on the entire training data, and this model is used to predict on the test (unseen) data.
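Concretely, I then do something like the following (assuming X_test and y_test hold my unseen test split):

# inspect the chosen hyperparameters and evaluate the refitted model on unseen data
print(gs.best_params_)
print(clf.score(X_test, y_test))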
My question is: how can I do the same job using nested cross-validation? The code below performs nested 5x2 cross-validation:
from sklearn.model_selection import cross_val_score

gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=2)                 # inner loop: 2-fold grid search
scores = cross_val_score(gs, X_train, y_train,
                         scoring='accuracy', cv=5)  # outer loop: 5 folds
where GridSearchCV runs the inner loop while cross_val_score() runs the outer loop. Since cv=5 is given to cross_val_score(), a separate grid search is run in each of the five outer folds, so the result can be five different models (i.e., five different sets of hyperparameters).
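The closest I have got to seeing those five sets is cross_validate() with return_estimator=True, which (if I understand the API correctly) keeps the fitted GridSearchCV object of each outer fold:

from sklearn.model_selection import cross_validate

# return_estimator=True keeps the fitted GridSearchCV of every outer fold
cv_results = cross_validate(gs, X_train, y_train,
                            scoring='accuracy', cv=5,
                            return_estimator=True)
for fold_score, fold_gs in zip(cv_results['test_score'], cv_results['estimator']):
    print(fold_score, fold_gs.best_params_)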
If the model is stable enough, all five sets of hyperparameters may turn out to be the same. But if not, it seems natural to choose the hyperparameters that correspond to the highest entry in the scores array returned by cross_val_score().
I would like to know how to access that set of hyperparameters so that I can use it to once again fit the model on the entire training data and finally predict on the test dataset.