I'm having a hard time figuring out the parameter return_train_score in GridSearchCV. From the docs:
return_train_score : boolean, optional
    If False, the cv_results_ attribute will not include training scores.
My question is: what are the training scores?
In the following code I'm splitting data into ten stratified folds. As a consequence, grid.cv_results_ contains ten test scores, namely 'split0_test_score', 'split1_test_score', ..., 'split9_test_score'. I'm aware that each of those is the success rate obtained by a 5-nearest neighbors classifier that uses the corresponding fold for testing and the remaining nine folds for training.
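Spelled out by hand, independently of the GridSearchCV listing further down, that understanding of the test scores corresponds roughly to the loop below. This is only a minimal sketch: it assumes a recent scikit-learn, the default scorer of KNeighborsClassifier (accuracy), and an unshuffled StratifiedKFold. The names manual_test_scores, train_idx and test_idx are mine, and the values are not guaranteed to match cv_results_ digit for digit across library versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=10)  # assumption: no shuffling

manual_test_scores = []  # hypothetical name, one entry per fold
for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh 5-nearest-neighbors classifier on the nine training folds
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X[train_idx], y[train_idx])
    # Accuracy on the held-out fold, i.e. the "success rate" described above
    manual_test_scores.append(knn.score(X[test_idx], y[test_idx]))

print(manual_test_scores)
print('Mean of manual test scores: {}'.format(np.mean(manual_test_scores)))
```

Each entry of manual_test_scores plays the role of one 'splitN_test_score' value.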
grid.cv_results_ also contains ten train scores: 'split0_train_score', 'split1_train_score', ..., 'split9_train_score'. How are these values calculated?
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold
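# return_X_y must be passed by keyword in recent scikit-learn versions
X, y = datasets.load_iris(return_X_y=True)
# No shuffling is requested, so random_state has no effect and is omitted
# (recent scikit-learn versions reject it unless shuffle=True)
skf = StratifiedKFold(n_splits=10)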
knn = KNeighborsClassifier()
grid = GridSearchCV(estimator=knn,
                    cv=skf,
                    param_grid={'n_neighbors': [5]},
                    return_train_score=True)
grid.fit(X, y)
print('Mean test score: {}'.format(grid.cv_results_['mean_test_score']))
print('Mean train score: {}'.format(grid.cv_results_['mean_train_score']))
#Mean test score: [ 0.96666667]
#Mean train score: [ 0.96888889]
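
For reference, the individual per-fold entries behind those two means can be listed as follows. This is a minimal sketch that assumes a recent scikit-learn and rebuilds the same search so that it runs on its own; split_keys is a name I made up for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
grid = GridSearchCV(estimator=KNeighborsClassifier(),
                    cv=StratifiedKFold(n_splits=10),
                    param_grid={'n_neighbors': [5]},
                    return_train_score=True)
grid.fit(X, y)

# cv_results_ is a dict; collect the per-split test and train entries
split_keys = sorted(k for k in grid.cv_results_ if k.startswith('split'))
for key in split_keys:
    # Each value is an array with one element per parameter combination
    print(key, grid.cv_results_[key])
```

With ten folds and return_train_score=True this lists ten 'splitN_test_score' and ten 'splitN_train_score' entries.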