
I would like to know the difference between the score returned by GridSearchCV and the R2 metric calculated as below. In other cases, the grid search score is highly negative (the same applies to cross_val_score), and I would be grateful for an explanation of why that happens.

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
X = pd.DataFrame(X)

parameters = {'splitter':('best','random'), 
              'max_depth':np.arange(1,10), 
              'min_samples_split':np.arange(2,10), 
              'min_samples_leaf':np.arange(1,5)}

regressor = GridSearchCV(DecisionTreeRegressor(), parameters, scoring = 'r2', cv = 5)
regressor.fit(X, y)

print('Best score: ', regressor.best_score_)
best = regressor.best_estimator_
print('R2: ', r2_score(y_pred = best.predict(X), y_true = y))

2 Answers


The regressor.best_score_ is the average of the R2 scores on the left-out test folds for the best parameter combination.

In your example, cv=5, so the data is split into train and test folds 5 times. The model is fitted on the train fold and scored on the test fold. These 5 test scores are then averaged to get the score. Please see the documentation:

"best_score_: Mean cross-validated score of the best_estimator"

The above process is repeated for all parameter combinations, and the best average score among them is assigned to best_score_.
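You can verify this averaging yourself, because the per-fold scores are stored in cv_results_. A minimal sketch (split%d_test_score, mean_test_score and best_index_ are the standard cv_results_ keys; it assumes regressor has been fitted as in the question):

import numpy as np

# Row of cv_results_ that corresponds to the best parameter combination
best_idx = regressor.best_index_

# The 5 per-fold test scores for that combination (cv=5)
fold_scores = [regressor.cv_results_['split%d_test_score' % i][best_idx]
               for i in range(5)]

# Their mean is exactly best_score_ (and mean_test_score at best_idx)
print(np.mean(fold_scores))
print(regressor.best_score_)
print(regressor.cv_results_['mean_test_score'][best_idx])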

You can look at my other answer for a complete walkthrough of GridSearchCV.

After finding the best parameters, the model is refitted on the full data (refit=True, the default).

r2_score(y_pred = best.predict(X), y_true = y)

is computed on the same data the model was trained on, so in most cases it will be higher.
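If you want a fair estimate instead, score on data that neither the grid search nor the refit has seen. A sketch (train_test_split is standard scikit-learn; the variable names are illustrative):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regressor.fit(X_train, y_train)
best = regressor.best_estimator_

print('CV score:    ', regressor.best_score_)                    # mean over test folds
print('Train R2:    ', r2_score(y_train, best.predict(X_train))) # optimistic
print('Held-out R2: ', r2_score(y_test, best.predict(X_test)))   # realistic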

  • I'm pretty sure that this is incorrect. Compare: regressor.best_score_ to regressor.cv_results_. I just did this and it is clear that best_score_ is set equal to the LARGEST mean_test_score value; NOT the average of them, which is unfortunate. I am using sklearn 0.20.3 – NLR May 20 '19 at 20:56
  • @NLR And what does `mean_test_score` sound like? Emphasis on `mean`. I never said that it is an average of all `mean_test_score` values. I said it's an average of the values from all test folds. – Vivek Kumar May 21 '19 at 05:45
  • regressor.best_score_ outputs the SINGLE best value from the folds. It returns the score for one fold. I found this out by looking at the output of the cross validation using regressor.cv_results_. If you take the mean of the test scores (mean_test_score), it is different from the best_score_. Try it yourself and you'll see what I mean. (There is a small typo in my first comment: it should read "...set equal to the LARGEST test_score"; not "mean_test_score". I manually calculated the mean.) – NLR May 21 '19 at 14:38

The question linked by @Davide in the comments has answers explaining why you get a positive R2 score: your model performs better than a constant prediction. At the same time, you can get negative values in other situations, when your model performs badly.
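To make that concrete: R2 is 0 for a constant prediction of the mean of y, and it goes negative as soon as the predictions are worse than that baseline. A minimal illustration with made-up numbers (not the question's data):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Always predicting the mean of y_true gives R2 == 0
print(r2_score(y_true, np.full(4, y_true.mean())))        # 0.0

# Predictions worse than the mean baseline give a negative R2
print(r2_score(y_true, np.array([4.0, 3.0, 2.0, 1.0])))   # -3.0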

The reason for the difference in values is that regressor.best_score_ is averaged over the held-out test folds of the 5-fold split that you do, whereas r2_score(y_pred = best.predict(X), y_true = y) evaluates the refitted model (regressor.best_estimator_) on the full sample, which is exactly the data it was trained on during the refit.
