
What is the meaning of best_score_ in GridSearchCV when using a custom error function?

I'm running a simple experiment with Scikit GridSearchCV.

1) Train a simple SVM:

from sklearn.svm import LinearSVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

lin_svm_grid_params = dict(svm__C=[0.01])
lin_svm = Pipeline([("scaler", StandardScaler()), ("svm", LinearSVR(dual=False, loss='squared_epsilon_insensitive'))])
lin_svm_grid = GridSearchCV(lin_svm, lin_svm_grid_params, cv=10, scoring='mean_squared_error', n_jobs=-1)
lin_svm_grid.fit(x, y)
lin_svm_grid.fit(x, y)

2) Print results:

print(lin_svm_grid.best_score_)
print(mean_squared_error(y, lin_svm_grid.best_estimator_.predict(x)))
-610.141599985
236.578850489

So here is my main question: why are the two values different? I guess the GridSearchCV score is the R² score. Can I make GridSearchCV return the error-function value instead of R²?
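For what it's worth, the scorer convention is that greater is always better, so MSE-based scorers return a *negated* MSE, which is why best_score_ comes out negative. A minimal self-contained sketch (synthetic data via make_regression, and the newer 'neg_mean_squared_error' scoring name, are my additions for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the original x, y
x, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("svm", LinearSVR(dual=False, loss='squared_epsilon_insensitive'))])
grid = GridSearchCV(pipe, {"svm__C": [0.01]}, cv=10,
                    scoring='neg_mean_squared_error')
grid.fit(x, y)

# best_score_ is the (negated) cross-validated MSE; negate it to get
# an ordinary positive MSE
cv_mse = -grid.best_score_
print(cv_mse)
```

So the scorer does use the metric you asked for; the sign is just flipped so that grid search can always maximize.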

Dmitry
    Have a look at [this](http://stackoverflow.com/questions/21443865/scikit-learn-cross-validation-negative-values-with-mean-squared-error) regarding how to interpret this value and why it's negative. The different values are easy to explain: grid-search uses some cross-validation, in your case 10-fold. The scores are calculated on these sets. Your alternative score uses some other set (x/y; which is not good to measure generalization because it seems you calculate the score on the training-data). – sascha Jul 13 '16 at 23:30
  • Thank you for your reply. Yes, now it's clear that score value is always maximized. I know that cross val score would be different from the estimated error on whole training data, I was a little bit confused by the different values, now I see that GridSearchCV does use the provided metric, and difference in values explained by cross validation issues. I will be happy to accept your answer as correct, thanks! – Dmitry Jul 14 '16 at 00:58
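The point raised in the comments can be checked directly: the grid-search score is averaged over held-out CV folds, while the second number is the error on the very data the model was fit on, so the two are expected to differ. A minimal sketch with synthetic data and a stand-in Ridge model (both my assumptions, not from the original post):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

x, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)
model = Ridge(alpha=1.0)

# Held-out error: mean of the negated MSE over 10 CV folds
cv_mse = -cross_val_score(model, x, y, cv=10,
                          scoring='neg_mean_squared_error').mean()

# Training error: fit on all the data, then score on that same data
train_mse = mean_squared_error(y, model.fit(x, y).predict(x))

print(cv_mse, train_mse)  # training error is typically the lower of the two
```

The training-set MSE is optimistically biased, which is exactly why the cross-validated figure is the one to use when judging generalization.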

0 Answers