
I am trying to build a pipeline which first does RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:

import numpy as np
from sklearn.decomposition import RandomizedPCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()

pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])

parameters = {'ridge__alpha': 10 ** np.linspace(-5, -2, 3)}

grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1, scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])

I know about the RidgeCV function, but I want to try out Pipeline and GridSearchCV.

I want GridSearchCV to report RMSE, but that doesn't seem to be supported directly in sklearn, so I'm making do with MSE. However, the scores it reports are negative:

In [41]: grid_search.grid_scores_
Out[41]: 
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
 mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
 mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]

Obviously this isn't possible for mean squared error - what am I doing wrong here?

desertnaut
mchangun

5 Answers


Those scores are negated MSE scores, i.e. negate them and you get the MSE. The thing is that GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE have to be negated.
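To illustrate, here is a minimal sketch on a small synthetic dataset. Note this uses the scorer name and results attribute from more recent scikit-learn versions ('neg_mean_squared_error' and cv_results_, which replaced 'mean_squared_error' and grid_scores_):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem, purely for demonstration
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

grid = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0]}, cv=3,
                    scoring='neg_mean_squared_error')
grid.fit(X, y)

# The reported scores are negated MSE values (all <= 0).
# Flip the sign to recover the MSE, then take the square root for RMSE.
mse = -grid.cv_results_['mean_test_score']
rmse = np.sqrt(mse)
```

Negating the scores after the fact is safe because the sign flip is purely a reporting convention; the ranking of hyperparameters is unchanged.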

Fred Foo
  • Can you point out any documents about this, or is it based on your test? – Chau Pham Oct 10 '18 at 04:03
  • https://github.com/scikit-learn/scikit-learn/issues/2439 (I personally think it should be negative and not "negated") – Heberto Mayorquin Aug 23 '19 at 14:00
  • I'm a bit confused now. Do I have to use 'neg_mean_squared_error' in model.compile() for "loss" and "metric", or 'mean_squared_error'? – Ben Oct 30 '19 at 13:17

An alternate approach is to create your own scorer with make_scorer and set its greater_is_better flag to False.

So, if rgn is your regression model and parameters is your hyperparameter grid, you can use make_scorer like this:

from sklearn.metrics import make_scorer, mean_squared_error
# define your own mse scorer and set greater_is_better=False
mse = make_scorer(mean_squared_error, greater_is_better=False)

Now you can call GridSearchCV and pass in your defined mse scorer:

grid_obj = GridSearchCV(rgn, parameters, cv=5, scoring=mse, n_jobs=-1, verbose=True)
Michael Szczepaniak
Espanta

If you want RMSE as a metric, you can write your own callable/function that takes Y_org and Y_pred and calculates the RMSE.

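For instance, here is a sketch of such a callable wrapped with make_scorer so GridSearchCV can use it (the rmse function name and its signature are my own, not from the answer):

```python
import numpy as np
from sklearn.metrics import make_scorer

def rmse(y_org, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_org, y_pred = np.asarray(y_org), np.asarray(y_pred)
    return np.sqrt(np.mean((y_org - y_pred) ** 2))

# greater_is_better=False tells GridSearchCV this is a loss to minimize;
# internally it will report the negated value, hence negative scores.
rmse_scorer = make_scorer(rmse, greater_is_better=False)
```

You would then pass rmse_scorer as the scoring argument of GridSearchCV, exactly as with the built-in scorer strings.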

camille
mlengg

Suppose I have stored the negative MSE and negative MAE results from GridSearchCV in lists named model_nmse and model_nmae respectively.

I would then simply multiply them by -1 to get the desired MSE and MAE scores:

model_mse = list(np.multiply(model_nmse, -1))
model_mae = list(np.multiply(model_nmae, -1))

You can see the available scoring options in the documentation.


Jeremy Caney
    The question asks for why the RMSE values turn out negative; this doesn't seem like the answer to the question. – gust Jun 12 '20 at 09:15
  • @Gust there is a 'neg_root_mean_squared_error'; I thought it would be easy to get the RMSE, right? – chaoyu feng Jun 13 '20 at 09:09
  • @JeremyCaney Thanks for your advice, here is the link to scikit learn document of scoring https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter – chaoyu feng Jun 13 '20 at 09:11