
I am trying to build a pipeline which first does RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:

import numpy as np
from sklearn.decomposition import RandomizedPCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()

pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])

parameters = {'ridge__alpha': 10 ** np.linspace(-5, -2, 3)}

grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1, scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])

I know about the RidgeCV function, but I want to try out Pipeline and GridSearchCV.

I want GridSearchCV to report RMSE, but that doesn't seem to be supported directly in sklearn, so I'm making do with MSE. However, the scores it reports are negative:

In [41]: grid_search.grid_scores_
Out[41]: 
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
 mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
 mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]

Obviously this isn't possible for mean squared error - what am I doing wrong here?

desertnaut
mchangun

5 Answers


Those scores are negated MSE scores, i.e. negate them and you get the MSE. The thing is that GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE have to be negated.
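To illustrate, here is a minimal sketch on a small synthetic dataset. Note this uses the scorer name and results attribute from more recent scikit-learn versions ('neg_mean_squared_error' and cv_results_, which replaced 'mean_squared_error' and grid_scores_):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem, purely for demonstration
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

grid = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0]}, cv=3,
                    scoring='neg_mean_squared_error')
grid.fit(X, y)

# The reported scores are negated MSE values (all <= 0).
# Flip the sign to recover the MSE, then take the square root for RMSE.
mse = -grid.cv_results_['mean_test_score']
rmse = np.sqrt(mse)
```

Negating the scores after the fact is safe because the sign flip is purely a reporting convention; the ranking of hyperparameters is unchanged.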

Fred Foo
  • Can you point out any documents about this, or is it based on your test? – Chau Pham Oct 10 '18 at 04:03
  • https://github.com/scikit-learn/scikit-learn/issues/2439 (I personally think it should be negative and not "negated") – Heberto Mayorquin Aug 23 '19 at 14:00
  • I'm a bit confused now. Do I have to use 'neg_mean_squared_error' in model.compile() for "loss" and "metric", or 'mean_squared_error'? – Ben Oct 30 '19 at 13:17

An alternate approach is to create your own scorer with make_scorer and set its greater_is_better flag to False.

So, if rgn is your regression model and parameters is your hyperparameter grid, you can use make_scorer like this:

from sklearn.metrics import make_scorer, mean_squared_error
# define your own mse scorer and set greater_is_better=False
mse = make_scorer(mean_squared_error, greater_is_better=False)

Now you can call GridSearchCV and pass in your defined mse scorer:

grid_obj = GridSearchCV(rgn, parameters, cv=5, scoring=mse, n_jobs=-1, verbose=True)
Michael Szczepaniak
Espanta

If you want RMSE as a metric, you can write your own callable/function that takes Y_org and Y_pred and calculates the RMSE.

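For instance, here is a sketch of such a callable wrapped with make_scorer so GridSearchCV can use it (the rmse function name and its signature are my own, not from the answer):

```python
import numpy as np
from sklearn.metrics import make_scorer

def rmse(y_org, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_org, y_pred = np.asarray(y_org), np.asarray(y_pred)
    return np.sqrt(np.mean((y_org - y_pred) ** 2))

# greater_is_better=False tells GridSearchCV this is a loss to minimize;
# internally it will report the negated value, hence negative scores.
rmse_scorer = make_scorer(rmse, greater_is_better=False)
```

You would then pass rmse_scorer as the scoring argument of GridSearchCV, exactly as with the built-in scorer strings.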

camille
mlengg

Suppose I have stored the negative MSE and negative MAE results from GridSearchCV in lists named model_nmse and model_nmae respectively.

I would then simply multiply them by -1 to get the desired MSE and MAE scores:

model_mse = list(np.multiply(model_nmse, -1))
model_mae = list(np.multiply(model_nmae, -1))

You can see the available scoring options in the documentation.


Jeremy Caney
    The question asks for why the RMSE values turn out negative; this doesn't seem like the answer to the question. – gust Jun 12 '20 at 09:15
  • @Gust there is a 'neg_root_mean_squared_error'; I thought it would be easy to get the RMSE, right? – chaoyu feng Jun 13 '20 at 09:09
  • @JeremyCaney Thanks for your advice, here is the link to scikit learn document of scoring https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter – chaoyu feng Jun 13 '20 at 09:11