GridSearchCV scoring on mean absolute error

Question

I'm trying to set up an instance of GridSearchCV to determine which set of hyperparameters produces the lowest mean absolute error. The scikit-learn documentation indicates that a scoring metric can be passed to GridSearchCV when it is created (below).

param_grid = {
    'hidden_layer_sizes' : [(20,),(21,),(22,),(23,),(24,),(25,),(26,),(27,),(28,),(29,),(30,),(31,),(32,),(33,),(34,),(35,),(36,),(37,),(38,),(39,),(40,)],
    'activation' : ['relu'],
    'random_state' : [0]
    }
gs = GridSearchCV(model, param_grid, scoring='neg_mean_absolute_error')
gs.fit(X_train, y_train)
print(gs.scorer_)

[1] make_scorer(mean_absolute_error, greater_is_better=False)

However, the grid search is not selecting the best-performing model in terms of mean absolute error:

model = gs.best_estimator_.fit(X_train, y_train)
print(metrics.mean_squared_error(y_test, model.predict(X_test)))
print(gs.best_params_)

[2] 125.0
[3] Best parameters found by grid search are: {'hidden_layer_sizes': (28,), 'learning_rate': 'constant', 'learning_rate_init': 0.01, 'random_state': 0, 'solver': 'lbfgs'}

After running the above code and determining the so-called 'best parameters', I delete one of the values found in gs.best_params_ from the grid and run my program again. When I do, the mean squared error sometimes decreases.

param_grid = {
'hidden_layer_sizes' : [(20,),(21,),(22,),(23,),(24,),(25,),(26,),(31,),(32,),(33,),(34,),(35,),(36,),(37,),(38,),(39,),(40,)],
'activation' : ['relu'],
'random_state' : [0]
}

[4] 122.0
[5] Best parameters found by grid search are: {'hidden_layer_sizes': (23,), 'learning_rate': 'constant', 'learning_rate_init': 0.01, 'random_state': 0, 'solver': 'lbfgs'}

To clarify, I changed the grid fed into the search so that it no longer offered a hidden layer size of 28. When I ran the code again, it picked a hidden layer size of 23 and the mean absolute error decreased, even though a size of 23 had been available from the start. Why didn't it pick this option in the first place if it is evaluating the mean absolute error?

Vivek Kumar · Accepted Answer · 2018-07-30T06:54:17.803

Grid search and model fitting both depend on random number generators for various purposes. In scikit-learn this is controlled by the random_state parameter. See my other answers for more on this.

In your case, I can think of these places where random number generation affects the training:

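1) For regression tasks GridSearchCV uses KFold by default (3 folds in the scikit-learn releases current when this was written; newer releases default to 5). KFold does not shuffle unless asked to, but if the folds are shuffled without a fixed random_state, the data may be split differently on different runs. Two grid-search runs can then see different splits and hence produce different scores.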

2) You are computing the MSE on separate test data that GridSearchCV has no access to. It finds the parameters that work best on the data it was given, which may or may not be the best for a separate dataset.

Update:

I see now that you have set random_state in the param grid for the model, so point 3 no longer applies.

3) You have not shown which model you are using. If the model subsamples the data during training (for example, selecting a smaller number of features, or a smaller number of rows for each iteration, or for different internal estimators), then you need to fix that randomness too to get the same scores. Check the results after fixing that first.
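As a rough illustration of point 1, the sketch below checks when KFold produces the same splits twice. The toy array and the helper name fold_test_indices are assumptions made for this example, not part of the question's code.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data with 12 rows (an assumption; any array-like would do)
X = np.arange(12).reshape(-1, 1)

def fold_test_indices(cv):
    """Return the test indices of every fold as a list of plain-int tuples."""
    return [tuple(int(i) for i in test_idx) for _, test_idx in cv.split(X)]

# Without shuffling, KFold slices the rows in order, so every call agrees.
unshuffled_a = fold_test_indices(KFold(n_splits=3))
unshuffled_b = fold_test_indices(KFold(n_splits=3))

# With shuffling and a fixed seed, the splits are random but reproducible.
seeded_a = fold_test_indices(KFold(n_splits=3, shuffle=True, random_state=0))
seeded_b = fold_test_indices(KFold(n_splits=3, shuffle=True, random_state=0))

print(unshuffled_a == unshuffled_b)
print(seeded_a == seeded_b)
print(unshuffled_a[0])
```

Shuffling with random_state left unset is the case in which two runs can disagree, which is why it helps to pin the seed whenever shuffling is turned on.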

Recommendation Example

You can take ideas from this example:

# Define a custom KFold; random_state only takes effect when shuffle=True
from sklearn.model_selection import GridSearchCV, KFold
kf = KFold(n_splits=3, shuffle=True, random_state=0)

# Check whether the model you chose supports random_state
# (placeholder name: substitute your own estimator)
model = WhateverYouChoseClassifier(..., random_state=0, ...)

# Pass these to the grid search
gs = GridSearchCV(model, param_grid, scoring='neg_mean_absolute_error', cv=kf)

Then repeat the two experiments you ran with the different param grids.
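One way to run that comparison is sketched below. It is only a sketch under stated assumptions: the data is synthetic (make_regression stands in for the dataset, which is not shown), the grid is shrunk to three sizes to keep it quick, MLPRegressor is used because the asker names it in the comments, and run_search is a hypothetical helper. It reads the cross-validated MAE of every candidate from cv_results_, so you can see which number the grid search actually ranks on, and prints the held-out MAE separately.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption: the real dataset is not shown)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fixed splitter so both searches score candidates on identical folds
kf = KFold(n_splits=3, shuffle=True, random_state=0)

def run_search(sizes):
    """Fit a grid search and return it with a {size: cross-validated MAE} dict."""
    grid = {'hidden_layer_sizes': sizes, 'random_state': [0]}
    model = MLPRegressor(solver='lbfgs', max_iter=200)
    gs = GridSearchCV(model, grid, scoring='neg_mean_absolute_error', cv=kf)
    gs.fit(X_train, y_train)
    results = gs.cv_results_
    # Scores are negated MAE, so flip the sign to get MAE back
    cv_mae = {p['hidden_layer_sizes']: -s
              for p, s in zip(results['params'], results['mean_test_score'])}
    return gs, cv_mae

all_sizes = [(4,), (6,), (8,)]
full, full_mae = run_search(all_sizes)

# Second experiment: drop the winner from the grid and search again
best = full.best_params_['hidden_layer_sizes']
reduced, reduced_mae = run_search([s for s in all_sizes if s != best])

for size in reduced_mae:
    print(size, 'CV MAE full grid:', full_mae[size], 'reduced grid:', reduced_mae[size])
print('held-out MAE of the full-grid winner:',
      mean_absolute_error(y_test, full.predict(X_test)))
```

With the folds and the model seed pinned, a candidate's cross-validated MAE should not depend on which other candidates are in the grid. The winner is the candidate with the lowest cross-validated MAE on the training data, which need not be the one with the lowest error on X_test.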

I'm using MLPRegressor from sklearn.neural_network. I added the kf variable from your solution. My results did change, but the mean absolute error is still fluctuating up and down when I delete grid parameters. I also realized that GridSearchCV is performing on my training data, not my test data, so I also changed my output to reflect that. — David Bean, Jul 26 '18 at 14:12
