I'm trying to set up an instance of GridSearchCV to determine which set of hyperparameters produces the lowest mean absolute error. The scikit-learn documentation indicates that a scoring metric can be passed to GridSearchCV when it is created (below).
param_grid = {
    'hidden_layer_sizes': [(20,),(21,),(22,),(23,),(24,),(25,),(26,),(27,),(28,),(29,),(30,),(31,),(32,),(33,),(34,),(35,),(36,),(37,),(38,),(39,),(40,)],
    'activation': ['relu'],
    'random_state': [0]
}
gs = GridSearchCV(model, param_grid, scoring='neg_mean_absolute_error')
gs.fit(X_train, y_train)
print(gs.scorer_)
[1] make_scorer(mean_absolute_error, greater_is_better=False)
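As a sanity check on what this scorer returns, here is a minimal sketch of the sign convention. The synthetic data and the LinearRegression stand-in are assumptions for illustration only and are not part of my actual code; `get_scorer` is used to look up the same scorer by its string name.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_absolute_error

# Synthetic stand-in data and estimator (assumptions, not the real problem).
X, y = make_regression(n_samples=50, n_features=3, noise=5.0, random_state=0)
est = LinearRegression().fit(X, y)

# The string 'neg_mean_absolute_error' resolves to a scorer that negates the MAE,
# so that "greater is better" holds for the grid search.
scorer = get_scorer("neg_mean_absolute_error")
score = scorer(est, X, y)
mae = mean_absolute_error(y, est.predict(X))

print("scorer value:", score)
print("plain MAE:   ", mae)
```

The scorer value should be the negative of the plain MAE, so the candidate with the highest score is the one with the lowest mean absolute error.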
However, the grid search does not seem to be selecting the best-performing model in terms of mean absolute error:
model = gs.best_estimator_.fit(X_train, y_train)
print(metrics.mean_squared_error(y_test, model.predict(X_test)))
print(gs.best_params_)
[2] 125.0
[3] Best parameters found by grid search are: {'hidden_layer_sizes': (28,), 'learning_rate': 'constant', 'learning_rate_init': 0.01, 'random_state': 0, 'solver': 'lbfgs'}
After running the above code and obtaining the so-called 'best parameters', I remove the value found in gs.best_params_ from the grid and find that, on running my program again, the mean squared error sometimes decreases.
param_grid = {
    'hidden_layer_sizes': [(20,),(21,),(22,),(23,),(24,),(25,),(26,),(31,),(32,),(33,),(34,),(35,),(36,),(37,),(38,),(39,),(40,)],
    'activation': ['relu'],
    'random_state': [0]
}
[4] 122.0
[5] Best parameters found by grid search are: {'hidden_layer_sizes': (23,), 'learning_rate': 'constant', 'learning_rate_init': 0.01, 'random_state': 0, 'solver': 'lbfgs'}
To clarify, I changed the grid that was fed into the search so that it no longer contained a hidden layer size of 28. When I ran the code again, it picked a hidden layer size of 23 and the reported test error decreased from 125.0 to 122.0, even though a size of 23 had been available from the start. Why didn't the grid search pick this option in the first place if it is evaluating the mean absolute error?
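One way to investigate this is to put the score the search ranks by next to the score measured on the test set. GridSearchCV ranks candidates by their mean cross-validated score on folds of the data passed to `fit`, and the test set plays no part in that ranking. The sketch below is only an illustration under assumptions: the data is synthetic, the model is assumed to be an MLPRegressor with the lbfgs solver, and the grid is shortened to three sizes.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (assumption: the real X and y are not shown here).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shortened grid for illustration only.
param_grid = {"hidden_layer_sizes": [(20,), (23,), (28,)], "random_state": [0]}
gs = GridSearchCV(
    MLPRegressor(solver="lbfgs", max_iter=500),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
gs.fit(X_train, y_train)

# 'mean_test_score' is the mean over the validation folds carved out of
# X_train; despite the name, it never uses X_test. Negate it to get the MAE.
cv_mae = -gs.cv_results_["mean_test_score"]

for params, mae in zip(gs.cv_results_["params"], cv_mae):
    # Refit each candidate on all training data and score it on the test set.
    candidate = MLPRegressor(solver="lbfgs", max_iter=500, **params)
    candidate.fit(X_train, y_train)
    test_mae = mean_absolute_error(y_test, candidate.predict(X_test))
    print(params, "CV MAE:", round(mae, 2), "test MAE:", round(test_mae, 2))

print("selected:", gs.best_params_)
```

If the candidate with the lowest CV MAE is not the one with the lowest test MAE, the two rankings simply disagree on this particular split. Note also that the snippet in the question prints `mean_squared_error`, which is a different metric from the one the search optimises.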