
I'm working on a multi-class classification problem with a neural network in scikit-learn, and I'm trying to figure out how to optimize my hyperparameters (number of layers, number of neurons per layer, and eventually other things).

I found out that GridSearchCV is the way to do this, but the code I'm using returns the average accuracy, while I actually want to evaluate on the F1-score. Does anyone have an idea of how I can edit this code to make it work with the F1-score?

In the beginning, when I had to evaluate the precision/accuracy, I thought it was 'enough' to just look at the confusion matrix and draw a conclusion from it, while changing the number of layers and neurons in my neural network by trial and error, again and again.

Today I figured out that there's more to it than that: GridSearchCV. I just need to figure out how I can evaluate the F1-score with it, because I need to research how the accuracy of the neural network depends on the number of layers, nodes, and eventually other alternatives...

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

parameter_space = {
    'hidden_layer_sizes': [(1), (2), (3)],  # note: (1) is just the int 1, i.e. a single hidden layer with 1, 2 or 3 neurons
}

mlp = MLPClassifier(max_iter=600)
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, y_train.values.ravel())

print('Best parameters found:\n', clf.best_params_)

means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))

output:

Best parameters found:
 {'hidden_layer_sizes': 3}
0.842 (+/-0.089) for {'hidden_layer_sizes': 1}
0.882 (+/-0.031) for {'hidden_layer_sizes': 2}
0.922 (+/-0.059) for {'hidden_layer_sizes': 3}

So my output gives me the mean accuracy (which I found is the default scoring for GridSearchCV with a classifier). How can I change this to return the average F1-score instead of the accuracy?


2 Answers


You can create your own scorer with make_scorer. In this case, you can wrap sklearn's f1_score, but you can use your own metric function if you prefer:

from sklearn.metrics import f1_score, make_scorer

f1 = make_scorer(f1_score, average='macro')


Once you have made your scorer, you can plug it directly into the grid search creation as the scoring parameter:

clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3, scoring=f1)


On the other hand, I've used average='macro' as the multi-class parameter of f1. This calculates the metric for each label and then takes their unweighted mean. But there are other options for computing F1 with multiple classes; you can find them under the average parameter in the f1_score documentation, and a short sketch of the alternatives follows below.
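As a rough illustration (not part of the original answer, just the standard averaging options that sklearn's f1_score already exposes), the other strategies can be wrapped the same way:

from sklearn.metrics import f1_score, make_scorer

# Standard multi-class averaging options of sklearn's f1_score:
f1_micro    = make_scorer(f1_score, average='micro')     # counts TP/FP/FN globally over all classes
f1_macro    = make_scorer(f1_score, average='macro')     # unweighted mean of the per-class F1 scores
f1_weighted = make_scorer(f1_score, average='weighted')  # per-class F1 weighted by class support

# Any of these scorers can be plugged into the grid search, e.g.:
# clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3, scoring=f1_weighted)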


Note: answer completely edited for better understanding

  • Well it gave me this output: ```ValueError: Sample-based precision, recall, fscore is not meaningful outside multilabel classification. See the accuracy_score instead.``` – Jonas May 11 '19 at 13:38
  • Sorry, I didn't try it, I thought you could plug it directly in that way. I've updated my answer. It should work now. – Haritz Laboa May 11 '19 at 23:27
  • Your idea is good for macro recall as well, but how do you define whether it should be maximized and not minimized? – Eli Borodach Jun 07 '22 at 11:43

According to: https://scikit-learn.org/stable/modules/model_evaluation.html

You can simply pass scoring='f1_macro' to GridSearchCV.
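For example, a minimal sketch reusing the mlp and parameter_space defined in the question:

clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3, scoring='f1_macro')
clf.fit(X_train, y_train.values.ravel())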

  • It fails with `ValueError: 'f1_score' is not a valid scoring value. Use sorted(sklearn.metrics.SCORERS.keys()) to get valid options.` – user164863 Mar 03 '23 at 12:03