
I am working on a simple multioutput classification problem and noticed the following error whenever I run the code below:

ValueError: Target is multilabel-indicator but average='binary'. Please 
choose another average setting, one of [None, 'micro', 'macro', 'weighted', 'samples'].

I understand the problem it is referencing: when evaluating multilabel models, one needs to set the type of averaging explicitly. Nevertheless, I cannot figure out where this average argument should go, since only the built-in metric functions such as precision_score and recall_score accept it, and I do not call them explicitly in my code. Moreover, since I am doing a RandomizedSearch, I cannot simply pass precision_score(average='micro') to the scoring or refit arguments either, because precision_score() requires the true and predicted y labels to be passed. This is why this former SO question and this one here, both about a similar issue, didn't help.

My code with example data generation is as follows:

from sklearn.datasets import make_multilabel_classification
from sklearn.naive_bayes import MultinomialNB
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, Y = make_multilabel_classification(
    n_samples=1000,
    n_features=2,
    n_classes=5,
    n_labels=2
)

pipe = Pipeline(
    steps = [
        ('scaler', MinMaxScaler()),
        ('model', MultiOutputClassifier(MultinomialNB()))
    ]
)

search = RandomizedSearchCV(
    estimator = pipe,
    param_distributions={'model__estimator__alpha': (0.01,1)},
    scoring = ['accuracy', 'precision', 'recall'],
    refit = 'precision',
    cv = 5
).fit(X, Y)

What am I missing?

lazarea

2 Answers


From the scikit-learn docs, you can pass scoring a callable that returns a dictionary where the keys are the metric names and the values are the metric scores. This means you can write your own scoring function, which takes the estimator, X_test, and y_test as inputs, computes y_pred, and uses it to compute the scores you want with the built-in metric functions. There you can specify whichever keyword arguments (such as average) should be used to compute the scores. In code that would look like:

from sklearn.metrics import accuracy_score, precision_score, recall_score

def my_scorer(estimator, X_test, y_test) -> dict[str, float]:
    y_pred = estimator.predict(X_test)
    return {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred, average='micro'),
        'recall': recall_score(y_test, y_pred, average='micro'),
    }

search = RandomizedSearchCV(
    estimator = pipe,
    param_distributions={'model__estimator__alpha': (0.01,1)},
    scoring = my_scorer,
    refit = 'precision',
    cv = 5
).fit(X, Y)
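
With this multimetric setup, the per-metric cross-validation results are exposed through cv_results_, and best_score_ refers to the refit metric ('precision' here). A short sketch of how you might inspect them (just for illustration, assuming search has been fitted as above):

print(search.best_params_)                       # best alpha according to precision
print(search.best_score_)                        # mean CV precision of the best candidate
print(search.cv_results_['mean_test_accuracy'])  # per-candidate accuracy
print(search.cv_results_['mean_test_recall'])    # per-candidate recall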
Simon Hawe

From the table of scoring metrics, note f1_micro, f1_macro, etc., and the note "suffixes apply as with 'f1'" given for precision and recall. So, for example:

search = RandomizedSearchCV(
    ...
    scoring = ['accuracy', 'precision_micro', 'recall_macro'],
    ...
)
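
If you need an averaging mode or extra keyword arguments that have no predefined string, the same thing can be spelled out with make_scorer. A minimal sketch, assuming the pipe, X, and Y from the question (the choice of 'micro' and 'macro' here is only illustrative):

from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import RandomizedSearchCV

# build scorers with the averaging mode passed explicitly
scoring = {
    'accuracy': 'accuracy',
    'precision': make_scorer(precision_score, average='micro'),
    'recall': make_scorer(recall_score, average='macro'),
}

search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions={'model__estimator__alpha': (0.01, 1)},
    scoring=scoring,
    refit='precision',
    cv=5
).fit(X, Y)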
Ben Reiniger