I am working on a simple multioutput classification problem and noticed this error showing up whenever running the below code:
ValueError: Target is multilabel-indicator but average='binary'. Please
choose another average setting, one of [None, 'micro', 'macro', 'weighted', 'samples'].
I understand the problem it is referencing, i.e., when evaluating multilabel models one needs to explicitly set the type of averaging. Nevertheless, I am unable to figure out where this average
argument should go to, since only accuracy_score
, precision_score
, recall_score
built-in methods have this argument which I do not use explicitly in my code. Moreover, since I am doing a RandomizedSearch
, I cannot just pass a precision_score(average='micro')
to the scoring
or refit
arguments either, since precision_score()
requires correct and true y
labels to be passed. This is why this former SO question and this one here, both with a similar issue, didn't help.
My code with example data generation is as follows:
from sklearn.datasets import make_multilabel_classification
from sklearn.naive_bayes import MultinomialNB
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
X, Y = make_multilabel_classification(
n_samples=1000,
n_features=2,
n_classes=5,
n_labels=2
)
pipe = Pipeline(
steps = [
('scaler', MinMaxScaler()),
('model', MultiOutputClassifier(MultinomialNB()))
]
)
search = RandomizedSearchCV(
estimator = pipe,
param_distributions={'model__estimator__alpha': (0.01,1)},
scoring = ['accuracy', 'precision', 'recall'],
refit = 'precision',
cv = 5
).fit(X, Y)
What am I missing?