
Answers to similar questions exist, but none of them worked for me, so I am posting this.

I am using the mlxtend package to do sequential forward feature selection. I am working on a multiclass (5-class) problem with a random forest estimator.

from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS 

# initialise model
model = RandomForestClassifier(n_jobs=-1, verbose=0)

# initialise SFS object
sffs = SFS(model, k_features = "best",
           forward = True, floating = True, n_jobs=-1,
           verbose = 2, scoring= "roc_auc", cv=5 )

sffs.fit(X, y)

Error:

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
packages/sklearn/metrics/_scorer.py", line 106, in __call__
    score = scorer._score(cached_call, estimator, *args, **kwargs)
  File "~/venv/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 352, in _score
    raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported

Package versions:

>>> import sklearn, mlxtend

>>> print(sklearn.__version__)
1.0.2
>>> print(mlxtend.__version__)
0.22.0
arilwan
  • My guess is that the issue may be due to `scoring= "roc_auc"` (which is [not recommended](https://stackoverflow.com/a/47111246/4685471), but this is a different discussion); could you possibly change it to something else (e.g. accuracy or precision) and see if the error still persists? – desertnaut May 16 '23 at 23:18
  • I tested it with `scoring='accuracy'` and it works fine. But I can't get it to work with any other scoring metric (`f1`, `precision`, `recall`, `roc_auc`). – arilwan May 17 '23 at 00:15
  • Sounds like anything requiring additional defining parameters (`macro`, `micro`, `weighted` etc.) will not work...? See the scorer sketch after these comments. – desertnaut May 17 '23 at 01:20
  • The error message seems pretty clear, and to follow up, read the User Guide: the table [here](https://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values) and the section on multilabel metrics [here](https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification). – Ben Reiniger May 17 '23 at 03:39
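Following up on the comments: the plain scorer strings (`f1`, `precision`, `recall`, `roc_auc`) fail on multiclass targets, but their pre-averaged counterparts work. A minimal sketch passing a macro-averaged F1 scorer to mlxtend's SFS (the iris data stands in for the real dataset here):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# placeholder multiclass data
X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(n_jobs=-1, verbose=0)

# either pass a pre-averaged string such as "f1_macro",
# or build a scorer with an explicit averaging strategy
macro_f1 = make_scorer(f1_score, average="macro")

sffs = SFS(model,
           k_features="best",
           forward=True,
           floating=True,
           n_jobs=-1,
           verbose=2,
           scoring=macro_f1,  # or scoring="f1_macro"
           cv=5)

sffs.fit(X, y)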

1 Answer


The traditional ROC-AUC was designed as a metric for binary classification; it is not defined for multiclass classification (as the error states).

Instead, you can transform your multiclass problem into binary ones with the one-vs-rest strategy: for each class, the question becomes binary (is it this class, or any other?). To do so, you can use scoring="roc_auc_ovr":

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS 

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

model = RandomForestClassifier(n_jobs=-1, verbose=0)

sffs = SFS(model, 
           k_features = "best",
           forward = True, 
           floating = True, 
           n_jobs=-1,
           verbose = 2, 
           scoring= "roc_auc_ovr", 
           cv=5 )

sffs.fit(X, y)
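After fitting, the selected feature indices and the corresponding cross-validated score can be inspected; per mlxtend's SFS API these are exposed as k_feature_idx_ and k_score_:

print(sffs.k_feature_idx_)  # indices of the selected features
print(sffs.k_score_)        # their cross-validated OvR ROC-AUC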
DataJanitor
  • This was my first thought, too; but the `roc_auc_score` of scikit-learn (which seems to be used under the hood here) can be used for multiclass classification, according to the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html) (same for v1.0.2). – desertnaut May 17 '23 at 10:13
  • @desertnaut Yes, but by default it is set to `multi_class='raise'`. (When I run the OP's code I indeed get a `RuntimeWarning` instead of a `ValueError`.) – DataJanitor May 17 '23 at 11:13
  • @arilwan You might switch to `sklearn`'s `feature_selection.SequentialFeatureSelector` instead of `mlxtend`'s. I don't know if it changes anything, but the community behind it might be larger. – DataJanitor May 17 '23 at 11:16
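Regarding the last comment, a minimal sketch of the equivalent selection with scikit-learn's own SequentialFeatureSelector (assumptions: the iris data again stands in for the real dataset, and since sklearn's selector has no floating or "best" mode, the number of features to keep is fixed explicitly):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector

# placeholder multiclass data
X, y = load_iris(return_X_y=True)

model = RandomForestClassifier(n_jobs=-1, verbose=0)

sfs = SequentialFeatureSelector(model,
                                n_features_to_select=2,
                                direction="forward",
                                scoring="roc_auc_ovr",
                                cv=5,
                                n_jobs=-1)

sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features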