I have a pipeline with RandomForestClassifier as the estimator. (You don't have to check the pipeline; it works fine for any estimator.)
The target contains 5 classes and I want to compute roc_auc_score. After reading the Sklearn user guide on the function, I learned that to compute ROC AUC for each class, you have to use the following syntax:
# Compute roc_auc for class 0
roc_auc_score(
    y_test,
    pipeline.predict_proba(X_test)[:, 0],
    multi_class="ovr",
    average="macro",
)
But this is giving me this error:
AxisError: axis 1 is out of bounds for array of dimension 1
I have found a similar question here but that thread didn't help. What worked was when I changed the above code to this:
# Remove the class subsetting
roc_auc_score(
    y_test,
    logreg_pipeline.predict_proba(X_test),
    multi_class="ovr",
    average="macro",
)
Instead of specifying the first class with [:, 0], I passed the whole numpy array you get from predict_proba and it returned a score of 0.92. Should I assume that this score is the macro average of the ROC AUC over all 5 classes?
P.S. The target is not encoded, and since there are 5 classes in the target, predict_proba returns a numpy array with 5 columns.
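
One way to check this would be to compute a one-vs-rest ROC AUC for each class by hand and compare the unweighted mean with the multi_class="ovr", average="macro" result. The sketch below is only an illustration under assumptions: the data comes from make_classification rather than my real dataset, and demo_pipeline is a hypothetical stand-in for my actual pipeline. It also assumes the columns of predict_proba follow the order of classes_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 5-class data (an assumption; stands in for the real dataset).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=10,
    n_classes=5,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Hypothetical stand-in pipeline; any classifier with predict_proba would do.
demo_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
demo_pipeline.fit(X_train, y_train)

# Full probability matrix: one column per class, shape (n_samples, 5).
proba = demo_pipeline.predict_proba(X_test)

# Macro-averaged one-vs-rest ROC AUC over all classes.
macro_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")

# Per-class ROC AUC: binarize the target for each class and score it
# against that class's probability column as a plain binary problem.
per_class_auc = {
    cls: roc_auc_score(y_test == cls, proba[:, i])
    for i, cls in enumerate(demo_pipeline.classes_)
}

print("macro OvR AUC:", macro_auc)
print("per-class AUC:", per_class_auc)
print("mean of per-class AUC:", np.mean(list(per_class_auc.values())))
```

If the two numbers agree, the single score is the unweighted mean of the five one-vs-rest AUCs, and the per-class values come from binarizing y_test rather than from passing multi_class together with a single column.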