
I have a pipeline with RandomForestRegressor as the estimator. (You don't need to check the pipeline; it works fine with any estimator.)

The target contains 5 classes, and I want to compute roc_auc_score. After reading the scikit-learn user guide on the function, I understood that to compute ROC AUC for each class, you have to use the following syntax:

# Compute roc_auc for class 0
roc_auc_score(
    y_test,
    pipeline.predict_proba(X_test)[:, 0],
    multi_class="ovr",
    average="macro",
)

But this is giving me this error:

AxisError: axis 1 is out of bounds for array of dimension 1

I found a similar question here, but that thread didn't help. What worked was changing the above code to this:

# Remove the class subsetting
roc_auc_score(
    y_test,
    logreg_pipeline.predict_proba(X_test),
    multi_class="ovr",
    average="macro",
)

Instead of selecting the first class with [:, 0], I passed the whole NumPy array returned by predict_proba, and it returned a score of 0.92. Can I assume that this score is the macro average of the ROC AUC scores of all 5 classes?

P.S. The target is not encoded, and since there are 5 classes in the target, predict_proba returns a NumPy array with 5 columns.
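
For reference, here is a minimal sketch of how this could be checked on synthetic data. It is not the actual pipeline: the dataset from make_classification, the plain RandomForestClassifier, and the variable names are all assumptions made for illustration. As far as I can tell, when y_true is multiclass, roc_auc_score expects a 2-D score array with one column per class, which would explain the AxisError for a 1-D slice. A per-class score can instead be computed by turning the labels into "this class vs. the rest" and passing the matching probability column without multi_class. The mean of those per-class scores can then be compared with the "ovr" macro score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 5-class data standing in for the real dataset (assumption).
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=10,
    n_classes=5,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# A plain classifier standing in for the fitted pipeline (assumption).
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape: (n_samples, 5)

# Per-class ROC AUC: binarize the labels (class k vs. the rest) and
# score them against column k of the probabilities. The columns of
# predict_proba follow the order of clf.classes_.
per_class_auc = np.array([
    roc_auc_score((y_test == label).astype(int), proba[:, k])
    for k, label in enumerate(clf.classes_)
])

# One-vs-rest macro average computed from the full probability matrix.
macro_ovr = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")

print("per-class AUC:", per_class_auc)
print("mean of per-class AUC:", per_class_auc.mean())
print("ovr macro AUC:", macro_ovr)
```

If the last two printed values agree, that supports reading the single score as the unweighted mean of the five one-vs-rest ROC AUC values.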

Bex T.
