
I have a classification problem: given the pixel values of an 8x8 image and the number the image represents, my task is to predict the number (the 'Number' attribute) from the pixel values using RandomForestClassifier. The number can take values 0-9.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(train_df[input_var], train_df[target])
test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1]
roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovr")

Here it throws an AxisError.

Traceback (most recent call last):
  File "dap_hazi_4.py", line 44, in <module>
    roc_auc_score(test_df['Number'], test_df['forest_pred'], average = 'macro', multi_class="ovo")
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 383, in roc_auc_score
    multi_class, average, sample_weight)
  File "/home/balint/.local/lib/python3.6/site-packages/sklearn/metrics/_ranking.py", line 440, in _multiclass_roc_auc_score
    if not np.allclose(1, y_score.sum(axis=1)):
  File "/home/balint/.local/lib/python3.6/site-packages/numpy/core/_methods.py", line 38, in _sum
    return umr_sum(a, axis, dtype, out, keepdims, initial, where)

AxisError: axis 1 is out of bounds for array of dimension 1
Bálint Béres
  • I managed to solve my problem. Because my classification problem is multiclass, the target column needed to be binarized before fitting and before calculating the AUC score. – Bálint Béres Apr 20 '20 at 03:44
  • What exactly did you do @Bálint Béres? – Manuel Nov 25 '20 at 03:49
  • I have used this [Calculate sklearn.roc_auc_score for multi-class](https://stackoverflow.com/a/52750599/12218616) @mclzc. – Bálint Béres Nov 25 '20 at 21:46
  • If this error appears when using `sklearn.model_selection.cross_validate` or similar, set `needs_proba=True` in the scorer: `make_scorer(roc_auc_score, multi_class='ovo', needs_proba=True)` – lhaferkamp Feb 02 '21 at 14:26
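
For the cross-validation case mentioned in the last comment, here is a minimal sketch. It assumes a scikit-learn version that ships the predefined `"roc_auc_ovo"` scorer string, which requests class probabilities on its own, so no `make_scorer` call is needed. The bundled digits dataset (`load_digits`) stands in for the asker's data, and the variable names are illustrative only.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Stand-in data: 8x8 digit images flattened to 64 features, labels 0-9.
X, y = load_digits(return_X_y=True)

clf = RandomForestClassifier(n_estimators=50, random_state=42)

# The predefined scorer calls predict_proba internally and passes the
# full (n_samples, n_classes) array to roc_auc_score.
cv_results = cross_validate(clf, X, y, cv=3, scoring="roc_auc_ovo")
print(cv_results["test_score"])
```

Other predefined variants such as `"roc_auc_ovr"` can be swapped in the same way.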

3 Answers


As others suggested, the error occurs because you are solving a multi-class problem. Instead of passing predicted class labels to roc_auc_score, pass the predicted probabilities for every class, i.e. the full 2-D array returned by predict_proba. I had the same problem before, and this solved it.

Here is how to do it -

# you might be predicting the class this way
pred = clf.predict(X_valid)

# change it to predict the probabilities which solves the AxisError problem.
pred_prob = clf.predict_proba(X_valid)
roc_auc_score(y_valid, pred_prob, multi_class='ovr')
0.8164900342274142

# shape before
pred.shape
(256,)
pred[:5]
array([1, 2, 1, 1, 2])

# shape after
pred_prob.shape
(256, 3)
pred_prob[:5]
array([[0.  , 1.  , 0.  ],
       [0.02, 0.12, 0.86],
       [0.  , 0.97, 0.03],
       [0.  , 0.8 , 0.2 ],
       [0.  , 0.42, 0.58]])
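
Applied to the setup in the question, the same idea might look like the sketch below. It uses scikit-learn's bundled digits dataset (`load_digits`) as a stand-in for the asker's `train_df`/`test_df`, which are not shown, so the split and variable names are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for the asker's data: 8x8 images, labels 0-9.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

forest_model = RandomForestClassifier(n_estimators=100, random_state=42)
forest_model.fit(X_train, y_train)

# Keep every column: one probability per class, shape (n_samples, 10).
proba = forest_model.predict_proba(X_test)

# y_test stays 1-D; the 2-D score array is what multi_class="ovr" expects.
auc = roc_auc_score(y_test, proba, average="macro", multi_class="ovr")
print(proba.shape, auc)
```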

bhola prasad

Because your problem is multi-class, the labels must be one-hot encoded. Once the labels are one-hot encoded, the 'multi_class' argument works, so providing one-hot encoded labels resolves the error.

Suppose you have 100 test labels with 5 unique classes. The test label matrix must then have shape (100, 5), not (100, 1).
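
A minimal sketch of that reshaping, using scikit-learn's `LabelBinarizer` (the tool suggested in the comments on this answer). The label array here is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical test labels: 100 samples, 5 unique classes.
y_test = np.array([0, 1, 2, 3, 4] * 20)

# One column per class, with a 1 in the column of the true class.
lb = LabelBinarizer()
y_onehot = lb.fit_transform(y_test)

print(y_test.shape, y_onehot.shape)  # (100,) (100, 5)
```

The score array passed alongside these labels needs the same (100, 5) shape. Note that `roc_auc_score` also accepts the original 1-D labels when the scores are a 2-D probability array and `multi_class` is set.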

Dharman
  • I am having the same problem here. How do I transform my `pred` from `(45520,)` to `(45520,5)`? – arilwan Mar 15 '21 at 14:13
  • If you're using TensorFlow or Keras, you can do it with `tf.keras.utils.to_categorical(.)` or just `keras.utils.to_categorical(.)` – Lalith Bharadwaj Baru Apr 10 '21 at 18:57
  • Anyone using scikit-learn should use `LabelBinarizer` to convert the labels to one-hot encoded format. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html#sklearn.preprocessing.LabelBinarizer – Murilo Apr 03 '23 at 13:06

Are you sure the [:,1] in test_df['forest_pred'] = forest_model.predict_proba(test_df[input_var])[:,1] is right? It selects a single column, so the result is probably a 1-D array.
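
To illustrate the point with plain NumPy, the sketch below uses a made-up probability array. Slicing with `[:, 1]` drops a dimension, and the `y_score.sum(axis=1)` check from the traceback then fails on the 1-D result.

```python
import numpy as np

# Hypothetical predict_proba output: 3 samples, 3 classes.
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])

col = proba[:, 1]  # keeps only the second class's probabilities
print(proba.shape, col.shape)  # (3, 3) (3,)

# Same operation as in the traceback; a 1-D array has no axis 1.
try:
    col.sum(axis=1)
    raised = False
except ValueError as err:  # NumPy's AxisError subclasses ValueError
    raised = True
    print(err)
```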

Minh-Long Luu