I want to compute and print roc_auc_score to evaluate the performance of my random forest model. This is an NLP task, so the data in y_test and y_pred are lists of words, which I vectorize with pipe_vect.transform. However, when I print the shapes of y_test and y_pred, they do not match. Here is what I get:

print('y_pred dimension: ', y_pred.shape) #y_pred dimension:  (417, 1)
print('y_test dimension: ', y_test.shape) #y_test dimension:  (417,)

Therefore, I want to reshape y_test to give it two dimensions.

Here is my code:

x_test_vect = pipe_vect.transform(x_test)
y_pred = model.predict_proba(x_test_vect)
auc_score = roc_auc_score(y_test, y_pred)
print('Model performance:', auc_score)

which yields the following error:

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
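
Since the error message refers to the classes in y_true rather than to array shapes, one quick diagnostic is to count the distinct labels in y_test. This is a minimal sketch: `label_summary` and `y_test_demo` are hypothetical names, and the all-ones labels are made up purely for illustration.

```python
import numpy as np

def label_summary(y_true):
    """Return a {label: count} dict for the labels in y_true."""
    labels, counts = np.unique(np.asarray(y_true), return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))

# Hypothetical test labels in which every sample has the same class.
y_test_demo = np.ones(417, dtype=int)
summary = label_summary(y_test_demo)
print(summary)

# ROC AUC needs at least two distinct labels in y_true.
print('AUC is defined:', len(summary) > 1)
```

If the real y_test shows a single key here, the score cannot be computed no matter how the arrays are reshaped.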

desertnaut
  • `y_test[:, np.newaxis]` should work – Andrew Mar 26 '20 at 10:40
  • You can reshape your `y_pred` with `y_pred.reshape(-1)` – Nullman Mar 26 '20 at 10:40
  • What different values are there in `y_test`? – qaiser Mar 26 '20 at 10:44
  • The error is not due to the dimensions but because `y_true` is either all 1s or all 0s – FBruzzesi Mar 26 '20 at 11:05
  • @FBruzzesi do you know how I can fix this error? – Isaac Duboc Mar 26 '20 at 13:24
  • Assuming that you actually have different labels in `y`: if you split the data using sklearn's `train_test_split`, pass the parameter `stratify=y`; if you split the data in some other manner, it depends. – FBruzzesi Mar 26 '20 at 13:28
  • Please read the error message closely; as @FBruzzesi notes, it is due to one of your classes **not** being present in your test data `y_true`. You don't have a reshaping issue, and your description led to the question being erroneously closed as a duplicate. Check your `y_test`, as well as the exact way you are producing it (which you do not show). – desertnaut Mar 27 '20 at 00:23
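
The `stratify=y` suggestion in the comments above can be sketched as follows. This is an illustration under stated assumptions, not the asker's actual pipeline: the features are synthetic stand-ins for the vectorized text, and the split sizes, seeds, and `n_estimators` value are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the vectorized text features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# stratify=y keeps the class proportions the same in both splits,
# so y_test contains both classes as long as y does.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# predict_proba returns one column per class; for binary AUC,
# pass the column of the positive class (model.classes_[1]).
y_score = model.predict_proba(X_test)[:, 1]
auc_score = roc_auc_score(y_test, y_score)

print('y_test classes:', np.unique(y_test))
print('AUC:', auc_score)
```

Note that `predict_proba` returning a single column, as in the question's `(417, 1)` shape, would suggest the model itself saw only one class during training, which is worth checking alongside `y_test`.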

2 Answers

You can add a dimension using `numpy.expand_dims`:

import numpy as np

y_test = np.random.randn(417)
y_test.shape  # (417,)

# Insert a new axis of length 1 at position 1
y_test = np.expand_dims(y_test, axis=1)
y_test.shape  # (417, 1)
Dishin H Goyani
  • Thanks! But when I try to compute and print roc_auc_score, I still get this error: ValueError: Only one class present in y_true. ROC AUC score is not defined in that case – Isaac Duboc Mar 26 '20 at 11:11
  • Check this, it may help: https://stackoverflow.com/q/45139163/6075699 – Dishin H Goyani Mar 26 '20 at 11:13

To add a dimension to y_test:

y_test.shape # (417,)
y_test = y_test[...,np.newaxis]
y_test.shape # (417,1)

To remove a dimension from y_pred:

y_pred.shape # (417,1)
y_pred = y_pred.flatten()
y_pred.shape # (417,)
rbv
  • It works, thanks! But when I try to compute and print roc_auc_score, I still get this error: ValueError: Only one class present in y_true. ROC AUC score is not defined in that case. – Isaac Duboc Mar 26 '20 at 11:01
  • As a best practice, if you only have a dimension of size 1 to remove, I think you should use `y_pred.squeeze()`, as it is much faster. – FBruzzesi Mar 26 '20 at 11:05
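
One likely reason for the speed difference mentioned in the last comment is that `squeeze` returns a view while `flatten` always copies. The sketch below compares the two, plus `ravel`, on a made-up `(417, 1)` array; the array contents are arbitrary and only the shapes mirror the question.

```python
import numpy as np

# Made-up column vector with the same shape as y_pred in the question.
y_pred = np.arange(417, dtype=float).reshape(417, 1)

squeezed = y_pred.squeeze(axis=1)  # view; raises if axis 1 is not length 1
raveled = y_pred.ravel()           # view when the memory layout allows it
flattened = y_pred.flatten()       # always a copy

print(squeezed.shape, raveled.shape, flattened.shape)
print('squeeze shares memory:', np.shares_memory(y_pred, squeezed))
print('flatten shares memory:', np.shares_memory(y_pred, flattened))
```

Passing `axis=1` to `squeeze` is a defensive choice: it fails loudly if the array unexpectedly has more than one column instead of silently returning a 2-D result.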