
I am trying to plot an ROC curve for binary classification using RandomForestClassifier.

I have two NumPy arrays, one containing the predicted values and one containing the true values:

In [84]: test
Out[84]: array([0, 1, 0, ..., 0, 1, 0])

In [85]: pred
Out[85]: array([0, 1, 0, ..., 1, 0, 0])

How do I plot an ROC curve and obtain the AUC (Area Under the Curve) for this binary classification result in IPython?

Ani

1 Answer


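You need probabilities to create an ROC curve. Hard 0/1 predictions give only a single operating point, so `pred` should hold the predicted probability of the positive class: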

In [84]: test
Out[84]: array([0, 1, 0, ..., 0, 1, 0])

In [85]: pred
Out[85]: array([0.1, 1, 0.3, ..., 0.6, 0.85, 0.2])
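To see why hard labels are not enough, here is a small sketch on made-up toy arrays (the names `y_true`, `y_label` and `y_proba` and their values are hypothetical). With 0/1 predictions, `roc_curve` returns only three points, so the "curve" is just two straight segments; probability scores give more thresholds and, here, a different AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical toy data: true labels plus two kinds of predictions.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_label = np.array([0, 1, 1, 0, 0, 1, 0, 1])                  # hard 0/1 predictions
y_proba = np.array([0.1, 0.6, 0.8, 0.4, 0.2, 0.9, 0.3, 0.7])  # assumed P(class 1)

# Hard labels: thresholds are only {inf (or max+1 in older versions), 1, 0}.
fpr_l, tpr_l, thr_l = roc_curve(y_true, y_label)
# Probabilities: every distinct score is a candidate threshold.
fpr_p, tpr_p, thr_p = roc_curve(y_true, y_proba)

print(len(thr_l), len(thr_p))
print(roc_auc_score(y_true, y_label))  # 0.75
print(roc_auc_score(y_true, y_proba))  # 0.9375
```

The AUC from hard labels only reflects one threshold, so it throws away the ranking information that the classifier's scores carry.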

Example code, adapted from the scikit-learn examples:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score

# test: true 0/1 labels; pred: predicted probability of class 1
# Binary case: a single ROC curve, no per-class dicts needed
fpr, tpr, _ = roc_curve(test, pred)
roc_auc = auc(fpr, tpr)

print(roc_auc_score(test, pred))  # same value as roc_auc
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc='lower right')
plt.show()
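For an end-to-end version, here is a sketch that assumes scikit-learn's synthetic `make_classification` data in place of the asker's real dataset (the data, the split and the hyperparameters are illustrative assumptions). It takes the scores from `predict_proba(...)[:, 1]` and computes a single ROC curve for the positive class:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data stands in for the real dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Column 1 of predict_proba is P(class == 1); column order follows model.classes_.
y_score = model.predict_proba(X_test)[:, 1]

# One ROC curve for the positive class.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
print("AUC:", roc_auc)  # equals roc_auc_score(y_test, y_score)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label="ROC curve (AUC = %0.3f)" % roc_auc)
ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
ax.set_xlim([0.0, 1.0])
ax.set_ylim([0.0, 1.05])
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("Receiver operating characteristic")
ax.legend(loc="lower right")
fig.savefig("roc_curve.png")  # in an interactive session, plt.show() instead
```

Saving to a PNG keeps the sketch runnable without a display; in IPython, `plt.show()` (or `%matplotlib inline`) displays the same figure.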
Ruthger Righart
Abhishek Thakur
  • Check the shapes of `test` and `pred`: if either is not 1-D (e.g. its shape is `(n, 1)`), flatten it with `anyarray.reshape(-1)`. You can obtain probabilities using `model.predict_proba(testdata)[:, 1]`. – Abhishek Thakur Mar 27 '17 at 10:11
  • I got a KeyError at `plt.plot(fpr[2], tpr[2])`, so I changed it to `1`. Everything else worked! – Ani Mar 27 '17 at 10:20
  • The example uses `fpr[2]` because it had 3 classes. For binary classification, just compute `fpr, tpr, _ = roc_curve(y_test, y_score)` and plot `x=fpr, y=tpr`. – william_grisaitis Aug 30 '18 at 17:57