I wanted to plot ROC curve for my classification model. As I am new to this, I read about it, saw few posts and referring this SO answer I created a roc curve.
My data is this type:
print(Y.shape)
print(predictions.shape)
print(Y)
print(predictions)
(1, 400)
(1, 400)
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1]]
[[0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 0 0
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1
0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1
1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 0
0 0 1 0]]
After implementing the code:
from sklearn.metrics import precision_score
print('Precsion score: '+ str(precision_score(Y.ravel(), predictions.ravel())))
from sklearn.metrics import recall_score
print('Recall score: '+ str(recall_score(Y.ravel(), predictions.ravel())))
from sklearn.metrics import f1_score
print('F1 score: '+ str(f1_score(Y.ravel(), predictions.ravel())))
from sklearn.metrics import roc_auc_score, auc, roc_curve
print('ROC score: ' + str(roc_auc_score(Y.ravel(), predictions.ravel())))
from sklearn.metrics import confusion_matrix
print('Confusion matrix: ')
print(confusion_matrix(Y.ravel(), predictions.ravel()))
fpr = dict()
tpr = dict()
threshold = dict()
roc_auc = dict()
for i in range(2):
fpr[i], tpr[i], threshold[i] = roc_curve(Y.ravel(), predictions.ravel())
roc_auc[i] = auc(fpr[i], tpr[i])
print(fpr, tpr, threshold, roc_auc)
plt.figure()
plt.plot(fpr[1], tpr[1])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic(ROC Curve)')
Output:
Precsion score: 0.9179487179487179
Recall score: 0.895
F1 score: 0.9063291139240507
ROC score: 0.9075
Confusion matrix:
[[184 16]
[ 21 179]]
{0: array([0. , 0.08, 1. ]), 1: array([0. , 0.08, 1. ])} {0: array([0. , 0.895, 1. ]), 1: array([0. , 0.895, 1. ])} {0: array([2, 1, 0]), 1: array([2, 1, 0])} {0: 0.9075, 1: 0.9075}
Text(0.5, 1.0, 'Receiver operating characteristic(ROC Curve)')
I didn't understand why loop was used? I can see three values calculate in each row for FPR, TPR, Threshold and roc_auc. I did read that roc_curve takes probabilities as target scores(I will work on that). However I am not able to follow from my inputting (1,400)dimension data to how these array got calculated?
Thanks in advance.