
I am doing logistic regression on the Titanic dataset. I do not understand what [:, 1] does in the code that creates the ROC curve.

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Score with probabilities rather than hard predict() labels, so the
# reported area matches the plotted curve
logit_roc_auc = roc_auc_score(y_test, logmodel.predict_proba(X_test)[:, 1])
fpr, tpr, thresholds = roc_curve(y_test, logmodel.predict_proba(X_test)[:, 1])

plt.figure(figsize=(10, 5))
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')  # chance-level diagonal
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Survived or Not')
plt.legend(loc="lower right")
plt.savefig('Log_ROC')
plt.show()

1 Answer


According to the scikit-learn documentation, predict_proba returns probability estimates for all classes, with one column per class, ordered by class label (the same order as logmodel.classes_).
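To make the column ordering concrete, here is a minimal sketch on made-up data. The toy arrays X and y and the names model and pos_col are assumptions for illustration only and are not part of the question's Titanic code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, labels 0 (negative) and 1 (positive)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# classes_ holds the sorted class labels; predict_proba columns follow this order
print(model.classes_)  # [0 1]

proba = model.predict_proba(X)
print(proba.shape)  # (6, 2): one row per sample, one column per class

# Look up the column of the positive label instead of hard-coding it
pos_col = list(model.classes_).index(1)
print(pos_col)  # 1
```

Looking the column up through classes_ is a little more defensive than writing 1 directly, which helps when the labels are strings or are not 0 and 1.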

Here the target is binary (survived or not), so logmodel.predict_proba(X_test) returns 2 columns: the first for the negative class and the second for the positive class. An example from another StackOverflow question:

[[  4.65761066e-03   9.95342389e-01]
 [  9.75851270e-01   2.41487300e-02]
 [  9.99983374e-01   1.66258341e-05]]

In [:, 1], the : selects every row and the 1 selects the second column. So logmodel.predict_proba(X_test)[:,1] gets the predicted probabilities of the positive label only, which yields [9.95342389e-01, 2.41487300e-02, 1.66258341e-05] in this example.
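The slicing itself can be tried with plain NumPy on the example matrix above. This is only a sketch: the array is typed in by hand instead of coming from a fitted model, and the names proba, negative and positive are chosen for illustration.

```python
import numpy as np

# The example predict_proba output from above, entered by hand
proba = np.array([
    [4.65761066e-03, 9.95342389e-01],
    [9.75851270e-01, 2.41487300e-02],
    [9.99983374e-01, 1.66258341e-05],
])

negative = proba[:, 0]  # all rows, first column: P(negative class)
positive = proba[:, 1]  # all rows, second column: P(positive class)

print(positive.shape)  # (3,)
print(positive)

# Each row holds two complementary probabilities, so they sum to about 1
print(np.allclose(negative + positive, 1.0))  # True
```

roc_curve needs one score per sample, which is why the 1-D positive column is passed rather than the full 2-column array.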

wavingtide