Binary vectors as y_score argument of roc_curve

Question

The sklearn roc_curve docstring states:

"y_score : array, shape = [n_samples] Target scores, can either be probability estimates of the positive class, confidence values, or binary decisions."

In what situation would it make sense to set y_score to a binary vector ("binary decisions")? Wouldn't that result in a ROC curve with only one point on it, which kind of defeats the point?

Yes. You shouldn't do that. Maybe open a PR changing the docstring and saying that that is not very advisable. — Andreas Mueller, Feb 18 '14 at 19:02
Done: https://github.com/scikit-learn/scikit-learn/pull/2874 :) — Chris Gorgolewski, Feb 19 '14 at 12:05

score 4 · Accepted Answer · answered Feb 18 '14 at 11:04

If your classifier gives you only hard predictions, with no probability scores or confidence values, there isn't a way to compute a full ROC curve. (Note that svm.SVC without an explicit probability=True lacks predict_proba but still exposes decision_function, whose output is a valid y_score.) As an API designer, you have two choices: raise an exception and give the user no useful information, or plot a degenerate curve with a single operating point. I would argue the latter is more useful.
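To make the degenerate case concrete, here is a minimal sketch in plain Python. It is a hand-rolled threshold sweep, not scikit-learn's implementation, and the helper names roc_points and auc as well as the data are made up for illustration. The library may also return a different number of points for the same input.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) pairs: the (0, 0) start plus one point per distinct score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest score down; predict positive when score >= threshold.
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under the piecewise-linear curve through the points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical data: four negatives followed by four positives.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.3, 0.6, 0.2, 0.4, 0.7, 0.8, 0.9]  # continuous scores
hard = [1 if s >= 0.5 else 0 for s in scores]      # the same scores cut at 0.5

curve_scores = roc_points(y_true, scores)  # one point per distinct score
curve_hard = roc_points(y_true, hard)      # only two distinct "scores": 1 and 0

print(len(curve_scores), auc(curve_scores))
print(curve_hard, auc(curve_hard))
```

With binary input the curve collapses to (0, 0), one operating point, and (1, 1). The area under it is then just the average of the true positive rate and the true negative rate at that one threshold, so it says nothing about how well the classifier ranks examples.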

mbatchkarov


We actually had a student who generated those degenerate ROC curves, calculated the AUC, and thought everything was all right. I would lean towards raising an exception. – Chris Gorgolewski Feb 19 '14 at 12:05
