Plotting a ROC curve in scikit yields only 3 points

Question

TLDR: scikit's roc_curve function is only returning 3 points for a certain dataset. Why could this be, and how do we control how many points to get back?

I'm trying to draw a ROC curve, but consistently get a "ROC triangle".

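from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

lr = LogisticRegression(multi_class='multinomial', solver='newton-cg')
y = data['target'].values
X = data[['feature']].values

model = lr.fit(X, y)

# get log-probabilities for each class from the fitted model
probas_ = model.predict_log_proba(X)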

Just to make sure the lengths are ok:

print(len(y))
print(len(probas_[:, 1]))

Returns 13759 on both.

Then running:

false_pos_rate, true_pos_rate, thresholds = roc_curve(y, probas_[:, 1])
print(false_pos_rate)

returns [ 0. 0.28240129 1. ]

If I print thresholds, I get array([ 0.4822225 , -0.5177775 , -0.84595197]) (always only 3 points).

It is therefore no surprise that my ROC curve looks like a triangle.

What I cannot understand is why scikit's roc_curve is only returning 3 points. Help hugely appreciated.

Did you check the values in `probas_[:,1]`? Although it has length of 13759, it may only contain 3 values... — pyan, May 05 '15 at 14:58
Thank you for your help, I did `[print pd.Series(probas_[:,1]).unique()]`, and indeed only 2 uniques (`[-0.84595197 -0.5177775 ]`) were returned — sapo_cosmico, May 05 '15 at 15:09

score 20 · Accepted Answer · answered May 05 '15 at 15:40

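The number of points depends on the number of unique values in the input. roc_curve evaluates one threshold per distinct score, plus one extra starting point at (0, 0), so an input vector with only 2 unique values yields 3 points. The function is giving the correct output.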
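To make the relationship concrete, here is a minimal sketch on synthetic data (the arrays below are made up for illustration and are not the asker's dataset). It compares a score vector with two distinct values against a continuous one, and uses roc_curve's drop_intermediate parameter, which controls whether collinear points are pruned from the result.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)

# Scores with only two distinct values, similar to the question's probas_[:, 1].
two_valued = np.where(rng.random(1000) < 0.5, -0.85, -0.52)
fpr2, tpr2, thr2 = roc_curve(y_true, two_valued)

# Continuous scores: practically every sample has its own value.
# drop_intermediate=False keeps every threshold instead of pruning
# points that lie on a straight segment of the curve.
continuous = rng.random(1000) + 0.5 * y_true
fpr_c, tpr_c, thr_c = roc_curve(y_true, continuous, drop_intermediate=False)

# Number of distinct scores next to number of ROC points returned.
print(len(np.unique(two_valued)), len(thr2))
print(len(np.unique(continuous)), len(thr_c))
```

So the number of points cannot be raised above what the distinct scores allow; drop_intermediate=False only stops scikit-learn from discarding points it would otherwise consider redundant.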

pyan

score 15 · Answer 2 · edited Mar 08 '20 at 10:15

I had the same problem with a different example. My mistake was passing the thresholded outcomes (the predicted class labels) instead of the probabilities as the y_score argument of roc_curve. That also gives a plot with three points, but it is a mistake!
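A short sketch of that mistake and its fix, assuming a synthetic dataset from make_classification and a default LogisticRegression (neither comes from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Synthetic binary problem with some label noise so it is not separable.
X, y = make_classification(n_samples=500, n_features=5, flip_y=0.1,
                           random_state=0)
clf = LogisticRegression().fit(X, y)

# Mistake: predict() returns hard 0/1 labels, so y_score has only
# two distinct values and the curve collapses to three points.
fpr_bad, tpr_bad, _ = roc_curve(y, clf.predict(X))

# Intended: the continuous probability of the positive class.
fpr_ok, tpr_ok, _ = roc_curve(y, clf.predict_proba(X)[:, 1])

print(len(fpr_bad), len(fpr_ok))
```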

answered Oct 19 '19 at 08:17

Tomas G.

Here is an example of how to finish plotting the roc curve: https://stackoverflow.com/a/67754984/670433 – s2t2 Aug 05 '23 at 16:20

score 5 · Answer 3 · edited Apr 28 '20 at 22:50

I ran into the same problem, and after reading the documentation carefully I realized that the mistake is in:

probas_ = model.predict_log_proba(X)

Others had already hinted at this by suggesting a check of the unique values. It should instead be:

probas_ = model.decision_function(X)
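As a rough sketch of how decision_function output feeds into roc_curve, assuming synthetic data and a default LogisticRegression rather than the asker's model: for a two-class problem decision_function returns a 1-D array, so it is passed directly, without the [:, 1] indexing used for probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Synthetic binary problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=5, flip_y=0.1,
                           random_state=0)
model = LogisticRegression().fit(X, y)

# One confidence score per sample; larger means "more positive".
scores = model.decision_function(X)
fpr, tpr, thresholds = roc_curve(y, scores)

print(scores.shape, len(thresholds))
```

If the single input feature itself takes only two distinct values, as the comments on the question suggest, decision_function will also produce only two distinct scores and the curve will still have three points.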

edited by today

answered Apr 28 '20 at 22:03

Raman Khurana

The documentation's example also uses Y_score = model.decision_function(x_test) after fitting the model and passes Y_score to roc_curve. https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_display_object_visualization.html#sphx-glr-auto-examples-miscellaneous-plot-display-object-visualization-py – Neela Sep 01 '22 at 03:58

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

You don't necessarily get just one point besides (0,0) and (1,1). I'm using the mushrooms dataset from Kaggle for a binary classification problem. Taking fpr and tpr from roc_curve, I get 4 more points, though their values are more or less the same.

fpr = [0, 0, 0.02290076, 0.0267176, 0.832061, 1]

tpr = [0, 0.0315361, 0.985758, 0.996948, 1, 1]

I'm not sure whether this can be considered 1 point, because the curve plotted from these values looks like the one shown in the question.