
I have created a binary classifier in TensorFlow that outputs a generator object containing predictions. I extract the predictions (e.g. [0.98, 0.02]) from the object into a list, later converting this into a numpy array. I have the corresponding array of labels for these predictions. Using these two arrays, I believe I should be able to plot an ROC curve via:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# labels: true class labels; predictions: (N, 2) array of class probabilities
fpr, tpr, thr = roc_curve(labels, predictions[:,1])
plt.plot(fpr, tpr)
plt.show()
print(fpr)
print(tpr)
print(thr)

Where predictions[:,1] gives the positive prediction score. However, running this code produces only a flat line, and fpr, tpr and thr each contain just three values.

The only theory I have as to why this is happening is that my classifier is too sure of its predictions. Many, if not all, of the positive prediction scores are either 1.0 or incredibly close to zero:

[[9.9999976e-01 2.8635742e-07]
 [3.3693312e-11 1.0000000e+00]
 [1.0000000e+00 9.8642090e-09]
 ...
 [1.0106111e-15 1.0000000e+00]
 [1.0000000e+00 1.0030269e-09]
 [8.6156778e-15 1.0000000e+00]]

According to a few sources, including a couple of Stack Overflow threads, the very polar values of my predictions could be creating an issue for roc_curve().
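A minimal reproduction with made-up near-hard scores (not my real data, just the same pattern) gives the same three-value output:

import numpy as np
from sklearn.metrics import roc_curve

# made-up labels and near-hard positive-class scores, mimicking my outputs
labels = np.array([0, 1, 0, 1, 1, 0])
scores = np.array([1e-9, 1.0, 1e-9, 1.0, 1.0, 1e-9])

fpr, tpr, thr = roc_curve(labels, scores)
print(fpr)  # three entries, e.g. [0. 0. 1.]
print(tpr)  # three entries, e.g. [0. 1. 1.]
print(thr)  # three entries: one per distinct score value, plus one extra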

Is my intuition correct? If so, is there anything I can do about it so that I can plot my ROC curve?

I've tried to include all the information I think is relevant to this issue, but if you would like any more information about my program, please ask.

Eden Trainor

1 Answer


An ROC curve is generated by sweeping a threshold over your predicted probabilities and computing the sensitivity and specificity at each threshold. Generally, as you raise the threshold your sensitivity decreases while your specificity increases, and the resulting curve paints a picture of the overall quality of your predicted probabilities. In your case, since every score is either 0 or 1 (or very close to it), there are no meaningful intermediate thresholds to sweep over. That's why thr comes back with only three values, all sitting at the extremes of the score range.
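A minimal sketch of that threshold sweep (the helper name and the toy arrays are just for illustration): with hard 0/1 scores, every cutoff strictly between the two values lands on the same point, so the curve collapses.

import numpy as np

def roc_point(labels, scores, threshold):
    labels = np.asarray(labels)
    pred = np.asarray(scores) >= threshold      # classify at this cutoff
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0))
    tpr = tp / (tp + fn)                        # sensitivity
    fpr = fp / (fp + tn)                        # 1 - specificity
    return fpr, tpr

labels = [0, 1, 0, 1]
scores = [1e-9, 1.0, 1e-9, 1.0]                 # polar scores like yours
for t in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(t, roc_point(labels, scores, t))
# every cutoff prints (0.0, 1.0): one and the same point on the curve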

You can try to arbitrarily pull the values closer to 0.5 or alternatively implement your own ROC curve calculation with more tolerance for small differences.

On the other hand, you might want to review your network, because such extreme output values often mean there is a problem somewhere: perhaps the labels leaked into the network somehow, and it is therefore producing near-perfect results.

Peter Szoldan
  • I get smooth(ish) training accuracy and loss curves in TensorBoard, so would that show that the labels aren't involved in the prediction results? I've used the TensorFlow Estimator API and the convolutional MNIST tutorial on the TensorFlow website as a framework for my code. Under the assumption that doing these things has allowed me to incorporate best practice when constructing my network, and if you wouldn't mind giving me your intuition, how else would I go about troubleshooting these result values? – Eden Trainor Apr 02 '18 at 16:51
  • Unfortunately, smooth curves or a normal-looking accuracy increase are no guarantee of anything. If the labels leaked into the network, it would still show a learning curve. I'm not sure what you're classifying, but what I'd do is take some new test data the network has never seen before and run it __without the labels__. Then get the results in a numpy array and compare those manually with a few lines of numpy. This takes about half an hour and will immediately show if there is an issue. – Peter Szoldan Apr 03 '18 at 00:37
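
A rough sketch of that manual check, assuming predictions is the (N, 2) probability array from the estimator and true_labels is the held-out label array (both names are placeholders, not taken from the question):

import numpy as np

predicted_class = np.argmax(predictions, axis=1)            # 0 or 1 per example
print("held-out accuracy:", np.mean(predicted_class == true_labels))

# Inspect the misses; zero mistakes on genuinely unseen data is itself
# suspicious and would point towards label leakage.
wrong = np.where(predicted_class != true_labels)[0]
print("misclassified indices:", wrong[:20])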