I am trying to find the optimal threshold T of X to predict Y. I would normally use Youden's J in this setting; however, the classic implementation assumes the threshold is a lower bound (predict Y = 1 when X >= T), which does not seem to hold when Y varies inversely with X.
The following post has some partial answers (the first answer produces better results), but according to the comments the method is not reliable, and no paper is cited: Roc curve and cut off point. Python
def cutoff_youdens_j(fpr, tpr, thresholds):
    # J = sensitivity (tpr) + specificity (1 - fpr) - 1 = tpr - fpr
    j_scores = tpr - fpr
    # sort the (J, threshold) pairs and return the threshold with the highest J
    j_ordered = sorted(zip(j_scores, thresholds))
    return j_ordered[-1][1]
import numpy as np
from sklearn.metrics import roc_curve
X = np.arange(1, 10)
# Y is an example of a binary dependent variable that varies inversely with the predictor X
Y = X < 5
fpr, tpr, thresholds = roc_curve(Y, X)
T = cutoff_youdens_j(fpr, tpr, thresholds)
print(T)
# OUTPUT: 10
The expected output would be 5; however, I get 10.
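Inspecting the intermediate arrays shows where the 10 comes from (values from my setup; as far as I know, newer scikit-learn versions report np.inf instead of max(X) + 1 as the first threshold):
print(fpr)        # [0. 1. 1.]
print(tpr)        # [0. 0. 1.]
print(thresholds) # [10  5  1] (np.inf instead of 10 on newer scikit-learn)
# j_scores = tpr - fpr = [0 -1 0]: because the association is inverse,
# J is never positive, and the best J (0) is tied between thresholds 10 and 1;
# sorted() breaks the tie towards the larger threshold, hence T = 10.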
Are there any better methods for optimal threshold selection, and is there a paper demonstrating one?
It would also be interesting to know whether the resulting threshold actually is a lower or an upper bound.
EDIT: One possibility would be to negate X beforehand and then negate T afterwards.
X = np.arange(1, 10)
Y = X < 5
X = -X  # negate the predictor so that Y increases with the score
fpr, tpr, thresholds = roc_curve(Y, X)
T = cutoff_youdens_j(fpr, tpr, thresholds)
T = -T  # map the threshold back to the original scale of X
print(T)
# OUTPUT: 4
This works, but the direction of the association has to be determined beforehand. Are there any other methods that work with both positive and negative associations between X and Y?
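A direction-agnostic sketch of the negation idea (my own attempt, not an established method; the helper name and the bound labels are assumptions): use the AUC to detect the direction of the association, negate the scores when it is negative, and map the threshold back afterwards.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def cutoff_youdens_j_auto(y, x):
    # AUC below 0.5 suggests that y varies inversely with x: negate the scores
    inverse = roc_auc_score(y, x) < 0.5
    scores = -x if inverse else x
    fpr, tpr, thresholds = roc_curve(y, scores)
    j_scores = tpr - fpr  # Youden's J = sensitivity + specificity - 1
    t = thresholds[np.argmax(j_scores)]
    if inverse:
        t = -t  # map the threshold back to the original scale of x
    # "lower" bound: predict y = 1 when x >= t; "upper" bound: when x <= t
    return t, "upper" if inverse else "lower"

X = np.arange(1, 10)
Y = X < 5
T, bound = cutoff_youdens_j_auto(Y, X)
print(T, bound)
# OUTPUT: 4 upper (the exact dtype of T depends on the scikit-learn version)
This still assumes a single cutoff exists; it only automates the direction check rather than replacing Youden's J itself.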