
I am trying to find the best predictor for a binary outcome.

For each case, A < X and B > Y% holds for C = Z% of the data. A and B are linked variables (a dose X to a volume Y%). C is how often this condition is seen in each case.

I then have different thresholds of C which I can use to predict a binary outcome (P). I also have the true binary outcome (O).

I am looking for the combination of X, Y and C values that best matches P to O.

So for each combination of X (4 discrete points) and Y (10-90% in 10% intervals) I have a result C (%). For different thresholds of C (10-90% in 10% intervals) I have the number of cases correctly predicted, as well as the 2x2 confusion matrix, the sensitivity and (1 - specificity).
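As an illustration only, a minimal MATLAB sketch of computing the 2x2 confusion matrix, sensitivity and (1 - specificity) for a single threshold of C might look like the following. The variable names (`C`, `O`, `cThresh`) and the direction of the comparison are placeholders and assumptions, not taken from the post.

```
% Sketch only: C is assumed to be a vector of per-case C values (%) for one
% (X, Y) combination, O the true binary outcome (logical or 0/1), and
% cThresh an example threshold. Flip the inequality if low C predicts P.
cThresh = 30;                    % example C threshold, in percent
P  = C >= cThresh;               % predicted outcome at this threshold

TP = sum( P &  O);               % true positives
FP = sum( P & ~O);               % false positives
FN = sum(~P &  O);               % false negatives
TN = sum(~P & ~O);               % true negatives

confMat     = [TP FP; FN TN];    % 2x2 confusion matrix
sensitivity = TP / (TP + FN);    % true positive rate
fpr         = FP / (FP + TN);    % 1 - specificity (false positive rate)
```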

In terms of statistics, I think I can use ROC curves to find the best predictors? But I'm not sure if that's true, or whether I can simply compare all the combinations, or just the different thresholds of C for each X, Y, or the different X, Y for the same threshold of C. Or should I be doing a different statistical test?

But assuming I'm doing ROC curves: I plotted all the points in MATLAB (`scatter`) along with the line y = x (`refline(1,0)`). I know the points that matter are the ones above the diagonal, but how would I then fit the actual ROC curve to calculate the AUC?
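One way to go from the scattered points to a curve and an AUC, sketched below under the assumption that `fpr` and `sens` (placeholder names) hold the (1 - specificity) and sensitivity values for the C thresholds of a single (X, Y) combination: anchor the curve at (0,0) and (1,1), sort the points by their x-coordinate, and integrate with the trapezoidal rule.

```
% Sketch only: fpr and sens are vectors of (1 - specificity) and sensitivity
% for the C thresholds of ONE (X, Y) combination.
pts = sortrows([[0; fpr(:); 1], [0; sens(:); 1]]);   % anchor at (0,0), (1,1) and sort
x = pts(:, 1);
y = pts(:, 2);

figure;
plot(x, y, '-o');                % empirical ROC curve (piecewise linear)
refline(1, 0);                   % chance diagonal y = x
xlabel('1 - specificity');
ylabel('sensitivity');

auc = trapz(x, y);               % area under the curve, trapezoidal rule
```

If the Statistics Toolbox is available, `perfcurve` can also compute the ROC coordinates and AUC directly from the raw C values and the true labels, rather than from pre-binned thresholds.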

I know this is confusing so I hope it makes sense!

edit: What I'm currently thinking is that I need an ROC curve over the different thresholds of C for each X, Y combination. On each of these curves the best C threshold is the point nearest (sensitivity = 1, 1 - specificity = 0), i.e. the top-left corner (how do I find that? nearest geometrically?). And then I compare the AUC for each X, Y combination and the one with the largest area is the best?
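On the "nearest point" and "largest AUC" ideas, a hedged sketch (again with placeholder names, and `aucAll` a hypothetical matrix of AUCs): the distance of each threshold point to the ideal corner can be computed directly, Youden's J statistic (sensitivity + specificity - 1) is a common alternative criterion, and the (X, Y) combinations can then be compared by taking the maximum AUC.

```
% Sketch only: fpr and sens as above for one (X, Y) combination;
% aucAll is a hypothetical 4-by-9 matrix of AUCs (one per X, Y pair).

% distance of each C-threshold point to the ideal corner (0, 1)
d = sqrt(fpr(:).^2 + (1 - sens(:)).^2);
[~, bestIdx] = min(d);           % C threshold geometrically closest to (0, 1)

% alternative criterion: Youden's J = sensitivity + specificity - 1
J = sens(:) - fpr(:);
[~, bestIdxJ] = max(J);

% comparing (X, Y) combinations by AUC
[bestAUC, linIdx] = max(aucAll(:));
[iX, jY] = ind2sub(size(aucAll), linIdx);   % indices of the best X and Y
```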

  • Your post is indeed a bit hard to follow, but reading it slowly I think I've understood. Maybe if you posted some parts of the code, it would be easier. Your first question about ROC: you can use it for setting the threshold of your classification outcome, but that has nothing to do with `C`, which, if I understood correctly, is one of your features. ROC helps you visualize the trade-off in your classification outcome: if you set your threshold low, you will get more false negatives, and the opposite if you set it high. Have I understood anything? What's the relation with `C`? – lrnzcig Oct 18 '15 at 14:48
  • I have two features that I am adjusting: A (dose) and B (volume). The combination of applying A and B to the data yields a result C. I.e. I check A = 10, B = 20, and in each case I have a result C = Z%. I then threshold C in order to predict my binary outcome, so I check C < 10%, < 20%, etc. You can think of C as the extent to which A and B are seen in each case. – browser Oct 19 '15 at 10:38
  • What I have done so far is plot (using `scatter`) each sensitivity vs (1 - specificity) for all combinations of A and B and all thresholds of C. A simple question first: from the individual points, how do I add the actual curve on top of that, or should I draw it as a step function? – browser Oct 19 '15 at 10:41
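Regarding the step-function question in the last comment, a minimal sketch (with the same placeholder names as above): `stairs` draws the sorted points as a staircase on top of the existing scatter, which is the conventional shape of an empirical ROC curve; a piecewise-linear `plot` is also commonly used.

```
% Sketch only: overlay a step-function ROC on the existing scatter plot.
pts = sortrows([[0; fpr(:); 1], [0; sens(:); 1]]);
hold on;
stairs(pts(:, 1), pts(:, 2));    % step-function ROC through the sorted points
refline(1, 0);                   % chance diagonal
hold off;
```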

1 Answer


I have answered similar questions at these links: ROC curve and libsvm and MATLAB - generate confusion matrix from classifier.
Please go through them and let me know if you have any doubts.

Vikrant Karale