1

Suppose I have classification decisions and confidences as records, is there any way to get precision/recall curve through SQL? In other words, is there a way to collapse the following set of queries into one?

select avg(predicted_label=correct_label) from t where confidence<0
select avg(predicted_label=correct_label) from t where confidence<1
select avg(predicted_label=correct_label) from t where confidence<2
select avg(predicted_label=correct_label) from t where confidence<3
select avg(predicted_label=correct_label) from t where confidence<4
select avg(predicted_label=correct_label) from t where confidence<5
select avg(predicted_label=correct_label) from t where confidence<6
....
Yaroslav Bulatov
  • 57,332
  • 22
  • 139
  • 197

2 Answers2

1

To get avg of predicted_label for ranges of confidence try this.
EDIT: Did a join on <=

SELECT FLOOR(t1.confidence), AVG(t2.predicted_label)
FROM Table t1
    JOIN Table t2 ON t2.confidence <= t1.Confidence
GROUP BY FLOOR(t1.confidence)
user2989408
  • 3,127
  • 1
  • 17
  • 15
  • Interesting, wasn't over of "join on a<=b" syntax, which engines support that? – Yaroslav Bulatov Jan 24 '14 at 18:30
  • @YaroslavBulatov AFAIK all db engines support such joins. Make sure the comparison column is indexed for better performance. Here is an [Oracle](http://docs.oracle.com/cd/B25016_08/doc/dl/bi/B13916_04/joins.htm#i1014451) reference. Also [this](http://stackoverflow.com/questions/7870155/difference-between-a-theta-join-equijoin-and-natural-join) might help. – user2989408 Jan 24 '14 at 20:52
  • I see. It doesn't seem to be in Google Cloud SQL (yet) – Yaroslav Bulatov Jan 26 '14 at 18:35
1

If you have a table (let's call it numbers with single column num) containing the relevant numbers from 0 to some large enough number, then you can simply say:

select avg(t.predicted_label=t.correct_label) 
  from t join numbers on t.confidence < numbers.num
 where numbers.num < 100 -- or whatever makes sense
group by numbers.num
FrankPl
  • 13,205
  • 2
  • 14
  • 40