How to use custom loss function (PU Learning)

Question

I am currently exploring PU learning, that is, learning from positive and unlabeled data only. One of the publications [Zhang, 2009] asserts that it is possible to learn by modifying the loss function of a binary classifier with probabilistic output (for example, Logistic Regression). The paper states that one should optimize Balanced Accuracy.

Vowpal Wabbit currently supports five loss functions [listed here]. I would like to add a custom loss function that optimizes for AUC (ROC) or, following the paper, minimizes 1 - Balanced_Accuracy.

I am unsure where to start. Looking at the code reveals that I need to provide the 1st and 2nd derivatives and some other info. I could also run the standard algorithm with Logistic loss and try to adjust l1 and l2 according to my objective (not sure if this is a good idea). I would be glad to get any pointers or advice on how to proceed.

UPDATE: Further searching revealed that it is difficult, if not impossible, to optimize for AUC in online learning: see this answer.

John Langford confirmed that AUC can generally be optimized by changing the ratio of false positive and false negative loss. In VW, this means setting a different importance weight for positive and negative examples. You need to tune the optimal weight using a hold out set (or cross validation). — Martin Popel, Oct 20 '14 at 12:59
@MartinPopel Thank you! I found that for my application SVM perf from T. Joachims does the job perfectly. I can use his linear SVM implementation, where the custom loss function optimizes the criterion I am looking for. There is no need for a held-out set (at least for setting the weights). — Vladislavs Dovgalecs, Oct 20 '14 at 18:02
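
For readers who want to try the importance-weight suggestion from the first comment, here is a minimal sketch of preparing VW input with a different weight per class. It assumes VW's plain-text input format, where an optional importance weight follows the label. The helper name `to_vw_line`, the feature names, and the weight values are hypothetical; the actual weights would have to be tuned on a held-out set, as the comment says.

```python
def to_vw_line(label, features, pos_weight=1.0, neg_weight=1.0):
    """Format one example as a VW text line: '<label> <importance> | name:value ...'.

    label is +1 or -1 (as expected by VW's logistic loss); features is a
    dict mapping feature names to numeric values. Both weights are
    illustrative knobs, not values taken from the discussion above.
    """
    weight = pos_weight if label > 0 else neg_weight
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    return f"{label} {weight} | {feats}"


# Toy data: one positive and one unlabeled (treated as negative) example.
examples = [
    (1, {"f1": 0.5, "f2": 1.0}),
    (-1, {"f1": 0.1}),
]

# Up-weight positives by a hypothetical factor of 5.
lines = [to_vw_line(y, x, pos_weight=5.0) for y, x in examples]
for line in lines:
    print(line)
```

The resulting lines can be written to a file and passed to VW as training data; repeating the run for several candidate weights and comparing the held-out metric is one way to do the tuning.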

score 2 · Accepted Answer · answered Nov 05 '14 at 18:02

I found two software suites that are immediately ready to do PU learning:

(1) SVM perf from Joachims

Use the ``-l 10'' option here!

(2) Sofia-ml

Use ``--loop_type roc'' option here!

In general you set ``+1'' labels on your positive examples and ``-1'' on all unlabeled ones. Then you launch the training procedure followed by prediction.
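
As a rough illustration of the labeling step, the sketch below writes PU data in the SVM-light text format (`<label> <index>:<value> ...`, with feature indices in increasing order), which both tools read as far as I know. The function name `pu_to_svmlight` and the toy data are made up for this example.

```python
def pu_to_svmlight(rows, positive_flags):
    """Turn PU data into SVM-light formatted lines.

    rows: list of dicts mapping 1-based integer feature indices to values.
    positive_flags: list of bools, True for a labeled positive example,
    False for an unlabeled one (which gets the label -1).
    """
    lines = []
    for feats, is_positive in zip(rows, positive_flags):
        label = "+1" if is_positive else "-1"
        # SVM-light expects feature indices in increasing order.
        pairs = " ".join(f"{i}:{v}" for i, v in sorted(feats.items()))
        lines.append(f"{label} {pairs}")
    return lines


# Toy data: the first example is a known positive, the second is unlabeled.
rows = [{3: 0.2, 1: 1.0}, {2: 0.7}]
flags = [True, False]

svmlight_lines = pu_to_svmlight(rows, flags)
for line in svmlight_lines:
    print(line)
```

Writing these lines to a file gives a training set that can be passed to either tool together with the option mentioned above.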

Both packages report some performance metrics. I would suggest using the standardized and well-established binary from the KDD`04 cup: ``perf''. Get it here.
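
If you only want a quick sanity check of the ROC area without an external binary, a small pure-Python sketch follows. It computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly, counting ties as half. The function name `roc_auc` is my own, and in the PU setting the "negatives" are really the unlabeled examples, so the number is measured against those labels rather than the true ones.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs.

    labels: +1 for positives, -1 for unlabeled/negative examples.
    scores: classifier outputs, higher meaning "more positive".
    Quadratic in the number of examples, so only suitable for small checks.
    """
    pos = [s for y, s in zip(labels, scores) if y > 0]
    neg = [s for y, s in zip(labels, scores) if y <= 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct pair
    return wins / (len(pos) * len(neg))


labels = [1, 1, -1, -1]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 3 of the 4 pairs are ordered correctly, so this prints 0.75
```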

I hope this helps those wondering how this works in practice. Perhaps I have prevented the case from XKCD.

Did you find any implementation in R or Python? — hmi2015, Jul 31 '15 at 18:27
