
I am currently exploring PU learning, i.e. learning from positive and unlabeled data only. One of the publications [Zhang, 2009] asserts that it is possible to learn from such data by modifying the loss function of a binary classifier with probabilistic output (for example logistic regression). The paper states that one should optimize balanced accuracy.
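For concreteness, here is a small sketch (not from the paper) of what balanced accuracy computes for `+1`/`-1` labels; "1 - balanced accuracy" would then be the quantity the custom loss minimizes:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate for +1/-1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == -1)
    return 0.5 * (tp / n_pos + tn / n_neg)

y_true = [1, 1, 1, -1, -1]
y_pred = [1, 1, -1, -1, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.5 * (2/3 + 1/2) ≈ 0.583
```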

Vowpal Wabbit currently supports five loss functions [listed here]. I would like to add a custom loss function that optimizes for AUC (ROC) or, equivalently, following the paper, 1 - Balanced_Accuracy.

I am unsure where to start. Looking at the code reveals that I need to provide first and second derivatives and some other information. Alternatively, I could run the standard algorithm with logistic loss and try to adjust the l1 and l2 regularization according to my objective (I am not sure whether this is sound). I would be glad to get any pointers or advice on how to proceed.

UPDATE: Further searching revealed that it is difficult or impossible to optimize for AUC in online learning: answer

  • John Langford confirmed that AUC can generally be optimized by changing the ratio of false positive and false negative loss. In VW, this means setting a different importance weight for positive and negative examples. You need to tune the optimal weight using a hold-out set (or cross-validation). – Martin Popel Oct 20 '14 at 12:59
  • @MartinPopel Thank you! I found that for my application SVM perf from T. Joachims does the job perfectly. I can use his linear SVM implementation, where the custom loss function optimizes the criterion I am looking for. There is no need for a held-out set (at least for setting the weights). – Vladislavs Dovgalecs Oct 20 '14 at 18:02
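The per-class importance weighting from the first comment can be sketched as follows. Vowpal Wabbit's input format is `label importance | features`; `pos_weight` is a value you would tune on a hold-out set, and the feature names here are made up:

```python
def to_vw_line(label, features, pos_weight=10.0):
    """Emit one VW input line, up-weighting positive examples."""
    weight = pos_weight if label == 1 else 1.0
    feats = " ".join(f"{name}:{value}" for name, value in features)
    return f"{label} {weight} | {feats}"

print(to_vw_line(1, [("f1", 0.5), ("f2", 1.2)]))
# 1 10.0 | f1:0.5 f2:1.2
print(to_vw_line(-1, [("f1", 0.3)]))
# -1 1.0 | f1:0.3
```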

1 Answer


I found two software suites that are immediately ready to do PU learning:

(1) SVM perf from Joachims

Use the `-l 10` option here!

(2) Sofia-ml

Use the `--loop_type roc` option here!

In general, you set `+1` labels on your positive examples and `-1` on all unlabeled ones. Then you launch the training procedure followed by prediction.
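As a sketch of that labeling step: both tools read the SVMlight data format, `label index:value ...`, with 1-based, increasing feature indices and zero-valued features omitted. The helper name below is my own:

```python
def to_svmlight_line(is_positive, values):
    """One SVMlight line: +1 for a positive example, -1 for an unlabeled one."""
    label = "+1" if is_positive else "-1"
    feats = " ".join(f"{i}:{v}" for i, v in enumerate(values, start=1) if v != 0)
    return f"{label} {feats}"

print(to_svmlight_line(True, [0.5, 0.0, 1.2]))   # +1 1:0.5 3:1.2
print(to_svmlight_line(False, [1.0, 2.0, 0.0]))  # -1 1:1.0 2:2.0
```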

Both tools report some performance metrics. I would also suggest using the standardized and well-established `perf` binary from the KDD'04 cup. Get it here.

Hope this helps those wondering how this works in practice. Perhaps I prevented the XKCD case.
