Vowpal Wabbit unbalanced classes

Question

I'm trying to fit a binary classification model and predict the probability that each example belongs to each class.

My first problem is that I can't interpret the results. I have a training set in which the labels are 0 and 1 (not -1 and +1).

I run the model:

vw train.vw -f model.vw --link=logistic

Then I have a file pred.txt with these values:

0.5 0.5111 0.5002 0.5093 0.5

I don't understand what 0.5 means. All values in pred.txt are about 0.5. I wrote a script that subtracts 0.5 from the results, and I get these lines:

0 0.111 0.002 0.093 0

Is that my desired probability?

And here is my second problem: my target class is unbalanced. I have 95% negative (0) and 5% positive (1) examples. How can I tell VW to account for the class imbalance, e.g. with class weights like {class 0: 0.1, class 1: 0.9}?

Or should this be done when preparing the dataset?

score 3 · Answer 1 · edited May 23 '17 at 12:07

For binary classification in VW, the labels need to be converted (from 0 and 1) to -1 and +1, e.g. with sed -e 's/^0/-1/'.
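
If sed is not convenient, the same conversion can be sketched in Python. This is only an illustration: the function name is hypothetical, and it assumes each line starts with the label followed by a space, as in "0 | a:1 b:2".

```python
def to_vw_binary_label(line: str) -> str:
    """Rewrite a leading 0 label as -1; leave every other line unchanged.

    Assumes the label is the first space-separated token on the line.
    """
    label, sep, rest = line.partition(" ")
    if label == "0":
        return "-1" + sep + rest
    return line


if __name__ == "__main__":
    for example in ["0 | a:1 b:2", "1 | a:3"]:
        print(to_vw_binary_label(example))
```

Unlike the sed one-liner, this only touches a label that is exactly 0, so a line beginning with something like 0.5 is left alone.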

In addition to --link=logistic, you also need to use --loss_function=logistic if you want to interpret the predictions as probabilities.
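
As for the values around 0.5: the logistic link maps a raw score s to 1 / (1 + exp(-s)), so a raw score of 0 comes out as exactly 0.5. The small Python sketch below (helper names are my own, not part of VW) shows the mapping and its inverse. Subtracting 0.5 from the output does not give a probability.

```python
import math


def logistic(raw_score: float) -> float:
    """Logistic (sigmoid) function: maps a raw score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def logit(probability: float) -> float:
    """Inverse of the logistic function: recovers the raw score."""
    return math.log(probability / (1.0 - probability))


if __name__ == "__main__":
    # A raw score of 0 corresponds to an output of 0.5.
    print(logistic(0.0))
    # Outputs close to 0.5 correspond to raw scores close to 0.
    print(logit(0.5111))
```

So predictions that all sit near 0.5 mean the underlying raw scores are all near 0, i.e. the model has learned very little from the data as given.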

For unbalanced classes, you need to use importance weighting and tune the importance weight constant on a held-out set (or with cross-validation), using an external evaluation metric of your choice (e.g. AUC or F1).
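
In VW's input format the importance weight is an optional number written right after the label, e.g. "1 19 | a:1". A rough Python sketch for adding it to the positive examples follows. The function name is hypothetical, and the weight 19 (the 95/5 ratio from the question) is only an assumed starting point that should still be tuned as described above.

```python
def add_importance(line: str, positive_weight: float) -> str:
    """Insert an importance weight after the label of positive examples.

    Assumes lines look like "<label> |<features>" with no existing
    importance weight or tag; any other line is returned unchanged.
    """
    label, sep, rest = line.partition(" ")
    if label in ("1", "+1") and rest.startswith("|"):
        return f"{label} {positive_weight:g} {rest}"
    return line


if __name__ == "__main__":
    for example in ["1 | a:1 b:2", "-1 | a:3"]:
        print(add_importance(example, 19.0))
```

Negative examples keep the default importance of 1, so only the rare class is up-weighted.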
