I'm trying to fit a model for binary classification and predict the probability that each example belongs to each class.
My first problem is that I can't interpret the results. I have a training set in which labels=0 and labels=1 (not -1 and +1).
I run the model:
vw train.vw -f model.vw --link=logistic
Next:
vw test.vw -t -i model.vw -p pred.txt
Then I have a file pred.txt with these values:
0.5
0.5111
0.5002
0.5093
0.5
I don't understand what 0.5 means. All the values in pred.txt are about 0.5. I wrote a script that subtracts 0.5 from each result, and I get these lines:
0
0.111
0.002
0.093
0
Are these the probabilities I'm looking for?
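For reference, the subtraction step is roughly the sketch below. The function name subtract_offset is only illustrative, and it assumes each line of pred.txt starts with the prediction (anything after the first field, such as an example tag, is ignored).

```python
from pathlib import Path


def subtract_offset(lines, offset=0.5):
    """Subtract a fixed offset from the prediction on each line.

    Blank lines are skipped. Only the first whitespace-separated field
    of each line is parsed (assumption: that field is the prediction).
    """
    shifted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = float(line.split()[0])
        shifted.append(value - offset)
    return shifted


if __name__ == "__main__":
    pred_path = Path("pred.txt")  # prediction file written by `vw ... -p pred.txt`
    if pred_path.exists():
        for v in subtract_offset(pred_path.read_text().splitlines()):
            print(f"{v:.4f}")
```

This only shifts the values; whether the shifted numbers mean anything as probabilities is exactly what I'm unsure about.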
And here is my second problem: my target class is unbalanced. I have 95% negative (0) and 5% positive (1) examples. How can I tell VW to account for the class imbalance, for example with weights like {class 0:0.1, class 1:0.9}?
Or should this be done when preparing the dataset?
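If it has to be done at dataset-preparation time, I imagine something like the sketch below. It assumes that VW's input format accepts an importance weight right after the label, and that my lines have the simple form "label | features" with integer labels 0 and 1. The function name to_weighted_vw is hypothetical, and the weights are just the ones from my example above. It also maps 0/1 to -1/+1, on the assumption that this is what logistic loss expects. I don't know whether this is the recommended approach.

```python
def to_weighted_vw(line, weights=None):
    """Rewrite 'label | features' as '<+1/-1> <importance> | features'.

    Assumptions (not confirmed for my data): labels are the integers
    0 or 1, and the number after the label is read by VW as the
    per-example importance weight.
    """
    if weights is None:
        weights = {0: 0.1, 1: 0.9}  # illustrative class weights
    head, sep, features = line.rstrip("\n").partition("|")
    if not sep:
        raise ValueError(f"no '|' separator in line: {line!r}")
    label = int(head.split()[0])
    vw_label = 1 if label == 1 else -1  # map 0/1 to -1/+1
    return f"{vw_label} {weights[label]} |{features}"


if __name__ == "__main__":
    for raw in ["0 | a:1 b:2", "1 | a:3 c:1"]:
        print(to_weighted_vw(raw))
```

With this, every positive example would count 0.9 and every negative example 0.1 during training, if my understanding of the importance field is right.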