
I am trying to run logistic regression on sample data in Vowpal Wabbit. I created a sample data set that looks like this:

 1 1.0  | a:3.28 b:1.5 c:2.0  |example
-1 1.0  | a:1.25 b:0.4 c:1.4  |example
 1 1.0  | a:1.40 b:0.8 c:1.6  |example
 1 1.0  | a:2.00 b:4.2 c:2.1  |example
-1 1.0  | a:2.51 b:2.7 c:1.9  |example
 1 1.0  | a:1.72 b:2.3 c:0.6  |exampleone
 1 1.0  | a:1.81 b:2.1 c:0.9  |example

When I tried to run logistic regression, it showed the error "you are using label 0 not -1 or 1 as specified by the loss function expects or malformed example".

After this, I want to calculate a score at the end. How can I calculate a score or the AUC (area under the ROC curve) in Vowpal Wabbit?

albert
user3456

1 Answer


Make sure to use the correct input data format for Vowpal Wabbit.

The error "you are using label 0" occurs if you use --loss_function=logistic (or --loss_function=hinge) and some of your examples have label 0. I cannot reproduce the error with the sample you provided.
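Since the posted sample does not trigger the error, the offending lines are probably elsewhere in the real data. A minimal sketch for locating them, assuming one example per line with the label as the first token before the first vertical bar (`find_bad_labels` is a hypothetical helper, not part of VW):

```python
def find_bad_labels(lines):
    """Return (line_number, label) pairs whose label is not -1 or 1.

    Simplification: the first whitespace-separated token before the
    first "|" is treated as the label. Lines without such a token
    (unlabeled test examples) are skipped.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        header = line.split("|", 1)[0].split()
        if not header:
            continue
        label = header[0]
        try:
            value = float(label)
        except ValueError:
            # Not a number at all, so it cannot be a valid binary label.
            bad.append((number, label))
            continue
        if value not in (-1.0, 1.0):
            bad.append((number, label))
    return bad


# Illustrative data only; the second line carries the problematic label 0.
sample = [
    " 1 1.0  | a:3.28 b:1.5 c:2.0",
    "0 1.0  | a:1.25 b:0.4 c:1.4",
    "-1 | a:2.51 b:2.7 c:1.9",
]
print(find_bad_labels(sample))
```

To check a real file, pass an open file handle instead of the list, e.g. `find_bad_labels(open("train.vw"))`, where `train.vw` is a placeholder name.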

The "|example" in your sample is interpreted as a namespace with no features, which is probably not what you wanted. The "1.0" is interpreted as example importance weight, but 1.0 is the default importance weight, so you can omit it. If you want to use tags, they must be before the first vertical bar (without any space before the bar). So the sample should look like:

 1 tag1| a:3.28 b:1.5 c:2.0
-1 tag2| a:1.25 b:0.4 c:1.4
 1 tag3| a:1.40 b:0.8 c:1.6
 1 tag4| a:2.00 b:4.2 c:2.1
-1 tag5| a:2.51 b:2.7 c:1.9
 1 tag6| a:1.72 b:2.3 c:0.6
 1 tag7| a:1.81 b:2.1 c:0.9
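If the original file is large, the conversion can be scripted. A rough sketch, assuming every line has the shape shown in the question (label, a 1.0 importance weight, one block of features, then a trailing `|example`); `fix_line` and the `tagN` naming are made up for illustration:

```python
def fix_line(line, index):
    """Rewrite one line of the question's format into 'label tag| features'."""
    parts = [part.strip() for part in line.split("|")]
    # Keep the label only. Dropping the weight is safe here because it is
    # always 1.0, which is the default; keep it if your weights differ.
    label = parts[0].split()[0]
    # Keep the first block of features and drop the trailing "|example".
    features = parts[1]
    return f"{label} tag{index}| {features}"


original = [
    " 1 1.0  | a:3.28 b:1.5 c:2.0  |example",
    "-1 1.0  | a:1.25 b:0.4 c:1.4  |example",
]
fixed = [fix_line(line, i) for i, line in enumerate(original, start=1)]
for line in fixed:
    print(line)
```

The tag sits directly against the vertical bar, with no space in between, as described above.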

"calculate the score at end and how to calculate the score or auc"

What score? VW computes progressive validation loss (or holdout loss if you use multiple passes and don't use --holdout_off). If you want to compute area under ROC curve you must use some external tool, e.g. perf. See Calculating AUC when using Vowpal Wabbit.
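If perf is not an option, the area under the ROC curve can also be computed with a few lines of plain Python. This is only a sketch using the rank-sum (Mann-Whitney) formulation, not something VW provides; it assumes the true labels are -1/1 and that each line of the predictions file starts with the score, optionally followed by a tag. `read_scores` and `auc` are hypothetical helper names.

```python
def read_scores(lines):
    """Take the first token of each non-empty prediction line as the score."""
    return [float(line.split()[0]) for line in lines if line.strip()]


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum statistic, with tie handling."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    n_pos = 0
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i:j].
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            if pairs[k][1] > 0:
                rank_sum_pos += avg_rank
                n_pos += 1
        i = j
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Made-up labels and prediction lines, for illustration only.
labels = [1, -1, 1, 1, -1]
prediction_lines = ["0.9 tag1", "-0.3 tag2", "0.2 tag3", "0.4 tag4", "0.3 tag5"]
print(auc(labels, read_scores(prediction_lines)))  # 5 of 6 positive/negative pairs are ordered correctly
```

The result is the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as one half.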

Martin Popel
  • Thanks for the suggestion. perf is not available for Mac. I have changed the data and got the predictions file while testing. Are those values the probabilities for each user? – user3456 May 27 '15 at 22:22
  • If you want to interpret the scores in the prediction file (`-p file`) as probabilities, you must use `--loss_function=logistic --link=logistic`. Note that for computing area under ROC curve you don't need to convert the predictions to probabilities via the logistic link function -- it is a monotonic function. – Martin Popel May 28 '15 at 09:17
  • By using `--link=logistic`, the output file has probabilities. Can they be used for calculating the accuracy of the logistic regression? – user3456 May 29 '15 at 17:12
  • What do you mean by accuracy? VW reports logistic loss (if you have binary classification and use `--loss_function=logistic`). If you want to see 0/1 loss (i.e. "one minus accuracy"), use `--binary` (but the predictions will be also converted to -1 and +1). – Martin Popel May 30 '15 at 01:17
  • By accuracy or score I mean this: if I implement both logistic regression and another classification algorithm, I can compare which one fits the data better using prediction accuracy. I have seen different Kaggle competitions calculating scores at the end, so I am interested to know about that. I may be wrong with the terminology. – user3456 Jun 02 '15 at 00:08
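To make the points in the comments concrete, here is a small sketch of how raw scores relate to probabilities and to accuracy. It assumes the logistic link is the standard function 1/(1+exp(-x)) and that the raw scores come from a model trained with `--loss_function=logistic` without `--link`; the numbers are made up and the helper names are hypothetical.

```python
import math


def sigmoid(raw):
    """Standard logistic function, mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw))


def accuracy(labels, raw_scores):
    """Fraction of examples whose sign-thresholded score matches the -1/1 label."""
    correct = 0
    for label, score in zip(labels, raw_scores):
        predicted = 1 if score > 0 else -1
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Made-up labels and raw scores, for illustration only.
labels = [1, -1, 1, 1, -1]
raw_scores = [0.9, -0.3, 0.2, 0.4, 0.3]
probabilities = [sigmoid(s) for s in raw_scores]

# The link is monotonic, so sorting by raw score or by probability
# gives the same order; this is why AUC does not need the conversion.
order_raw = sorted(range(len(raw_scores)), key=lambda i: raw_scores[i])
order_prob = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
print(order_raw == order_prob)

# Accuracy is one minus the 0/1 loss; a raw score above 0 corresponds
# to a probability above 0.5.
print(accuracy(labels, raw_scores))
```

Accuracy computed this way on a held-out test set is one reasonable way to compare logistic regression against another classifier, as long as both are evaluated on the same examples.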