I'm trying to get a vowpal wabbit model saved with inverted hashes. I have a valid model produced with the following:

vw --oaa 2 -b 24 -d mydata.vw --readable_model mymodel.readable

which produces a model file like this:

Version 7.7.0
Min label:-1.000000
Max label:1.000000
bits:24
0 pairs: 
0 triples: 
rank:0
lda:0
0 ngram: 
0 skip: 
options: --oaa 2
:0
66:0.016244
67:-0.016241
80:0.026017
81:-0.026020
84:0.015005
85:-0.015007
104:-0.053924
105:0.053905
112:-0.015402
113:0.015412
122:-0.025704
123:0.025704
...

(and so on for many thousands more features). However, to be more useful, I need to see the feature names. That seemed like a fairly obvious thing to want, so I ran

vw --oaa 2 -b 24 -d mydata.vw --invert_hash mymodel.inverted

and it produced a model file like this, with no weights at all:

Version 7.7.0
Min label:-1.000000
Max label:1.000000
bits:24
0 pairs: 
0 triples: 
rank:0
lda:0
0 ngram: 
0 skip: 
options: --oaa 2
:0

It feels like I've obviously done something wrong, but I think I'm using the options in the documented way:

--invert_hash is similar to --readable_model, but the model is output in a more human readable format with feature names followed by weights, instead of hash indexes and weights.
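
Based on that description, I expected the inverted model to list feature names in place of the hash indexes, something like this (made-up feature names, just to illustrate what I was expecting):

height:0.016244
color_red:-0.016241
...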

Does anyone see why my second command is failing to produce any output?

Ben Collins

1 Answer

This is caused by a bug in VW that was fixed recently (prompted by this question); see https://github.com/JohnLangford/vowpal_wabbit/issues/337.

By the way, it does not make sense to use --oaa 2. If you want binary classification (aka logistic regression), use --loss_function=logistic (and make sure your labels are 1 and -1). OAA makes sense only when the number of classes N > 2 (and it is recommended to use --loss_function=logistic together with --oaa).
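
For example, a plain binary setup looks like this (just a sketch; the two-line dataset and file names are made up, but note the 1 and -1 labels):

1 | height:1.5 color_red
-1 | height:0.9 color_blue

and then:

vw -d mydata.vw --loss_function=logistic --binary -f model.binary

(--binary is optional; it makes VW report 0/1 loss, as discussed in the comments.)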

Also note that learning with --invert_hash is much slower (and requires more memory, of course). The recommended way to create an inverted-hash model, especially with multiple passes, is to learn a usual binary model first and then convert it to an inverted-hash one using a single pass over the training data with -t:

vw -d mytrain.data -c --passes 4 --oaa 3 -f model.binary
vw -d mytrain.data -t -i model.binary --invert_hash model.humanreadable
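
Once you have the inverted model, a quick way to inspect it is to list the features with the largest weights, e.g. (a rough sketch that assumes name:weight lines; the handful of header lines at the top of the file may need to be filtered out first):

sort -t: -k2 -gr model.humanreadable | head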
Martin Popel
  • Interesting. Experimentally, I find results that could be interpreted to disagree with your point about `--oaa`. With only `-b 31`, it reports an average loss of .6. With `-b 31 --loss_function=logistic`, it reports a loss of .48 (and fiddling with `-l`, `--l1`, and `--passes` doesn't really change it much). With `-b 31 --oaa 2 -c -k --passes 3 -l 0.25`, vw reports a loss of only .19. If `--oaa 2` doesn't make sense, then why does it perform so much better? I'm not trying to be disagreeable; I just want to understand. – Ben Collins Jul 10 '14 at 01:04
  • You should not compare apples (0/1 loss) and oranges (logistic or square loss). For OAA, VW always reports 0/1 loss. Without OAA (or other multiclass reductions), VW reports the loss you asked for (square, logistic, hinge...). – Martin Popel Jul 10 '14 at 10:29
  • You can force VW to report 0/1 loss for binary classification with `--binary`. – Martin Popel Jul 10 '14 at 10:37
  • I admit `--oaa 2` can sometimes give **slightly** better results. See http://stackoverflow.com/questions/24674880/effect-of-oaa-2-and-loss-function-logistic-in-vowpal-wabbit – Martin Popel Jul 10 '14 at 11:05
  • Hmm, OK. If I use `--loss_function=logistic --binary` together, it produces results very similar to what `--oaa 2` produces. – Ben Collins Jul 10 '14 at 13:04
  • I do `vw -d train.data -c --passes 10 --loss_function=logistic -f model.binary`, which works. Then I do `vw -d train.data -t -i model.binary --invert_hash model.humanreadable` and get this error: `unrecognized options: --invert_hash model.humanreadable terminate called after throwing an instance of 'std::exception' what(): std::exception Aborted (core dumped)`. – Satarupa Guha May 08 '15 at 04:14