In Vowpal Wabbit, name-spaces are used to conveniently generate interaction features on the fly at run time, without the need to predeclare them.
A simple example, without a name-space, is:
1 | a:2 b:3
where 1 is the label, and a and b are regular input features. Note that there's a space after the |.
Contrast the above with using two name-spaces, x and y (note that there is no space between the | separator and the name-space names):
1 |x a:2 |y b:3
This example is essentially equivalent (except for feature hash locations) to the first example. It still has two features with the same values as the original example. The difference is that now, with these name-spaces, we can cross features by passing options to vw. For example:
vw -q xy
will generate additional features on the fly by crossing all features in name-space x with all features in name-space y. The names of the auto-generated features will be the concatenation of the names from the two name-spaces, and the values will be the products of their respective values. In this particular case, it would be as if our data-set had one additional feature: ab:6 (*)
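To make the crossing rule concrete, here is a minimal Python sketch of what -q xy does conceptually. It is not vw's implementation (vw hashes feature names rather than storing strings). The helper names parse_example and cross are hypothetical, and the parser only handles the simple name:value format shown above.

```python
def parse_example(line):
    """Parse a simple vw-style line into (label, {namespace: {feature: value}}).

    Assumption: only the basic format used above is handled. A '|' followed
    by a space puts features in an unnamed name-space, keyed here as ''.
    """
    head, *sections = line.split("|")
    label = float(head.strip())
    namespaces = {}
    for section in sections:
        if section == "" or section[:1].isspace():
            # No name-space name was given after the '|'.
            name, tokens = "", section.split()
        else:
            # The first token is the name-space name, the rest are features.
            name, *tokens = section.split()
        features = namespaces.setdefault(name, {})
        for token in tokens:
            feat, _, value = token.partition(":")
            # A feature written without an explicit value counts as 1.
            features[feat] = float(value) if value else 1.0
    return label, namespaces


def cross(ns_a, ns_b):
    """Pair every feature in ns_a with every feature in ns_b.

    Names are concatenated and values are multiplied.
    """
    return {fa + fb: va * vb for fa, va in ns_a.items() for fb, vb in ns_b.items()}


label, ns = parse_example("1 |x a:2 |y b:3")
quadratic = cross(ns["x"], ns["y"])
print(quadratic)  # {'ab': 6.0}
```

The dictionary returned by cross is the set of extra features the learner would see in addition to the original a and b.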
Obviously, this is a very simple example. Imagine instead that you have an example with 3 features in a single name-space:
1 |x a:2 b:3 c:5
By adding -q xx to vw you could automatically generate 6 additional interaction features on the fly: aa, ab, ac, bb, bc, cc. And if you had 3 name-spaces, say x, y, z, you could cross any wanted subset of them, or pass -q xx -q xy -q xz -q yz -q yy -q zz on the command-line to get all possible pairwise interactions within and between the separate sets of features.
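The counting above can be checked with a short Python sketch. It assumes each unordered pair of features is counted once (squares included), which is what gives the 6 features listed. The variable names are illustrative only, and this models the idea rather than vw's internals.

```python
from itertools import combinations_with_replacement

# Features of name-space x from the example line: 1 |x a:2 b:3 c:5
x = {"a": 2.0, "b": 3.0, "c": 5.0}

# Crossing a name-space with itself: every unordered pair once, squares included.
self_cross = {
    fa + fb: x[fa] * x[fb]
    for fa, fb in combinations_with_replacement(x, 2)
}
print(self_cross)

# Build the -q options that cover every pair of name-spaces x, y, z.
flags = ["-q " + a + b for a, b in combinations_with_replacement("xyz", 2)]
print(" ".join(flags))
```

The second print produces the same six -q options as listed above, in a slightly different order.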
That's all there is to it. It is a powerful feature allowing you to experiment and add interaction features on the fly.
There are several options which accept (1st letters of) name-spaces as arguments, among them:
-q
--cubic
--ignore
--keep
--redefine (very new)
--lrq
Check out the vw command line arguments wiki for more details.
(*) In practice, the feature names will have the name-spaces prepended to them with a ^ separator in between, so the actual hashed string would be x^a^y^b:6 rather than ab:6 (you may verify this by using the --audit option), but this is just a detail.