From the available docs and resources, it is not really clear how to accomplish a simple getting-started flow where you launch Vowpal Wabbit as a daemon (possibly even without any pre-learnt model) and have it learn and explore online. I'm looking for a flow where I feed in a context, get back a recommendation, and feed back a cost/reward.
So let me skip the technical description of what I've tried and simply ask for a clear demonstration of what I'd consider the essentials here:
- How can I demonstrate, through the daemon, that learning is taking place purely from online interaction rather than offline from batch data? Any good suggestions?
- How do I report a cost/reward back for a selected action in daemon mode? Once per action? In bulk? Either way, what is the exact format?
- Somewhat related: for a live contextual-bandit system, would you recommend the daemon, or rather one of the language APIs?
- Alternatively, can you point me at where the server code sits inside the (rather large) code base? That would be a good place to start exploring systematically.
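For concreteness, here is a minimal sketch of the two kinds of lines I'd expect to send over the daemon socket (started with something like `vw --cb_explore 4 --daemon --port 26542`). The `action:cost:probability | features` label layout is the documented `--cb` text input format; the helper names and feature names are my own, not part of VW:

```python
# Sketch of the text lines I'd expect to send to a VW daemon socket.
# The action:cost:probability label is the documented --cb input format;
# these helper functions are my own illustration, not a VW API.

def context_line(features):
    """An unlabelled context-only line, used to request a prediction."""
    return "| " + " ".join(features)

def feedback_line(action, cost, probability, features):
    """A labelled line reporting the cost observed for the chosen action,
    together with the probability the policy assigned to that action."""
    return f"{action}:{cost}:{probability} | " + " ".join(features)

print(context_line(["user_age:25", "region=eu"]))
# | user_age:25 region=eu
print(feedback_line(2, 1.0, 0.25, ["user_age:25", "region=eu"]))
# 2:1.0:0.25 | user_age:25 region=eu
```

If that is roughly right, the remaining question is whether each feedback line should be sent on the same connection that produced the prediction, or whether any connection will do.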
I typically get back a distribution (sized to the number of allowed actions) in reply to every input I send, and typically the same distribution regardless of what I sent in. Maybe with the default --cb_explore algorithm it takes a whole learning epoch for that to change; I wouldn't know, and I'm not sure the epoch duration can be set from outside.
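To clarify what I mean by "a distribution": the reply looks like a PMF over the actions, and on the client side I handle it roughly as below. The parsing and sampling code is mine, under the assumption that the reply is a whitespace-separated list of probabilities (if the daemon returns `action:prob` pairs instead, the `split(":")` fallback covers that):

```python
import random

def parse_pmf(reply):
    """Parse a daemon reply assumed to be whitespace-separated probabilities;
    also tolerates 'action:prob' tokens by keeping the part after the colon."""
    return [float(tok.split(":")[-1]) for tok in reply.split()]

def sample_action(pmf, rng):
    """Draw a 1-based action index from the PMF via inverse-CDF sampling."""
    r, cum = rng.random(), 0.0
    for action, p in enumerate(pmf, start=1):
        cum += p
        if r <= cum:
            return action
    return len(pmf)  # guard against floating-point round-off

pmf = parse_pmf("0.85 0.05 0.05 0.05")
action = sample_action(pmf, random.Random(0))
print(pmf, action)
```

The point is that with replies like the one above, exploration still picks non-greedy actions occasionally, so identical replies across many different contexts is what makes me suspect no learning is happening.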
I understand that a lot of effort has gone into enabling learning from past interactions and from cbified data. However, I think there should also be some available explanation covering the more-or-less pragmatic essentials above.
Thanks so much!