How to handle huge imbalance with rewards in contextual bandits in vowpal wabbit

Question

I am using vowpal wabbit to train and learn a contextual bandit algorithm. My use case is for email marketing, learning which email variants perform better. Since the email reward proportion will be very low - only 1% of the emails will be clicked (click here is the reward). How to handle this huge imbalance scenario in vowpal wabbit learning ?

With 1% of the observations having a reward (i.e, = -cost), the model is not able to learn anything even after running for longer durations. What are some options in vowpal wabbit that can help to address this, I am looking for syntax for options (to address imbalance in training) and examples in vowpal wabbit to address this, but couldn't find any.

It is independent of whether you use CB or not. This similar Q&A may help: https://stackoverflow.com/questions/24634602/how-to-perform-logistic-regression-using-vowpal-wabbit-on-very-imbalanced-datase — arielf, Mar 26 '22 at 18:58
@arielf I checked that out earlier, but didn't notice any examples in VW on where to use the weight value in the input, (like in what order should I input the weight of the observation, in the line of the observation) — tjt, Mar 27 '22 at 19:36
Not 100% sure, but weight (in traditional vw) and cost (in CB modes) are kind of opposites. I'm guessing you can use negative cost to emulate weights in CB. The answer above goes beyond weights. In particular is suggests downsampling the prevalent case, which i think is very relevant as a potential solution. — arielf, Mar 28 '22 at 20:34
FYI, There's an open issue for supporting weights natively for CB data. https://github.com/VowpalWabbit/vowpal_wabbit/issues/3472 — jackgerrits, Mar 31 '22 at 13:20
In particular this comment suggests an easy workaround for CB datasets: https://github.com/VowpalWabbit/vowpal_wabbit/issues/3472#issuecomment-982100213 Set the label probability as `label_prob = label_prob / desired_weight` — arielf, Apr 06 '22 at 02:35
@arielf Thank you. I have 7 emails variants (say 7 actions). And the click rate is just 1.5% where 98.5% emails are not clicked (click = cost = 0, no click = 1). Not sure how to use the label weight mentioned in the comments and apply it to my use case. Do you think, to address my issue, I can just upsample my clicks (like just train the data with clicks 10 times more, that way clicks will be about 15% in the data?) — tjt, Apr 06 '22 at 04:25
@tjt yes, something along the lines you describe. Experiment and see what works. One more thing: it seems that since this is a binary-outcome (click vs no-click) you might need to use `--loss_function logistic`. To understand why, see this answer: https://stackoverflow.com/questions/20461342/dealing-with-class-imbalance-in-multi-label-classification/22767594#22767594 — arielf, Apr 14 '22 at 21:23
@arielf Thank you so much that helps especially the loss function — tjt, Apr 14 '22 at 21:54

How to handle huge imbalance with rewards in contextual bandits in vowpal wabbit

0 Answers0