Is any version of a multi-armed bandit (epsilon-greedy, Thompson Sampling, UCB) any good when the reward/click rate is very low relative to the pull rate? I have 600 pieces of content receiving approximately 3,000 clicks per day (total across all content) out of roughly one million requests. Given this, would it be useful to implement a MAB? Is this click rate statistically meaningful enough for the algorithm to learn from?
- This question is off-topic on SO; try [ai.SE](https://ai.stackexchange.com/) or [stats.SE](https://stats.stackexchange.com). – cheersmate Dec 11 '18 at 08:03
- For click-rate prediction, you could look at factorization machines. – Venkatachalam Dec 11 '18 at 08:39
1 Answer
Do the 600 pieces of content change every day, or do they stay the same? If they stay the same, then an asymptotically optimal algorithm would start performing extremely well soon enough: at one million requests per day spread over 600 items, each item averages over 1,600 pulls daily, so per-item click-rate estimates sharpen quickly even at a low overall CTR.
Even if the pieces of content change, Thompson Sampling should still work and give you something significantly better than random selection. I have run various experiments with Thompson Sampling for my research, and it tends to start doing well very quickly on most of them.
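To make the suggestion concrete, here is a minimal sketch of Thompson Sampling for binary click/no-click rewards, using a Beta posterior per item. The class name and the Beta(1, 1) uniform prior are illustrative assumptions, not part of the question or answer above:

```python
import random

class ThompsonSampler:
    """Hypothetical sketch: Thompson Sampling with Beta posteriors
    over each item's click-through rate (binary rewards)."""

    def __init__(self, n_items):
        # Beta(1, 1) uniform prior for every item's click probability
        self.alpha = [1.0] * n_items  # 1 + clicks observed
        self.beta = [1.0] * n_items   # 1 + non-clicks observed

    def select(self):
        # Draw one plausible CTR per item from its posterior,
        # then serve the item with the highest draw
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, item, clicked):
        # Fold the observed outcome into the chosen item's posterior
        if clicked:
            self.alpha[item] += 1.0
        else:
            self.beta[item] += 1.0
```

With the traffic described in the question (roughly a million requests and 3,000 clicks per day), each item still accumulates a few clicks per day on average, so the posteriors separate the better items from the worse ones within days rather than weeks.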

Sanit