I am working on ranking online content based on customer feedback for my college project. For this, I associate each piece of content with prior alpha and beta parameters and update them based on the feedback I receive. As I simulate more and more trials, the alpha and beta values keep growing. I want my model to react more to recent customer behavior, so in each update I decay the previous parameters by a factor of 0.9 and add the alpha and beta counts from the last day (a first-order inhomogeneous linear difference equation).
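A minimal sketch of the update rule described above. It assumes that feedback arrives as daily counts of positive (successes) and negative (failures) responses per item, and that items are ranked by Thompson sampling, meaning a random draw from Beta(alpha, beta). The class and method names are hypothetical. The recurrence is alpha_t = 0.9 * alpha_{t-1} + s_t, and beta follows the same form. It is bounded: with a constant s successes per day, alpha converges to s / (1 - 0.9) = 10s instead of growing without limit. The same recurrence also means that when an item gets no feedback, both of its counts shrink geometrically and its distribution widens.

```python
import random

GAMMA = 0.9  # daily decay factor from the question


class DiscountedBeta:
    """Hypothetical per-item Beta parameters with exponentially discounted counts."""

    def __init__(self, alpha0: float = 1.0, beta0: float = 1.0) -> None:
        self.alpha = alpha0
        self.beta = beta0

    def daily_update(self, successes: int, failures: int, gamma: float = GAMMA) -> None:
        # alpha_t = gamma * alpha_{t-1} + s_t ; beta_t = gamma * beta_{t-1} + f_t
        self.alpha = gamma * self.alpha + successes
        self.beta = gamma * self.beta + failures

    def sample(self, rng: random.Random) -> float:
        # Assumed Thompson-sampling score: one draw from Beta(alpha, beta)
        return rng.betavariate(self.alpha, self.beta)


if __name__ == "__main__":
    arm = DiscountedBeta()
    for _ in range(100):
        arm.daily_update(5, 5)  # constant 5 successes / 5 failures per day
    print(round(arm.alpha, 2), round(arm.beta, 2))  # both approach 5 / (1 - 0.9) = 50
```

With 100 days of constant feedback, both parameters settle near 50 rather than reaching 500. An item that stops being shown, however, decays back toward very small counts.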
Because of the decay, the model forgets that some content was suboptimal and explores it again, which leads to cyclic behavior. Is there a better way to handle this? I tried building the distribution from only the last month of data, but that seems to be "forgetful" too. How do I keep alpha and beta from growing too large while ensuring the model stays reactive and doesn't forget suboptimal strategies?
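For comparison, the "last month of data" alternative mentioned above can be sketched as a fixed-length window over daily counts. This assumes daily counts, a 30-day window, and a Beta(1, 1) base prior; the class name is hypothetical. Once an item's feedback falls out of the window, its parameters return to the prior, which is the same kind of forgetting the decay produces.

```python
from collections import deque


class WindowedBeta:
    """Hypothetical Beta parameters rebuilt from only the last `window` days."""

    def __init__(self, window: int = 30, alpha0: float = 1.0, beta0: float = 1.0) -> None:
        self.alpha0 = alpha0
        self.beta0 = beta0
        # A deque with maxlen drops the oldest day automatically when full
        self.days = deque(maxlen=window)

    def daily_update(self, successes: int, failures: int) -> None:
        self.days.append((successes, failures))

    @property
    def params(self) -> tuple:
        # Prior plus the successes/failures still inside the window
        s = sum(day[0] for day in self.days)
        f = sum(day[1] for day in self.days)
        return self.alpha0 + s, self.beta0 + f
```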