I have a priority queue of "door numbers". I get the next door number from the priority queue (i.e. the door with the lowest corresponding priority value), and then open the door. Behind the door, there may be a gift or not. Based on the presence / absence of a gift, I update the priority for this door number, and put it back into the priority queue. I then repeat, getting the next door number to open, and so on.
Assuming every door has a different gift-replenishment rate (i.e. some may get a new gift daily, others never at all), how should I update the priority values in order to maximize the number of gifts I find? That is, I want to maximize the ratio of doors I open with gifts to doors I open without gifts.
I should note that replenishment rates are not guaranteed to be fixed over time / there is random variation. But I'm okay with simplifying assumptions here.
This almost seems like a Monte-Carlo problem to me, except that the more often I explore a node (door), the lower its expected value. (And of course there's no tree to build; we only need to figure out the value of depth-1 nodes.)
The most trivial way is to keep track of last priority (LP) and current priority (CP), with delta = CP - LP. If we find a gift, set the next priority NP = CP + delta - 1; otherwise set NP = CP + delta + 1. This works I guess, but seems rather slow in its optimization.
Or we could have a multiplicative value instead: NP = CP + delta * shrink or NP = CP + delta * grow, where shrink < 1 and grow > 1. This is what I currently have, and it seemed to work fine for months, but now I'm getting the situation where some doors are being opened back-to-back (i.e. open door D, found gift, put back on priority queue, D is now best choice again, no gift found of course, now put back on queue with worse priority) which seems pretty bad. For reference, I used shrink = 0.9 and grow = 1.3.
Is there a math formula (as with Monte-Carlo) expressing the optimal way to explore doors?