
I'm working on a problem for one of my classes. The problem is this: a person starts with $0 and rolls an N-sided die (N can range from 1 to 30), winning money according to the side they roll. X of the N sides (the ones marked 1 in the list representation) make the person lose all their money (their current balance) and end the game. For instance, if the die is [0,0,0,1,1,1,1], a person would receive $1 if they roll 1, $2 if they roll 2, or $3 if they roll 3, but they would lose everything if they rolled 4, 5, 6, or 7.
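To make the payoff convention concrete, here's a minimal sketch (my own encoding, not part of the assignment: the die is a Python list where a 1 marks a losing side and a 0 marks a side that pays its face value):

```python
# Die encoding: entry i-1 is 1 if side i wipes out the balance, 0 if side i pays $i.
die = [0, 0, 0, 1, 1, 1, 1]   # sides 1-3 pay $1-$3, sides 4-7 lose everything

def roll_outcome(balance, side, die):
    """New balance after rolling `side` (1-indexed) with the given balance."""
    if die[side - 1] == 1:
        return 0                # losing side: the whole balance is gone
    return balance + side       # winning side: gain the face value

print([roll_outcome(0, s, die) for s in range(1, len(die) + 1)])  # [1, 2, 3, 0, 0, 0, 0]
```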

What is the expected value for this N-sided die game? I tried value iteration but can't seem to get it right.

So for the die [1,1,1,0,0,0,0] (lose on sides 1-3, win $4-$7 on sides 4-7), the expected value of the starting state after a single roll is 1/7*(4) + 1/7*(5) + 1/7*(6) + 1/7*(7) = 3.1428.
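As a quick sanity check of that arithmetic (same list encoding as above):

```python
die = [1, 1, 1, 0, 0, 0, 0]   # lose on 1-3, win $4-$7 on 4-7
N = len(die)
ev_one_roll = sum(s for s in range(1, N + 1) if die[s - 1] == 0) / N
print(ev_one_roll)            # 22/7 = 3.142857...
```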

For the value iteration, we next have to calculate the values of state 4 (balance=$4), state 5 (balance=$5), state 6 (balance=$6), and state 7 (balance=$7):

V(s) = max_a [ Σ_{s'} P(s' | s, a) * (R(s) + V(s')) ]
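In code, one backup of that equation for this game (the two actions being quit or keep rolling) might look like the sketch below; this is my own illustration, where `V` is a dict mapping a balance to its current value estimate, and I'm assuming any balance not yet in `V` defaults to its cash value (i.e. quitting):

```python
def bellman_backup(balance, V, die):
    """One Bellman backup for state `balance`: max over {quit, keep rolling}."""
    N = len(die)
    roll_value = 0.0
    for side in range(1, N + 1):
        if die[side - 1] == 1:
            continue                           # losing side: contributes 0 to the sum
        nxt = balance + side
        roll_value += V.get(nxt, nxt) / N      # unseen states default to their cash value
    return max(balance, roll_value)            # quitting is worth the current balance
```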

V(4) = Max($4 {quit the game}, 1/7*(4+4)+1/7*(4+5)+1/7*(4+6)+1/7*(4+7) {keep playing}) -> 5.428

V(5) = Max($5 {quit the game}, 1/7*(5+4)+1/7*(5+5)+1/7*(5+6)+1/7*(5+7){keep playing}) -> 6

V(6) = Max($6 {quit the game}, 1/7*(6+4)+1/7*(6+5)+1/7*(6+6)+1/7*(6+7){keep playing}) -> 6.57

V(7) = Max($7 {quit the game}, 1/7*(7+4)+1/7*(7+5)+1/7*(7+6)+1/7*(7+7){keep playing}) -> 7.14
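Those four numbers also fall out of a short loop over the `bellman_backup` sketch above:

```python
die = [1, 1, 1, 0, 0, 0, 0]
V = {}
for balance in (4, 5, 6, 7):
    V[balance] = bellman_backup(balance, V, die)
print(V)   # {4: 5.428..., 5: 6.0, 6: 6.571..., 7: 7.142...}
```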

Now V(4), V(5), V(6), and V(7) each branch out to their next states. So from state 4 a winning roll leads to states 8, 9, 10, or 11, and so on for the others.

V(8) ($8 current > $7.71 expected), V(9) ($9 current > $8.28 expected), V(10) ($10 current > $8.85 expected), V(11) ($11 current > $9.42 expected), V(12) ($12 current > $10 expected), V(13) ($13 current > $10.57 expected), V(14) ($14 current > $11.14 expected).

So, that suggests V(8), V(9), V(10), V(11), V(12), V(13), and V(14) are states where quitting is optimal (their values equal their balances) --> V(4), V(5), V(6), and V(7) do not need to be changed.
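A quick check that rolling never beats quitting from balances 8 through 14:

```python
for balance in range(8, 15):
    roll_value = sum(balance + s for s in (4, 5, 6, 7)) / 7      # expected value of one more roll
    print(balance, round(roll_value, 2), balance > roll_value)   # True for every balance here
```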

Finally, we re-calculate V(0) because the values of V(4), V(5), V(6), and V(7) were updated --> V(0) = 1/7*V(4) + 1/7*V(5) + 1/7*V(6) + 1/7*V(7) = 3.59 ... This is the final expected reward for this game.
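And the last step, just to confirm the arithmetic (continuing the `V` dict from the loop above):

```python
V0 = (V[4] + V[5] + V[6] + V[7]) / 7   # the three losing sides contribute 0
print(V0)                               # 3.5918... (= 176/49)
```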

Does this make sense? I'm not looking for code to solve the problem, just some advice on whether this approach is correct.

Thanks

Edited based on comments below to make the post more concise.

biofree70
  • When does the game stop? Does the player keep rolling until they lose all their balance? In that case, the expected value is 0. – Stef Sep 10 '21 at 15:03
  • If the player decides in advance on a set number of turns T, and keeps rolling until either they have rolled T times or they have lost their balance (whichever comes first), then the expected value is: (probability of not losing their balance) * (average value of one die roll knowing that it doesn't lose the balance) * T. – Stef Sep 10 '21 at 15:05
  • If the player has a different strategy (such as deciding on a sum of money S and rolling until they either lose their balance or achieve sum S) then the expected value depends on the particular strategy. – Stef Sep 10 '21 at 15:06
  • @stef, Thanks for the comment. So, the player can stop at any time. They don't need to keep playing until they lose the game. – biofree70 Sep 10 '21 at 15:37
  • @Stef I guess the question could be phrased a little differently: if the person starts with an initial balance of $0 and plays the game for 1 round, the average reward they could get is $3.14. Is this the best average reward they could achieve from this game? We could get an even better average reward if we roll again, couldn't we? What is that average reward for this game, considering all possible states? I think that frames the question a little bit better. – biofree70 Sep 10 '21 at 15:50
  • Yes. In that case, if you just want to know the "immediate" expected reward of rolling once, knowing the current balance, you can just imagine that the bad faces of the die are worth the negative of the balance. For instance, with a six-sided die with 3 bad faces and an initial balance of 42€, the expected reward for one roll is (1+2+3-42-42-42)/6. – Stef Sep 10 '21 at 16:36
  • Correct, that is my understanding. But how does that generalize, though? For one roll, I think it's straightforward, but as the rolls increase, it becomes like a tree with branches. Does what I said make sense if we go to 2 rolls, or 3 rolls, etc.? – biofree70 Sep 10 '21 at 16:53
  • I actually didn't read your post in detail, because I was discouraged by the super long list of "V(5) = ...". But since die rolls are independent, if the number of die rolls is fixed in advance, you can use the formula `(expected value for T die rolls) = (probability of not losing their balance) * (average value of one die roll knowing that it doesn't lose the balance) * T - (probability of losing their balance) * (initial balance)` – Stef Sep 10 '21 at 16:56
  • Note that one way to make sure the die rolls are independent is to assume that the player always rolls exactly T times, even if they lose on one of the earlier rolls. This doesn't make any difference to the final result (if the player has lost on at least one roll, then they lose everything), but it avoids future die rolls being dependent on earlier die rolls. – Stef Sep 10 '21 at 16:57
  • As currently written this is a pure probability problem, so I've voted to close and migrate it to [cross validated](https://stats.stackexchange.com). – pjs Sep 10 '21 at 17:19
  • Thanks @Stef, I made the post more concise, though I am having trouble trimming it down because I wanted to show my work. I'll keep looking into it. I don't think the formula you mentioned applies here -- for a die that's [1,1,1,0,0,0], the expected answer would be 2.58333 (starting with balance=$0), so I don't think the formula works (see the sketch just after these comments). I'll update this post if I figure it out later... There's probably some property I failed to mention, related to Markov decision processes, that would influence the answer. – biofree70 Sep 10 '21 at 17:56
  • @pjs, okay, thank you. I am not that familiar with posting on Stackoverflow, so I appreciate any advice. – biofree70 Sep 10 '21 at 18:01
  • *" I don't think the formula you mentioned would apply here -- basically for a dice that's [1,1,1,0,0,0], the expected answer would be 2.58333 (starting with balance=$0), so the formula won't work I don't think."* Really? Why? – Stef Sep 10 '21 at 18:43

1 Answer


Yes, your approach is complex, but essentially correct. The expected winnings are 3 + 29/49 = 3.591836734693877551...

In general keep rolling if expected winnings exceed expected losses.

Expected losses if you have y money are y * X / N.

Expected winnings from one roll are the average payout of the die, i.e. the sum of the winning faces' values divided by N (losing faces pay nothing).

I would suggest using a dynamic programming approach for efficiency.
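As a rough illustration of that suggestion (my own sketch, assuming the question's list encoding where 1 marks a losing side, plus a hand-picked `horizon` cap on how far the balance is expanded):

```python
def game_value(die, start=0, horizon=50):
    """Value of the quit-or-roll game from balance `start`, by backward induction.

    Balances above `start + horizon` are assumed to just take their cash value
    (quit), which is safe once the cap is well past the point where the
    expected winnings of a roll can no longer beat the expected losses.
    """
    N = len(die)
    wins = [s for s in range(1, N + 1) if die[s - 1] == 0]   # sides that pay out
    V = {}
    # Transitions only increase the balance, so one sweep from high to low suffices.
    for b in range(start + horizon, start - 1, -1):
        roll = sum(V.get(b + w, b + w) for w in wins) / N    # losing sides contribute 0
        V[b] = max(b, roll)                                  # quit vs. keep rolling
    return V[start]

print(game_value([1, 1, 1, 0, 0, 0, 0]))   # 3.591836... = 3 + 29/49
print(game_value([1, 1, 1, 0, 0, 0]))      # 2.583333...
```

The cap just needs to comfortably exceed (sum of winning faces) / X, the balance beyond which the expected loss y * X / N of one roll outweighs its expected winnings; 50 is plenty for the two dice above but would need to grow for dice with many winning sides and few losing ones.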

btilly