I have implemented the value iteration algorithm for a simple Markov decision process (Wikipedia) in Python. To represent the structure of a particular Markov decision process (states, actions, transitions, rewards) and iterate over it, I have used the following data structures:
a dictionary mapping each state to the set of actions available in it:
SA = {'state A': {'action 1', 'action 2', ...}, ...}
a dictionary for transition probabilities, keyed by (state, action) pairs:
T = {('state A', 'action 1'): {'state B': probability}, ...}
a dictionary for rewards, keyed the same way:
R = {('state A', 'action 1'): {'state B': reward}, ...}
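For context, here is a minimal value-iteration sketch over these three dictionaries, to show how they compose. The names `gamma` (discount factor) and `theta` (convergence threshold), and the tiny two-state example, are my own illustrative choices, not part of the original structures:

```python
def value_iteration(SA, T, R, gamma=0.9, theta=1e-6):
    # V holds the current value estimate for every state
    V = {s: 0.0 for s in SA}
    while True:
        delta = 0.0
        for s, actions in SA.items():
            # Bellman optimality backup: best expected return over actions,
            # summing over successor states s2 reachable via (s, a)
            best = max(
                sum(p * (R[(s, a)][s2] + gamma * V[s2])
                    for s2, p in T[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

# Tiny two-state example matching the structure described above
SA = {'A': {'stay', 'go'}, 'B': {'stay'}}
T = {('A', 'stay'): {'A': 1.0},
     ('A', 'go'):   {'B': 1.0},
     ('B', 'stay'): {'B': 1.0}}
R = {('A', 'stay'): {'A': 0.0},
     ('A', 'go'):   {'B': 1.0},
     ('B', 'stay'): {'B': 0.0}}

V = value_iteration(SA, T, R)
```

One consequence of this layout is that every Bellman backup costs a few dictionary lookups per (state, action, successor) triple, which is fine for small problems but slower than indexing into dense NumPy arrays.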
My question is: is this the right approach? What are the most suitable data structures for an MDP in Python?