10

I have implemented the value iteration algorithm for a simple Markov decision process (see the Wikipedia article) in Python. To keep the structure (states, actions, transitions, rewards) of the particular Markov process and to iterate over it, I have used the following data structures (a small sketch follows the list):

  1. dictionary for states and actions that are available for those states:

    SA = {'state A': {'action 1', 'action 2', ...}, ...}

  2. dictionary for transition probabilities:

    T = {('state A', 'action 1'): {'state B': probability}, ...}

  3. dictionary for rewards:

    R = {('state A', 'action 1'): {'state B': reward}, ...}
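
For concreteness, here is a minimal value-iteration sketch over these dictionaries; the toy two-state MDP, the discount factor and the tolerance below are made up purely for illustration:

    # Toy MDP in the SA/T/R layout described above (values are illustrative only).
    SA = {'A': {'stay', 'go'}, 'B': {'stay'}}
    T = {('A', 'stay'): {'A': 1.0},
         ('A', 'go'):   {'A': 0.1, 'B': 0.9},
         ('B', 'stay'): {'B': 1.0}}
    R = {('A', 'stay'): {'A': 0.0},
         ('A', 'go'):   {'A': 0.0, 'B': 1.0},
         ('B', 'stay'): {'B': 2.0}}

    def value_iteration(SA, T, R, gamma=0.9, eps=1e-6):
        """Iterate V(s) = max_a sum_s' T[s,a][s'] * (R[s,a][s'] + gamma * V[s'])."""
        V = {s: 0.0 for s in SA}
        while True:
            delta = 0.0
            for s, actions in SA.items():
                best = max(
                    sum(p * (R[(s, a)][s2] + gamma * V[s2])
                        for s2, p in T[(s, a)].items())
                    for a in actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < eps:
                return V

    print(value_iteration(SA, T, R))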

My question is: is this the right approach? What are the most suitable data structures (in Python) for an MDP?

JackAW

3 Answers

10

I have implemented Markov decision processes in Python before and found the following code useful.

http://aima.cs.berkeley.edu/python/mdp.html

This code is taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

  • 2
    Whilst this may theoretically answer the question, [it would be preferable](http://meta.stackexchange.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Spontifixus Jan 11 '13 at 09:01
9

Whether a data structure is suitable or not mostly depends on what you do with the data. You mention that you want to iterate over the process, so optimize your data structure for this purpose.

Transitions in Markov processes are often modeled by matrix multiplications. The transition probabilities P_a(s1, s2) and the rewards R_a(s1, s2) could be stored as (potentially sparse) matrices P_a and R_a indexed by the states. I think this would have a few advantages (see the sketch after the list):

  • If you use numpy arrays for this, indexing will probably be faster than with dictionaries.
  • State transitions could then be described simply by matrix multiplication.
  • Process simulation, for example with roulette wheel selection, will be faster and more cleanly implemented, since you only need to pick the corresponding column of the transition matrix.
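
A minimal numpy sketch of this matrix layout, with a made-up two-state, two-action MDP (the names P and R and all numbers are illustrative only):

    import numpy as np

    n_states = 2
    # One (n_states x n_states) transition matrix per action, indexed [current, next].
    P = {
        'stay': np.array([[1.0, 0.0],
                          [0.0, 1.0]]),
        'go':   np.array([[0.1, 0.9],
                          [0.9, 0.1]]),
    }
    R = {
        'stay': np.zeros((n_states, n_states)),
        'go':   np.array([[0.0, 1.0],
                          [2.0, 0.0]]),
    }

    # Transition of a distribution v over states under action a: v @ P[a].
    v = np.array([1.0, 0.0])          # start in state 0 with probability 1
    print(v @ P['go'])                # distribution after taking 'go'

    # Roulette wheel simulation: with this [current, next] layout, P[a][s] is the
    # distribution over successor states, so sampling the next state is one call.
    rng = np.random.default_rng(0)
    s = 0
    s_next = rng.choice(n_states, p=P['go'][s])
    print(s_next)
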
silvado
  • Thank you very much for your comments. I will consider your approach, at least for more complex MDPs. – JackAW Dec 27 '12 at 19:51
0

There is a Python implementation of MDPs called pymdptoolbox. It was developed based on the MATLAB implementation called MDPToolbox. Both are worth noting.

Basically, the transition probabilities are represented as an (A × S × S) array and the rewards as an (S × A) matrix, where S and A are the number of states and the number of actions. The package has some special treatment for sparse matrices as well.
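
A minimal usage sketch, assuming pymdptoolbox is installed (`pip install pymdptoolbox`); the forest() example and the ValueIteration class follow the package's quick-start documentation, as far as I recall:

    import mdptoolbox
    import mdptoolbox.example

    # P has shape (A, S, S) and R has shape (S, A), as described above.
    P, R = mdptoolbox.example.forest()

    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.96)  # 0.96 is an arbitrary discount factor
    vi.run()
    print(vi.policy)  # optimal action for each state
    print(vi.V)       # corresponding value function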