
I am trying to solve the grid world problem using value iteration. The grid is a 2D array with a few walls and a few terminal states. Value iteration computes a utility value for each cell, and the iterations are repeated until the utility of every cell stops changing. The utility update is:

U'(s) = max_a ( R(s) + sum_{s'} T(s, a, s') * U(s') )

From cell [0,0] we can move in four directions: up, down, left and right. If we hit a wall, we don't move, so s' = s and thus U(s') = U(s) (for example, if we try moving up from [0,0], we remain in the same position, i.e. the same state s). On the other hand, if we try moving right from [0,0], we are free to move in that direction and reach state s' = [0,1]. We calculate the value for each action in this way, and the new utility for a cell is the maximum value obtained.

How do I write such a function in numpy so that I can apply it to each cell and create a new array with the new utility values? I also need to stop when all the utility values have stopped changing.

PS: I tried to vectorise the function as described in this post: Efficient evaluation of a function at every cell of a NumPy array

However, I am looking for a way to access each cell by its index rather than by its value, because for each cell I need to check whether it is a wall or a terminal state. If it is a terminal state, I need to skip calculating the utility for that cell. The same goes for wall cells, since the agent can never reach them.
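One way the "skip walls and terminal states" requirement might be expressed without per-cell indexing is with boolean mask arrays of the same shape as the grid. The sketch below is only an illustration: the 3x4 layout, the names `wall`, `terminal`, `updatable` and `candidate`, and the stand-in values are all assumptions, not part of my actual problem.

```python
import numpy as np

# Hypothetical 3x4 layout: one wall and two terminal cells.
wall = np.zeros((3, 4), dtype=bool)
wall[1, 1] = True
terminal = np.zeros((3, 4), dtype=bool)
terminal[0, 3] = True
terminal[1, 3] = True

# Current utilities; terminal cells hold fixed values.
U = np.zeros((3, 4))
U[0, 3], U[1, 3] = 1.0, -1.0

# Stand-in for freshly computed utilities (a real update would go here).
candidate = U + 0.5

# Only cells that are neither walls nor terminal states get the new value.
updatable = ~(wall | terminal)
U_next = np.where(updatable, candidate, U)

# If explicit (row, col) indices are still needed, they can be recovered.
updated_cells = np.argwhere(updatable)
```

The position of each cell is carried by the mask itself, so the check "is this cell a wall or terminal?" happens for the whole grid in one operation.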

Please note: I tried calculating this using Python lists, but it takes a long time to compute for 1,000,000 cells. So I thought of using numpy, but I'm not proficient in it.
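For a grid of that size, the kind of whole-array update I am after might look roughly like the sketch below. It makes several assumptions that go beyond the formula above: moves are deterministic (so the sum over s' collapses to a single successor), `gamma` is a discount factor that my formula does not contain (`gamma=1.0` matches it), and the names `shifted`, `value_iteration`, `tol` and `max_iter` are hypothetical.

```python
import numpy as np

def shifted(A, dr, dc):
    """Array whose cell (r, c) holds A[r + dr, c + dc], clamped at the grid edge."""
    rows = np.clip(np.arange(A.shape[0]) + dr, 0, A.shape[0] - 1)
    cols = np.clip(np.arange(A.shape[1]) + dc, 0, A.shape[1] - 1)
    return A[np.ix_(rows, cols)]

def value_iteration(reward, wall, terminal, gamma=0.9, tol=1e-6, max_iter=10_000):
    """Deterministic-move value iteration on a 2D grid (assumed setup)."""
    reward = np.asarray(reward, dtype=float)
    fixed = wall | terminal                    # cells that are never updated
    U = np.where(terminal, reward, 0.0)        # terminal utility = its reward
    for _ in range(max_iter):
        candidates = []
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            target = shifted(U, dr, dc)        # U(s') if the move succeeds
            blocked = shifted(wall, dr, dc)    # is the destination a wall?
            # Moving into a wall keeps the agent in place: U(s') = U(s).
            # Moving off the grid is handled by the clamping in shifted().
            candidates.append(np.where(blocked, U, target))
        new_U = reward + gamma * np.max(np.stack(candidates), axis=0)
        new_U = np.where(fixed, U, new_U)      # skip walls and terminal states
        if np.max(np.abs(new_U - U)) < tol:    # utilities have stopped changing
            return new_U
        U = new_U
    return U

# Small hypothetical grid: step reward -1, one wall, one terminal cell worth 10.
reward = np.full((2, 3), -1.0)
reward[0, 2] = 10.0
wall = np.zeros((2, 3), dtype=bool)
wall[0, 1] = True
terminal = np.zeros((2, 3), dtype=bool)
terminal[0, 2] = True

U = value_iteration(reward, wall, terminal, gamma=1.0)
```

Each sweep touches the whole grid with a handful of array operations instead of a Python loop over cells. Stochastic transitions would need a weighted sum of the shifted arrays in place of the single `target`.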

Also, to get a better idea about the grid world problem, you can have a look at the following image: https://image.slidesharecdn.com/luciomarcenarotuesummerschool-130913100859-phpapp01/95/lucio-marcenaro-tue-summerschool-39-638.jpg?cb=1379067018

user45437
  • Did you read all the comments on the `vectorize` link? As a general rule, if you must iterate, and apply some function to each element, it's faster to work with lists. `numpy` methods can be fast, but they work with the whole array at once (well, iteratively, but in compiled code). To use `numpy` well you have to think in terms of the whole grid, not focusing on elements. – hpaulj Apr 11 '18 at 00:55
  • In that case, how do I calculate utility for large matrices? – user45437 Apr 11 '18 at 19:25

0 Answers