
I'm sure this has been answered, but I couldn't find anything that addressed my specific question.

I want to play with some reinforcement learning algorithms in a toy game. Given a particular game state, I want the network to predict action A. Given action A, I would then update the game state and call the network again on this new game state. This would repeat until the game ends.

I can't figure out, however, how to backpropagate the gradients through all of the network's predictions. Here's my (very rough) tensorflow pseudo code for how I imagine this would go:

for game_step in range(max_game_steps):
    # ask the network for an action given the current state
    action = sess.run(predict_action, feed_dict={game_state: current_state})
    # apply the action outside the graph to get the next state
    current_state = update_state(current_state, action)
# somehow backpropagate through all of the predictions above?
backpropagate_error_through_all_actions()

In this formulation, the gradients would only have context for their single action, and I can't send them through all of the states. I can't just run the network 30 times in a row, though, because I need to perform the state updates... What am I missing?

Do I need to model the entire toy game inside the tensorflow graph? That seems unreasonable.

Thanks!

Andrew Draganov
1 Answer


In reinforcement learning, do not model your environment inside the TensorFlow graph. To make it clear, here is what should live inside the TensorFlow graph:

  1. Placeholders that take the input state information

  2. The policy network (if you use policy-gradient methods) - this is your neural network

  3. The loss function

  4. The optimizer
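
The four pieces above can be sketched in TF 1.x-style graph code. This is a minimal illustration, not the asker's actual setup: the state size, the number of actions, the network shape, and the REINFORCE-style loss are all assumptions made for the example.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

STATE_DIM = 4      # assumed size of the toy game's state vector
NUM_ACTIONS = 2    # assumed number of discrete actions

# 1. Placeholders for the state (and, at train time, actions and returns)
state_ph = tf1.placeholder(tf.float32, [None, STATE_DIM], name="state")
action_ph = tf1.placeholder(tf.int32, [None], name="action")
return_ph = tf1.placeholder(tf.float32, [None], name="return")

# 2. Policy network: a small MLP that outputs action logits
hidden = tf1.layers.dense(state_ph, 32, activation=tf.nn.relu)
logits = tf1.layers.dense(hidden, NUM_ACTIONS)
sample_action = tf.squeeze(tf1.multinomial(logits, 1), axis=1)

# 3. Loss: REINFORCE-style score-function loss, -log pi(a|s) * return
neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=action_ph, logits=logits)
loss = tf.reduce_mean(neg_log_prob * return_ph)

# 4. Optimizer
train_op = tf1.train.AdamOptimizer(learning_rate=1e-2).minimize(loss)
```

With this loss, the gradient only ever flows through the logits for the states you feed in; the environment transition never needs to be differentiable.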

Then, in the outside loop, keep your environment running. Here is what kind of functions should live outside the graph:

  1. A function with access to the environment that returns the next state representation after an action is taken.
  2. A function with access to the TensorFlow graph that returns an action when you give it the new state representation.
  3. A train function, callable from the outside loop, that runs the optimizer inside the TensorFlow graph.

Now you might be wondering what the best way is to access the TensorFlow graph from the outside loop. Always use tf.get_default_session().
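
As a rough sketch of how the three outside-graph functions and tf.get_default_session() fit together: the graph below is a deliberately tiny stand-in (state size, action count, and the random "environment step" are all assumptions for illustration; a real environment's step function would go where the stand-in is).

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

STATE_DIM, NUM_ACTIONS = 4, 2   # assumed toy-game sizes

# Minimal in-graph pieces: placeholders, policy net, loss, optimizer.
state_ph = tf1.placeholder(tf.float32, [None, STATE_DIM])
action_ph = tf1.placeholder(tf.int32, [None])
return_ph = tf1.placeholder(tf.float32, [None])
logits = tf1.layers.dense(state_ph, NUM_ACTIONS)
sample_action = tf.squeeze(tf1.multinomial(logits, 1), axis=1)
neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=action_ph, logits=logits)
train_op = tf1.train.AdamOptimizer(1e-2).minimize(
    tf.reduce_mean(neg_log_prob * return_ph))

def choose_action(state):
    # Outside-graph function 2: reach the graph via the default session.
    sess = tf1.get_default_session()
    return int(sess.run(sample_action, feed_dict={state_ph: [state]})[0])

def train(states, actions, returns):
    # Outside-graph function 3: run the optimizer over a whole episode.
    sess = tf1.get_default_session()
    sess.run(train_op, feed_dict={state_ph: states,
                                  action_ph: actions,
                                  return_ph: returns})

with tf1.Session() as sess:     # registers sess as the default session
    sess.run(tf1.global_variables_initializer())
    states, actions = [], []
    state = np.zeros(STATE_DIM, np.float32)
    for _ in range(5):
        a = choose_action(state)
        states.append(state)
        actions.append(a)
        # Outside-graph function 1 would go here; a random next state
        # is used as a stand-in for the real environment step.
        state = np.random.randn(STATE_DIM).astype(np.float32)
    train(np.array(states), np.array(actions),
          np.ones(len(states), np.float32))
```

Note that training happens once per episode on the collected (state, action, return) triples, so no gradient ever has to flow "through" the environment updates.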

Shamane Siriwardhana
  • Right, that's more or less what I had written up in my pseudo code. My question still stands, though. How do I connect the gradients between predicting action A_t, applying action A_t to obtain a new state, and then predicting action A_(t+1)? How do I backpropagate through time if I have to keep popping out of the graph to apply the predictions? – Andrew Draganov Jul 25 '18 at 13:58
  • Realized it is a duplicate of this: https://stackoverflow.com/q/34536340/6095482. Thank you for your help. – Andrew Draganov Jul 25 '18 at 14:54