I am writing an AI for the game 2048. At the moment I can pull the game state from the browser and send moves to the game, but I don't know how to integrate that with TensorFlow. The project doesn't lend itself to a pre-built training set, so I was wondering whether it's possible to pass in the state of the game, have the network output a move, run the move, repeat until the game is over, and only then do the training?

Neywiny

1 Answer

This is certainly possible, and straightforward. You'll have to set up the model you want to use; I'll assume that's already been built.

From the perspective of interacting with a TensorFlow model, you just need to marshal your data into numpy arrays and pass them in via the feed_dict argument of sess.run.
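As a rough illustration of the marshalling step, the sketch below turns a 4x4 grid of tile values into a (1, 16) float32 array suitable for a placeholder with 16 inputs. The function name encode_board and the log2 scaling of tile values are assumptions made for this example, not something the question or answer specifies; feeding the raw tile values would also work.

```python
import numpy as np


def encode_board(board):
    """Flatten a 4x4 grid of tile values into a (1, 16) float32 array.

    Assumption for this sketch: empty cells are 0 and stay 0.0, while a
    tile holding 2**k is encoded as k, which keeps the inputs in a small range.
    """
    grid = np.asarray(board, dtype=np.float32)
    if grid.shape != (4, 4):
        raise ValueError("expected a 4x4 board, got shape %r" % (grid.shape,))
    encoded = np.zeros_like(grid)
    occupied = grid > 0
    encoded[occupied] = np.log2(grid[occupied])
    # Leading dimension of 1 is the batch dimension expected by the model.
    return encoded.reshape(1, 16)


board = [
    [2, 0, 0, 0],
    [0, 4, 0, 0],
    [0, 0, 8, 0],
    [0, 0, 0, 2048],
]
input_data = encode_board(board)
```

The resulting input_data can be used directly as the value for x in a feed_dict.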

To pass an input to tensorflow and get a result you would run something like this:

result = sess.run([logits], feed_dict={x:input_data})

This would perform a forward pass producing the output of the model without making any update. Now you'll take the results and use them to take the next step in the game.
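One way to turn the forward-pass result into a move is sketched below. It assumes the model emits four logits, one per direction, and that result is the list returned by sess.run([logits], ...), so result[0] has shape (1, 4). The MOVES ordering, the choose_move helper, and the optional legality mask are hypothetical names introduced for this example.

```python
import numpy as np

# Hypothetical ordering of the four output units; use whatever your model was built with.
MOVES = ["up", "down", "left", "right"]


def choose_move(result, legal_mask=None):
    """Pick the move with the highest logit, optionally ignoring illegal moves.

    `result` is assumed to be the list returned by sess.run([logits], ...).
    `legal_mask` is an optional sequence of four booleans (True = move allowed).
    """
    logits = np.asarray(result[0], dtype=np.float64).reshape(-1)
    if legal_mask is not None:
        # Illegal moves get -inf so argmax can never select them.
        logits = np.where(legal_mask, logits, -np.inf)
    return MOVES[int(np.argmax(logits))]


fake_result = [np.array([[0.1, 2.0, -1.0, 0.5]])]
best = choose_move(fake_result)
best_legal = choose_move(fake_result, legal_mask=[True, False, True, True])
```

Here best is the highest-scoring move overall, and best_legal is the highest-scoring move once "down" has been ruled out.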

Now that you have the result of your action (e.g. labels) you can perform an update step:

sess.run([update_op], feed_dict={x:input_data, y:labels})

It's as simple as that. Notice that your model will have an optimizer defined (update_op in this example), but if you don't ask TensorFlow to compute it (as in the first code sample), no updates will occur. TensorFlow is built around a dependency graph: the optimizer depends on the output logits, but computing the logits does not depend on the optimizer.
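The two calls can be put together in a small end-to-end sketch. Everything about the model here is an assumption made for illustration: a single linear layer with 16 inputs and 4 outputs, a sparse softmax cross-entropy loss, and plain gradient descent. It uses the tensorflow.compat.v1 module so the Session and feed_dict style shown above also runs on a TensorFlow 2.x install.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, so Session and placeholders work
tf.reset_default_graph()

# Assumed toy model: 16 inputs -> 4 logits, no hidden layer.
x = tf.placeholder(tf.float32, shape=[None, 16], name="x")
y = tf.placeholder(tf.int32, shape=[None], name="y")
w = tf.Variable(tf.zeros([16, 4]), name="w")
logits = tf.matmul(x, w)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
)
update_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

input_data = np.ones((1, 16), dtype=np.float32)
labels = np.array([2], dtype=np.int32)  # made-up target class for the demo

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w_before = sess.run(w)

    # Forward pass only: update_op is not requested, so no weights change.
    result = sess.run([logits], feed_dict={x: input_data})
    w_after_forward = sess.run(w)

    # Update step: requesting update_op runs the optimizer once.
    sess.run([update_op], feed_dict={x: input_data, y: labels})
    w_after_update = sess.run(w)
```

Comparing w_before, w_after_forward, and w_after_update shows the weights are untouched by the forward pass and only move once update_op is requested. Note that result is a list with one array of shape (1, 4).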

Presumably you'll initialize this model randomly, so the first results will be randomly generated, but each step after that will benefit from the previous updates being applied.

If you're using a reinforcement learning model, you would only receive a reward at some indeterminate time in the future, so when and how you run the update would differ a little from this example, but the general nature of the problem remains the same.
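To make the "play a whole game, then train" pattern concrete, here is a minimal sketch of a REINFORCE-style policy-gradient loop. It is written in plain NumPy rather than TensorFlow so the gradient is visible, and it is one possible approach, not the method the answer prescribes. ToyGame is a hypothetical stand-in for the browser game (16 steps, reward 1.0 whenever action 0 is chosen), and the linear softmax policy, learning rate, and discount factor are all assumptions.

```python
import numpy as np


class ToyGame:
    """Hypothetical stand-in for the browser game, not real 2048."""

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        state = np.zeros(16)
        state[self.t % 16] = 1.0  # one-hot step counter as a fake board
        return state

    def step(self, action):
        reward = 1.0 if action == 0 else 0.0
        self.t += 1
        return self._state(), reward, self.t >= 16


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


def play_episode(game, W, rng):
    """Play until the game ends, recording states, actions and rewards."""
    states, actions, rewards = [], [], []
    state, done = game.reset(), False
    while not done:
        probs = softmax(state @ W)            # forward pass only
        action = int(rng.choice(4, p=probs))  # sample a move
        next_state, reward, done = game.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
    return states, actions, rewards


def reinforce_update(W, states, actions, rewards, lr=0.05, gamma=0.9):
    """One gradient-ascent step on sum_t G_t * log pi(a_t | s_t)."""
    grad = np.zeros_like(W)
    for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
        dlogits = -softmax(s @ W)
        dlogits[a] += 1.0                     # onehot(a) - probs
        grad += g * np.outer(s, dlogits)
    return W + lr * grad


rng = np.random.default_rng(0)
W = np.zeros((16, 4))
game = ToyGame()
for _ in range(50):
    states, actions, rewards = play_episode(game, W, rng)
    W = reinforce_update(W, states, actions, rewards)  # train after the game ends
```

The key point is that the loss is built from the log-probability of the actions the network actually chose, weighted by the return, so it depends on the network's parameters even though the score itself does not.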

David Parks
  • Okay thank you, I was wondering how something so trivial could be so elusive to my googling. So if I had 16 input neurons I would just make a dictionary of the 16 values and the variable "result" will just be the output neuron? This is very useful stuff, thank you. – Neywiny May 16 '18 at 23:28
  • A simple numpy vector of 16 values is your input. The output will be just 1 value, again of numpy ndarray datatype. Input/output to/from TensorFlow happens in numpy format. – David Parks May 16 '18 at 23:31
  • right right, numpy. I'll give it a go when I can, thank you very much. – Neywiny May 16 '18 at 23:39
  • The optimizer is tripping me up. To my understanding, that line of code should optimize the graph to get from input_data to labels, but I only have a score as the output, which is not dependent at all on the input data. I tried making a gradient descent minimizer with -1*score, but it keeps saying there are no gradients provided? – Neywiny May 17 '18 at 17:26
  • You need a loss function that compares the output produced by the network to the output you desired. That loss would be a function of both the network's output and your expected result. If you only have a score and not an expected output to compare against, then you need to look into Reinforcement Learning. Check out the OpenAI Gym: https://gym.openai.com – David Parks May 17 '18 at 20:45
  • Thanks, I'll look into it. – Neywiny May 17 '18 at 22:10