
I'm building an AI with reinforcement learning and I'm getting weird results. The loss looks like this: TensorFlow loss: https://i.stack.imgur.com/hispR.jpg

While it's training, after each game the agent plays against a random player and then against a player using a weighted matrix, but the results keep going up and down: results: https://i.stack.imgur.com/mtWiS.jpg

Basically, I'm building a reinforcement learning agent that learns to play Othello, using ε-greedy exploration, experience replay, and a deep network built with Keras on top of TensorFlow. I've tried different activation functions (sigmoid, relu and, in the images shown above, tanh). All of them produce a similar loss, but the results differ a bit. In this example the agent is learning from 100k professional games. Here is the architecture, with the default learning rate of 0.005:

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

lr = 0.005    # default learning rate mentioned above
LOSS = 'mse'  # placeholder here; the full code defines its own loss
model = Sequential()
model.add(Dense(units=200, activation='tanh', input_shape=(64,)))
model.add(Dense(units=150, activation='tanh'))
model.add(Dense(units=100, activation='tanh'))
model.add(Dense(units=64, activation='tanh'))
optimizer = Adam(lr=lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss=LOSS, optimizer=optimizer)
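
For context, action selection and the replay update work roughly like this (a simplified sketch, not the exact code from the repo; names like choose_action, replay_step, replay_buffer and legal_moves are placeholders):

import random
import numpy as np

def choose_action(model, board, legal_moves, epsilon):
    # Epsilon-greedy: explore with probability epsilon, otherwise
    # pick the legal move with the highest predicted Q-value.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    q_values = model.predict(board.reshape(1, 64))[0]
    return max(legal_moves, key=lambda a: q_values[a])

def replay_step(model, replay_buffer, batch_size=32, gamma=0.99):
    # Sample past transitions and fit Q(s, a) towards
    # r + gamma * max_a' Q(s', a').
    batch = random.sample(replay_buffer, batch_size)
    for state, action, reward, next_state, done in batch:
        target = model.predict(state.reshape(1, 64))[0]
        if done:
            target[action] = reward
        else:
            target[action] = reward + gamma * np.max(
                model.predict(next_state.reshape(1, 64))[0])
        model.fit(state.reshape(1, 64), target.reshape(1, 64), verbose=0)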

Original code: https://github.com/JordiMD92/thellia/tree/keras

So, why am I getting these results? My input is 64 neurons (an 8×8 board): 0 for an empty square, 1 for a black square and -1 for a white square. Is it bad to use negative inputs?
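
For reference, this is roughly how a board is flattened into the 64-unit input (a simplified sketch; encode_board is a placeholder name):

import numpy as np

def encode_board(board):
    # board is an 8x8 array: 0 = empty, 1 = black, -1 = white
    return np.asarray(board, dtype=np.float32).reshape(64)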

1 Answer


It might be a problem with your activation function. Try relu instead of tanh. Also, if you are using deep Q-learning, the output layer usually needs no activation at all, since Q-values are not bounded to [-1, 1], and watch out for any optimizer setting that resets the weights.
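
For example, something along these lines (a sketch of the suggested change, keeping the layer sizes from the question):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=200, activation='relu', input_shape=(64,)))
model.add(Dense(units=150, activation='relu'))
model.add(Dense(units=100, activation='relu'))
# Linear output layer: Q-values are unbounded, so no squashing activation.
model.add(Dense(units=64))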

hdD
  • I will try that. Do you think the number of layers and neurons seems right for this problem? – user2335427 Dec 26 '17 at 21:33
  • @user2335427 I think that's up to your model design. The neurons store the information of each input for deep Q-learning, so you should consider all the possible states. I always separate different kinds of information into different layers; for example, you can keep the state in one layer and the actions in another. – hdD Dec 27 '17 at 01:50
  • I got rid of Keras and now I'm using only TensorFlow. I implemented Double DQN (see the sketch after these comments) and tried different activation functions; so far relu is doing well. Thanks. About my last question: is it bad to use negative inputs? – user2335427 Dec 30 '17 at 23:17
  • @user2335427 No problem. For your last question: if your inputs are negative, you can just use them as they are; only if a negative input is an exceptional case do you need some other method to reduce its effect on the model. In my opinion, negative inputs only influence the performance of your model. – hdD Dec 31 '17 at 04:38
  • @user2335427 In deep Q-learning, or reinforcement learning in general, a negative value just means that taking that action in that state decreases the total reward, so when the model encounters the same state and action again (as described by the Markov chain), it will select a different action to make the reward larger. So if you feed negative input into the model and want it to pick the action you fed, it may need more time to learn a good policy that reduces the effect of the negative input on the state rewards. – hdD Dec 31 '17 at 04:38
  • I see, I will try to modify the model. Thanks. – user2335427 Dec 31 '17 at 09:00
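
For readers following the Double DQN mention in the comments above, the target it computes typically looks like this (a generic sketch, not the asker's exact code):

import numpy as np

def double_dqn_target(online_model, target_model, reward, next_state, done,
                      gamma=0.99):
    # Double DQN: the online network chooses the next action,
    # the target network evaluates it.
    if done:
        return reward
    q_online = online_model.predict(next_state.reshape(1, 64))[0]
    best_action = np.argmax(q_online)
    q_target = target_model.predict(next_state.reshape(1, 64))[0]
    return reward + gamma * q_target[best_action]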