I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm.
I'd like the neural net to have the following structure (sketched in code right after this list):

- input layer - rows * cols + 1 neurons - the values of consecutive fields on the board (0 for empty, 1 or 2 representing a player), plus the action (a natural number) taken in that state
- hidden layer - (??) neurons
- output layer - 1 neuron - the value of the action in the given state (a float)
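For concreteness, a minimal sketch of that structure (assuming PyTorch; the board size and hidden width below are placeholders, since the hidden size is exactly what I don't know how to pick):

```python
import torch
import torch.nn as nn

rows, cols = 3, 3          # board dimensions (hypothetical)
hidden = 32                # hidden-layer size -- the open question

model = nn.Sequential(
    nn.Linear(rows * cols + 1, hidden),  # board fields + action index
    nn.ReLU(),
    nn.Linear(hidden, 1),                # Q-value of (state, action)
)

# One forward pass: rows*cols field values (0/1/2) plus an action number.
x = torch.tensor([[0., 1., 2., 0., 0., 1., 0., 2., 0., 4.]])
q_value = model(x)         # shape (1, 1), a single float
```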
My first idea was to start by building a map of states, actions, and values, and then train the neural network on it. If the training did not succeed, I could increase the number of neurons and start over.
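To make that first idea concrete, here is roughly what the map step looks like (a minimal sketch with the standard Q-learning update; `alpha`, `gamma`, and the environment interface are hypothetical):

```python
from collections import defaultdict

Q = defaultdict(float)     # (state, action) -> value; states must be hashable
alpha, gamma = 0.1, 0.99   # learning rate and discount (hypothetical values)

def update(state, action, reward, next_state, legal_actions):
    """One Q-learning backup after observing a transition."""
    best_next = max((Q[(next_state, a)] for a in legal_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```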
However, I quickly ran into performance problems. First, I had to switch from a simple in-memory Python dict to a database (not enough RAM). Now the database seems to be the bottleneck: simply put, there are so many possible states that retrieving the actions' values takes a noticeable amount of time. The calculations would take weeks.
I guess it would be possible to train the neural network on the fly, without the map layer in the middle. But how would I choose the right number of neurons in the hidden layer? And how could I tell that I'm losing previously saved (learned) data?
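Roughly, this is what I mean by training on the fly (a sketch assuming the PyTorch model from above; `encode(state, action)` is a hypothetical helper that builds the input vector from a state and an action):

```python
import torch
import torch.nn as nn

opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # model from the sketch above
loss_fn = nn.MSELoss()
gamma = 0.99

def online_update(state, action, reward, next_state, legal_actions, done):
    # Bootstrapped target: r + gamma * max_a' Q(next_state, a')
    with torch.no_grad():
        best_next = 0.0 if done else max(
            model(encode(next_state, a)).item() for a in legal_actions)
    target = torch.tensor([[reward + gamma * best_next]])

    # Fit the net to this single (state, action) -> target pair
    opt.zero_grad()
    loss = loss_fn(model(encode(state, action)), target)
    loss.backward()
    opt.step()
```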