I have a dueling double deep Q-network model that works with two dense layers, and I am trying to convert them into two LSTM layers because my model deals with time series. When I replace the dense layers in the code, the error below appears, and I have been unable to resolve it. I know this problem has been answered many times here, but those solutions aren't working for me.
The code that works with two dense layers is written as follows:

    import tensorflow as tf
    from tensorflow import keras

    class DuelingDeepQNetwork(keras.Model):
        def __init__(self, n_actions, fc1_dims, fc2_dims):
            super(DuelingDeepQNetwork, self).__init__()
            self.dense1 = keras.layers.Dense(fc1_dims, activation='relu')
            self.dense2 = keras.layers.Dense(fc2_dims, activation='relu')
            self.V = keras.layers.Dense(1, activation=None)
            self.A = keras.layers.Dense(n_actions, activation=None)

        def call(self, state):
            x = self.dense1(state)
            x = self.dense2(x)
            V = self.V(x)
            A = self.A(x)
            # Dueling aggregation: state value plus mean-centred advantages
            Q = (V + (A - tf.math.reduce_mean(A, axis=1, keepdims=True)))
            return Q

        def advantage(self, state):
            x = self.dense1(state)
            x = self.dense2(x)
            A = self.A(x)
            return A

This version runs without error, but when I change the first two dense layers into LSTM layers as follows:

    class DuelingDeepQNetwork(keras.Model):
        def __init__(self, n_actions, fc1_dims, fc2_dims):
            super(DuelingDeepQNetwork, self).__init__()
            self.dense1 = keras.layers.LSTM(fc1_dims, activation='relu')
            self.dense2 = keras.layers.LSTM(fc2_dims, activation='relu')
            self.V = keras.layers.Dense(1, activation=None)
            self.A = keras.layers.Dense(n_actions, activation=None)

This error appears:
Input 0 of layer lstm_24 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [64, 8]
Following the question "expected ndim=3, found ndim=2", I already tried setting the input shape with "state = state.reshape(64, 1, 8)" before running the neural network, as follows:

    def choose_action(self, observation):
        if np.random.random() < self.epsilon:
            action = np.random.choice(self.action_space)
        else:
            state = np.array([observation])
            state = state.reshape(64, 1, 8) #<--------
            actions = self.q_eval.advantage(state)
            action = tf.math.argmax(actions, axis=1).numpy()[0,0]
        return action

But I get exactly the same error. I also tried adding the argument "return_sequences=True" to both layers, but that didn't work either.
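As a side check on the reshape itself, here is a minimal NumPy-only sketch of the shapes involved. It assumes each state is treated as a sequence of length 1, and `add_time_axis` is a hypothetical helper that is not part of my project. Since the error reports a shape of [64, 8], it may be coming from wherever a batch of 64 states is passed to the network rather than from choose_action, but that is only a guess.

```python
import numpy as np

N_FEATURES = 8   # number of state variables in my setup
BATCH_SIZE = 64  # batch size in my setup


def add_time_axis(x):
    """Hypothetical helper: (batch, features) -> (batch, 1, features)."""
    x = np.asarray(x, dtype=np.float32)
    if x.ndim == 1:                # a single observation: (features,)
        x = x[np.newaxis, :]       # -> (1, features)
    return x[:, np.newaxis, :]     # insert a timesteps axis of length 1


observation = np.zeros(N_FEATURES)   # placeholder for one environment state
state = np.array([observation])      # shape (1, 8), as in choose_action

try:
    # Needs 64 * 1 * 8 = 512 values, but a single state only has 8.
    state.reshape(BATCH_SIZE, 1, N_FEATURES)
    reshape_failed = False
except ValueError:
    reshape_failed = True

single = add_time_axis(observation)                        # (1, 1, 8)
batch = add_time_axis(np.zeros((BATCH_SIZE, N_FEATURES)))  # (64, 1, 8)
print(reshape_failed, single.shape, batch.shape)
```

If this reading is right, a single observation cannot be reshaped to (64, 1, 8) at all, so the time axis would have to be added per call, for both single states and replay batches.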
I don't know what else to try, and I have to hand this in within a week. Could someone enlighten me?
EDIT
I'm using fc1_dims = 64, fc2_dims = 32 and n_actions = 2. The model uses 8 variables and has a batch size of 64. I uploaded the code to GitHub so you can run it if you want. The project is not finished, so I won't write a proper README for now.
[github with code][2]
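For anyone who wants to check shapes without cloning the repository, here is a rough sketch of how I understand the tensors should flow with the dimensions above. It assumes TensorFlow 2.x, a sequence length of 1 per state, and "return_sequences=True" on the first LSTM only. `DuelingLSTMSketch` is a hypothetical stand-in, not the class from my project, and I have not confirmed that this is the right design for the agent.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


class DuelingLSTMSketch(keras.Model):
    """Hypothetical LSTM variant of the dueling network, for shape checking only."""

    def __init__(self, n_actions, fc1_dims, fc2_dims):
        super().__init__()
        # return_sequences=True keeps the time axis so the next LSTM gets 3D input.
        self.lstm1 = keras.layers.LSTM(fc1_dims, activation='relu',
                                       return_sequences=True)
        # The second LSTM returns only its last step: (batch, fc2_dims).
        self.lstm2 = keras.layers.LSTM(fc2_dims, activation='relu')
        self.V = keras.layers.Dense(1, activation=None)
        self.A = keras.layers.Dense(n_actions, activation=None)

    def call(self, state):
        x = self.lstm1(state)   # (batch, timesteps, fc1_dims)
        x = self.lstm2(x)       # (batch, fc2_dims)
        V = self.V(x)           # (batch, 1)
        A = self.A(x)           # (batch, n_actions)
        # Same dueling aggregation as in the dense version.
        return V + (A - tf.math.reduce_mean(A, axis=1, keepdims=True))


model = DuelingLSTMSketch(n_actions=2, fc1_dims=64, fc2_dims=32)
states = np.zeros((64, 1, 8), dtype=np.float32)  # (batch, timesteps=1, features)
q_values = model(states)
q_shape = tuple(q_values.shape)
print(q_shape)
```

The input here is already 3D, (batch, timesteps, features), which is what the error message says the LSTM layer expects.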