I was trying to optimize my code when I encountered some strange behaviour with the following model in Keras:
import random
import numpy as np

# Each entry of random_minibatch has the form (state, action, reward, state_next, done)
random_minibatch = random.sample(list_of_samples, batch_size)
# A state is a list in the form of [x, y]
next_states = [temp[3] for temp in random_minibatch]
# Reshape the next_states for the model
next_states = np.reshape(next_states, [-1, 2])
next_states_preds = model.predict(next_states)
for i, (_, _, _, state_next, _) in enumerate(random_minibatch):
    state_next = np.reshape(state_next, [1, 2])
    pred = model.predict(state_next)
    print("inputs: {} ; {}".format(next_states[i], state_next))
    print(pred)
    print(next_states_preds[i])
    print("amax: {} ; {}".format(np.amax(pred), np.amax(next_states_preds[i])))
    print()
and a simple model:
from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam

model = Sequential()
model.add(layers.Dense(16, activation="relu", input_dim=2))
model.add(layers.Dense(32, activation="relu"))
model.add(layers.Dense(8))
model.compile(loss="mse", optimizer=Adam(lr=0.00025))
next_states is an array of lists in the form [[x1, y1], [x2, y2], ...], and state_next is a single list in the form [x, y]. As you can see, next_states contains every state_next from the for-loop, so the model receives exactly the same inputs. The only difference is that the first time I feed the whole batch into the model at once, and the second time I feed the states in one by one.
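For what it's worth, I can reproduce the same effect with a plain-NumPy sketch of a similar two-layer forward pass (hypothetical random weights here, not my actual trained model): batch and per-row results agree only up to float32 tolerance, not necessarily bit-for-bit.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical float32 weights standing in for a Dense(16, relu) -> Dense(8) model
W1 = rng.standard_normal((2, 16)).astype(np.float32)
W2 = rng.standard_normal((16, 8)).astype(np.float32)

def forward(x):
    """Dense(16, relu) followed by Dense(8), no biases, float32 throughout."""
    return np.maximum(x @ W1, 0.0) @ W2

states = rng.standard_normal((32, 2)).astype(np.float32)

batch_preds = forward(states)                                    # whole batch at once
single_preds = np.vstack([forward(s[None, :]) for s in states])  # one state at a time

# Bit-for-bit equality is not guaranteed (different shapes can hit different
# BLAS kernels), but the results agree within float32 tolerance.
print(np.allclose(batch_preds, single_preds, rtol=1e-5, atol=1e-6))
```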
My problem is that I get slightly different outputs for the same input.
An example of the printed output would be:
inputs: [39 -7] ; [39 -7]
[0. 0. 0. 0. 0. 5.457102 0. 0.]
[[0. 0. 0. 0. 0. 5.4571013 0. 0.]]
amax: 5.457101345062256 ; 5.457101821899414
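I assume the mismatch in the last digits comes from float32 rounding, since the order in which values are combined changes the result. This small check (my own illustration, unrelated to the model above) shows the effect:

```python
import numpy as np

x = np.array([1.0, 1e8, -1e8], dtype=np.float32)

# Left-to-right: (1 + 1e8) rounds the 1 away, so the total is 0.0
left_to_right = float((x[0] + x[1]) + x[2])

# Right-to-left: (1e8 - 1e8) is exactly 0, so the 1 survives
right_to_left = float(x[0] + (x[1] + x[2]))

print(left_to_right, right_to_left)  # 0.0 1.0
```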
So at this point I'm not sure whether I've misunderstood something or made a mistake somewhere. I'd be very glad if someone could explain this strange behaviour.