I was trying to optimize my code when I encountered this strange behaviour with the following model in Keras:

import random
import numpy as np

# random_minibatch entries have the form (state, action, reward, state_next, done)
random_minibatch = random.sample(list_of_samples, batch_size)

# A state is a list in the form of [x, y]
next_states = [temp[3] for temp in random_minibatch]
# Reshape the next_states for the model
next_states = np.reshape(next_states, [-1, 2])
next_states_preds = model.predict(next_states)

for i, (_, _, _, state_next, _) in enumerate(random_minibatch):
    state_next = np.reshape(state_next, [1, 2])
    pred = model.predict(state_next)

    print("inputs: {} ; {}".format(next_states[i], state_next))
    print(pred)
    print(next_states_preds[i])
    print("amax: {} ; {}".format(np.amax(pred), np.amax(next_states_preds[i])))
    print()

and a simple model:

from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam

model = Sequential()

model.add(layers.Dense(16, activation="relu", input_dim=2))
model.add(layers.Dense(32, activation="relu"))
model.add(layers.Dense(8))

model.compile(loss="mse", optimizer=Adam(lr=0.00025))

next_states is a list of lists of the form [[x1, y1], [x2, y2], ...],
and state_next is a single list of the form [x, y].
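To make the shapes concrete, here is a small sketch (with made-up coordinates; only the shapes matter) showing that reshaping the whole list with [-1, 2] and a single state with [1, 2] produce the same row values:

```python
import numpy as np

# Made-up example states, standing in for the real minibatch
next_states = [[39, -7], [12, 3]]

batch = np.reshape(next_states, [-1, 2])     # shape (2, 2): all states at once
single = np.reshape(next_states[0], [1, 2])  # shape (1, 2): one state alone

print(batch.shape, single.shape)             # (2, 2) (1, 2)
print(np.array_equal(batch[0], single[0]))   # True: identical row values
```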

As you can see, next_states contains every state_next from the for-loop, so the model sees the same inputs either way. The only difference is that in the first case I pass the whole list of lists to the model at once, and in the second case I pass the states one by one.

My problem is that I get different outputs for the same input.
An example of the printed output would be:

inputs: [39 -7] ; [39 -7]
[0. 0. 0. 0. 0. 5.457102 0. 0.]
[[0. 0. 0. 0. 0. 5.4571013 0. 0.]]
amax: 5.457101345062256 ; 5.457101821899414
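For what it's worth, the two maxima only disagree far behind the decimal point. A quick check on the numbers from the printout above shows the relative difference is below float32 machine epsilon (about 1.19e-07):

```python
import numpy as np

a = 5.457101345062256  # amax from the single prediction
b = 5.457101821899414  # amax from the batched prediction

rel_diff = abs(a - b) / abs(a)
print(rel_diff)                             # ~8.7e-08
print(rel_diff < np.finfo(np.float32).eps)  # True: within float32 precision
```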

So at this point I'm not sure whether I misunderstood something or did something wrong somewhere. I would be very glad if someone could help me with this strange behaviour.

Henning