How do the dimensions work when training a Keras model?

Question

Getting:

    assert q_values.shape == (len(state_batch), self.nb_actions)
AssertionError
q_values.shape <class 'tuple'>: (1, 1, 10)
(len(state_batch), self.nb_actions) <class 'tuple'>: (1, 10)

which comes from the SARSA agent in the keras-rl library:

rl.agents.sarsa.SARSAAgent#compute_batch_q_values

    batch = self.process_state_batch(state_batch)
    q_values = self.model.predict_on_batch(batch)
    assert q_values.shape == (len(state_batch), self.nb_actions)

Here is my code:

class MyEnv(Env):

    def __init__(self):
        self._reset()

    def _reset(self) -> None:
        self.i = 0

    def _get_obs(self) -> List[float]:
        return [1] * 20

    def reset(self) -> List[float]:
        self._reset()
        return self._get_obs()



    model = Sequential()
    model.add(Dense(units=20, activation='relu', input_shape=(1, 20)))
    model.add(Dense(units=10, activation='softmax'))
    logger.info(model.summary())

    policy = BoltzmannQPolicy()
    agent = SARSAAgent(model=model, nb_actions=10, policy=policy)

    optimizer = Adam(lr=1e-3)
    agent.compile(optimizer, metrics=['mae'])

    env = MyEnv()
    agent.fit(env, 1, verbose=2, visualize=True)

Can someone explain how the dimensions should be set up and how they work with these libraries? I'm feeding in a list of 20 inputs and want an output of 10.

score 3 · Answer 1 · answered Jul 21 '19 at 09:48

This particular error is caused by your input shape being (1, 20). If you use an input shape of (20,) the error will go away.

In other words, SARSAAgent expects a model that outputs 2-dimensional tensors of shape (batch_size, nb_actions), but your model outputs tensors of shape (batch_size, 1, 10). You can either drop the extra dimension from the model's input or Flatten the output.
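To see why the extra axis survives all the way to the output, note that a Dense layer only transforms the last axis of its input and leaves every leading axis alone. The following is a minimal NumPy sketch of that behaviour, not Keras itself; the helper name `dense` and the random weights are assumptions made for illustration, and activations are omitted because they do not change shapes.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, units):
    """Hypothetical stand-in for a Dense layer: acts on the last axis only."""
    kernel = rng.normal(size=(x.shape[-1], units))
    bias = np.zeros(units)
    return x @ kernel + bias


# One state fed to a model with input_shape=(1, 20): the batch is (1, 1, 20).
batch = np.ones((1, 1, 20))
out_3d = dense(dense(batch, 20), 10)

# Flattening everything except the batch axis first removes the extra axis.
flat = batch.reshape(len(batch), -1)
out_2d = dense(dense(flat, 20), 10)

print(out_3d.shape)  # (1, 1, 10)
print(out_2d.shape)  # (1, 10)
```

Only the second shape matches the (len(state_batch), nb_actions) tuple that the assertion checks.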

Pedro Marques

Keras then complains with `ValueError: Error when checking input: expected dense_1_input to have 2 dimensions, but got array with shape (1, 1, 20)`. I'll look into Flatten. – Tjorriemorrie Jul 21 '19 at 21:29

When you change the input shape to (20,) you need to change the numpy arrays that you feed to `model.fit` to the corresponding shape. An input shape of (1, 20) adds complexity over the standard (20,), i.e. the extra dimension of size 1 isn't providing any value and created this issue in the first place. – Pedro Marques Jul 22 '19 at 07:52
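The (1, 1, 20) shape in the ValueError above is consistent with the agent wrapping each observation twice before prediction: once in a window of length 1 and once in a batch of size 1. Here is a sketch of that nesting using a hypothetical `shape_of` helper rather than keras-rl code; the wrapping order is an assumption inferred from the error message, not taken from the library source.

```python
def shape_of(nested):
    """Hypothetical helper: shape of a regular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


observation = [1] * 20   # what MyEnv._get_obs() returns
state = [observation]    # assumed window of length 1
state_batch = [state]    # assumed batch of size 1

print(shape_of(observation))  # (20,)
print(shape_of(state_batch))  # (1, 1, 20)
```

Under this assumption, a model that starts with a Flatten layer over a (1, 20) input receives the array it expects and still returns (batch_size, nb_actions).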

mujjiga · Accepted Answer · 2019-07-28T13:03:43.170

Custom environment

Let's first build a simple toy environment.

It is a 1D maze: [1,1,0,1,1,0,1,1,0]
1: Stepping into this block of the maze earns a reward of 1
0: Stepping into this block of the maze results in death, with a reward of 0
Allowed actions: 0 moves to the next block of the maze; 1 hops over the next block, i.e. skips it and lands on the block after it

To implement our env in gym we need to implement 2 methods:

step: Takes an action, performs the step, and returns the state after the step, the reward, a bool indicating whether the game has ended, and an info dict (empty here)
reset: Resets the game and returns the initial state

Env Code

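import gym
import numpy as np
from gym import spaces


class FooEnv(gym.Env):
    def __init__(self):
        self.maze = [1,1,0,1,1,0,1,1,0]
        self.curr_state = 0
        self.action_space = spaces.Discrete(2)
        # Note: the state is a block index that ranges over the whole maze,
        # so Discrete(1) understates the observation range.
        self.observation_space = spaces.Discrete(1)

    def step(self, action):
        # Action 0 moves one block ahead, action 1 hops over the next block.
        if action == 0:
            self.curr_state += 1
        elif action == 1:
            self.curr_state += 2

        if self.curr_state >= len(self.maze):
            # Walked off the end of the maze: the episode ends with no reward.
            reward = 0.
            done = True
        else:
            if self.maze[self.curr_state] == 0:
                # Landed on a deadly block.
                reward = 0.
                done = True
            else:
                reward = 1.
                done = False
        return np.array(self.curr_state), reward, done, {}

    def reset(self):
        self.curr_state = 0
        return np.array(self.curr_state)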
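To get a feel for the rewards before training anything, the maze rule can be traced by hand. The sketch below restates the transition rule as a pure function so it runs without gym; `step`, `rollout` and the action sequences are illustrative names and inputs, and actions are assumed to be 0 or 1.

```python
MAZE = [1, 1, 0, 1, 1, 0, 1, 1, 0]


def step(state, action):
    """Same transition rule as FooEnv.step, written as a pure function."""
    state += 1 if action == 0 else 2
    if state >= len(MAZE) or MAZE[state] == 0:
        return state, 0.0, True   # fell off the end or hit a deadly block
    return state, 1.0, False


def rollout(actions):
    """Play the actions from block 0 and return (final state, total reward)."""
    state, total = 0, 0.0
    for action in actions:
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return state, total


print(rollout([0, 1, 0, 1, 0, 1]))  # (9, 5.0): hops over both inner zeros
print(rollout([0, 0]))              # (2, 1.0): dies on the first zero
```

Alternating move and hop collects every reward that is reachable before the maze ends, which gives a reference point for judging the trained agent.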

Neural Network

Now, given the current state, we want the NN to predict the action to take.

The NN takes the current state as input: a single number representing the maze block we are currently in
The NN has one output per possible action (0 or 1), and the agent's policy picks the action from those outputs
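The network itself only produces one number per action; turning those numbers into a single action is the policy's job. Below is a minimal standard-library sketch of Boltzmann (softmax) selection in general. It is not keras-rl's actual implementation, and the function names, the temperature `tau` and the example values are all assumptions.

```python
import math
import random


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over q_values / tau (tau is a hypothetical temperature)."""
    scaled = [q / tau for q in q_values]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_action(q_values, tau=1.0, rng=random):
    """Sample an action index with probability given by boltzmann_probs."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs)[0]


q_values = [0.2, 0.8]  # made-up values, one per action: 0 = move, 1 = hop
probs = boltzmann_probs(q_values)
action = pick_action(q_values, rng=random.Random(0))
print(probs, action)
```

Actions with higher values are sampled more often, but every action keeps a non-zero probability, which is what lets the agent keep exploring.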

NN Code

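from keras.models import Sequential
from keras.layers import Dense

# One input (the current block index), one output per action.
model = Sequential()
model.add(Dense(units=16, activation='relu', input_shape=(1,)))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=2, activation='softmax'))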

Putting it together

from keras.optimizers import Adam
from rl.agents.sarsa import SARSAAgent
from rl.policy import BoltzmannQPolicy

policy = BoltzmannQPolicy()
agent = SARSAAgent(model=model, nb_actions=2, policy=policy)

optimizer = Adam(lr=1e-3)
agent.compile(optimizer, metrics=['acc'])

env = FooEnv()
agent.fit(env, 10000, verbose=1, visualize=False)
# Test the trained agent using
# agent.test(env, nb_episodes=5, visualize=False)

Output

Training for 10000 steps ...
Interval 1 (0 steps performed)
10000/10000 [==============================] - 54s 5ms/step - reward: 0.6128
done, took 53.519 seconds

If your environment is a 2D grid of size n × m, the input shape of the NN will be (n, m); flatten it before passing it to the Dense layers, like this:

model.add(Flatten(input_shape=(n, m)))
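As a rough check on what flattening does to the sizes, the arithmetic can be worked out without Keras. In this sketch the grid size, the 16-unit layer and both helper names are hypothetical.

```python
def flatten_size(shape):
    """Number of features after flattening everything but the batch axis."""
    size = 1
    for dim in shape:
        size *= dim
    return size


def dense_params(n_inputs, units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * units + units


n, m = 4, 5  # hypothetical grid size
features = flatten_size((n, m))
print(features)                    # 20
print(dense_params(features, 16))  # 336
```

So a 4 × 5 grid reaches the first Dense layer as 20 features per sample, and the batch axis is left untouched.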

Check this example from keras-rl docs