
I am experimenting with a stateful LSTM on a time-series regression problem using TensorFlow. I apologize that I cannot share the dataset. Below is my code.

train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))

batch_size = 64

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))

model.compile(optimizer='adam',
              loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

model.fit(train_feature, train_label, 
          epochs=10,
          batch_size=batch_size)

When I run the above code, I get the following error at the end of the first epoch.

InvalidArgumentError:  [_Derived_]  Invalid input_h shape: [1,64,50] [1,49,50]
     [[{{node CudnnRNN}}]]
     [[sequential_1/lstm_1/StatefulPartitionedCall]] [Op:__inference_train_function_1152847]

Function call stack:
train_function -> train_function -> train_function

However, the model trains successfully if I change batch_size to 1 and change the training code to the following.

total_epochs = 10

for i in range(total_epochs):
    model.fit(train_feature, train_label, 
              epochs=1,
              validation_data=(val_feature, val_label),
              batch_size=batch_size,
              shuffle=False)

    model.reset_states()

Nevertheless, with very large data (1 million rows), training takes a very long time since the batch size is 1.

So I wonder: how can I train a stateful LSTM with a batch size larger than 1 (e.g. 64) without getting the invalid input_h shape error?

Thanks for your answers.

glorian
  • I updated my answer for clarity. It now uses the example code you provided in the question. I think that would be helpful for others who may have this same question in the future. Cheers. – Princy Oct 20 '20 at 18:58

1 Answer


The fix is to ensure the batch size never changes between batches; every batch must be the same size.

Method 1

One way is to use a batch size that divides your dataset into equal-sized batches. For example, if the total size of the data is 1,500 examples, use a batch size of 50, 100, or some other proper divisor of 1,500.

batch_size = len(data) // proper_divisor
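
For instance, a small helper like the one below can pick such a batch size automatically (my own sketch, not from the original answer; find_batch_size is a hypothetical name). It returns the largest divisor of the dataset size that does not exceed a target batch size:

def find_batch_size(n_samples, max_batch=64):
    """Return the largest divisor of n_samples that is <= max_batch."""
    for candidate in range(max_batch, 0, -1):
        if n_samples % candidate == 0:
            return candidate  # always terminates: 1 divides every integer

batch_size = find_batch_size(len(train_feature))  # e.g. 1500 samples -> 60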

Method 2

The other way is to drop any batch that is smaller than the specified size. This can be done with the tf.data.Dataset API by setting drop_remainder=True.

batch_size = 64

train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))

train_data = train_data.repeat().batch(batch_size, drop_remainder=True)

steps_per_epoch = len(train_feature) // batch_size 

model.fit(train_data, 
          epochs=10, steps_per_epoch = steps_per_epoch)

When using the Dataset API like this, you also need to specify how many rounds of training count as one epoch (essentially, how many batches make up one epoch). Because the dataset is repeated indefinitely, the resulting tf.data.Dataset instance no longer knows where the data it streams to the model ends, so what constitutes one epoch has to be specified manually with steps_per_epoch.
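
As a quick check of this (my own sketch, not part of the original answer), tf.data.experimental.cardinality reports an infinite cardinality once repeat() has been applied, which is why Keras cannot infer the epoch boundary on its own:

import tensorflow as tf

ds = tf.data.Dataset.range(10).repeat().batch(4, drop_remainder=True)
print(tf.data.experimental.cardinality(ds).numpy())  # -1, i.e. tf.data.experimental.INFINITE_CARDINALITY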

Your new code will look like this:

train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))

batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)

model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))

model.compile(optimizer='adam',
              loss='mse',
              metrics=[tf.keras.metrics.RootMeanSquaredError()])

steps_per_epoch = len(train_feature) // batch_size 
model.fit(train_data, 
          epochs=10, steps_per_epoch = steps_per_epoch)

You can also include the validation set, like this (other code not shown):


batch_size = 64
val_data = tf.data.Dataset.from_tensor_slices((val_feature, val_label))
val_data = val_data.repeat().batch(batch_size, drop_remainder=True)

validation_steps = len(val_feature) // batch_size 
model.fit(train_data, epochs=10, 
          steps_per_epoch=steps_per_epoch,
          validation_data=val_data,
          validation_steps=validation_steps)

Caveat: with drop_remainder=True, a few datapoints at the end of each epoch will never be seen by the model. To get around that, you can reshuffle the dataset every epoch, so that the datapoints left out change each time and every datapoint eventually gets a chance to be seen.

buffer_size = 1000  # a bigger buffer shuffles more thoroughly but is slower

train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.shuffle(buffer_size=buffer_size, reshuffle_each_iteration=True)
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)

Why the error occurs

Stateful RNNs and their variants (LSTM, GRU, etc.) require a fixed batch size. The reason is that statefulness is one way to realize truncated backpropagation through time: the final hidden state of one batch is passed as the initial hidden state of the next batch. The final hidden state of the first batch must have exactly the same shape as the initial hidden state of the next batch, which requires the batch size to stay the same across batches.
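
To make that concrete, here is a small illustration (my own sketch, with made-up dimensions; the feature count of 8 is arbitrary): the state a stateful LSTM carries from one batch to the next has shape (batch_size, units), so every incoming batch must match it.

import tensorflow as tf

batch_size, timesteps, features, units = 64, 1, 8, 50  # illustrative values

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(units, stateful=True,
                         batch_input_shape=(batch_size, timesteps, features)),
    tf.keras.layers.Dense(1),
])

# Hidden state and cell state are both created with shape (batch_size, units).
print([tuple(s.shape) for s in model.layers[0].states])  # [(64, 50), (64, 50)]
# A trailing batch with only 49 rows cannot be matched against this (64, 50) state,
# which is what "Invalid input_h shape: [1,64,50] [1,49,50]" reports.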

When you set the batch size to 64, model.fit uses whatever data remains at the end of an epoch as the final batch, and that batch may contain fewer than 64 datapoints. You get the error because this final batch size differs from what the stateful LSTM expects. You don't have the problem with a batch size of 1 because every batch, including the last one, contains exactly 1 datapoint; 1 divides any integer. More generally, if you pick any divisor of your data size as the batch size, you should not get the error.
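
A quick arithmetic check makes this visible (the sample count below is hypothetical, chosen only because it reproduces the 49 from the error message):

n_samples, batch_size = 12_337, 64   # hypothetical dataset size
print(n_samples % batch_size)        # 49 -> size of the leftover final batch
print(n_samples % 1)                 # 0  -> a batch size of 1 never leaves a remainder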

In the error message you posted, it appears the last batch has 49 datapoints instead of 64. On a side note: the reason the shapes look different from your input is that, under the hood, Keras handles these tensors in time-major layout (i.e. the first axis indexes the steps of the sequence). When you pass a tensor of shape (10, 15, 2), representing (batch_size, steps_per_sequence, num_features), Keras transposes it to (15, 10, 2) under the hood.
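
For what it's worth, here is a toy illustration (my own sketch) of the batch-major to time-major transpose described in that side note:

import tensorflow as tf

x = tf.zeros((10, 15, 2))                  # (batch_size, steps_per_sequence, num_features)
x_time_major = tf.transpose(x, [1, 0, 2])  # move the time axis to the front
print(x_time_major.shape)                  # (15, 10, 2)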

Princy
  • Thanks for the very comprehensive explanation! I think now I understand the problem – glorian Oct 22 '20 at 09:10
  • @glorian If the answer helped with the problem, please accept it and or upvote. If you have followup questions, feel free to ask. – Princy Oct 23 '20 at 22:43
  • @Princy why do you do `train_data.repeat()` and `val_data.repeat()` in the first few code snippets? Does this ensure that the last elements not included in the first epoch are the first elements in the second epoch? – codeananda Apr 15 '21 at 14:42
  • Also @Princy why repeat the dataset if you are going to drop the remainder? Does it not work if you keep the remainder? Then you would train on a slightly different dataset on each epoch. – codeananda Apr 15 '21 at 14:47
  • @AdamMurphy the `repeat` method, invoked with the default argument `count=None`, makes the data stream indefinitely. When to stop is then controlled via the arguments of the model's `fit` method, in particular `steps_per_epoch` and `epochs` (training ends when the specified number of epochs is reached, and `steps_per_epoch` specifies how many batches make up an epoch). This is just one approach to feeding data to a TensorFlow model; there are other ways that work just as well, and some may be better suited to certain cases. You can still use `drop_remainder` without `repeat`. – Princy Apr 27 '21 at 21:59