I have a question regarding varying sequence lengths for LSTMs in Keras. I'm passing batches of size 200, where each sequence has a variable length x and 100 features per timestep (i.e. shape [200, x, 100]), into an LSTM:
LSTM(100, return_sequences=True, stateful=True, input_shape=(None, 100), batch_input_shape=(200, None, 100))
I'm fitting the model on the following randomly generated arrays:
x_train = np.random.random((1000, 50, 100))
x_train_2 = np.random.random((1000, 10, 100))
As far as I understand LSTMs (and the Keras implementation), x should refer to the number of LSTM cells, and for each LSTM cell a state and three matrices have to be learned (for the input, state, and output of the cell). How is it possible to pass varying sequence lengths into the LSTM without padding up to a specified maximum length, as I did here? The code runs, but by my understanding it shouldn't. It's even possible to pass another x_train_3 with a sequence length of 60 afterwards, although there shouldn't be states and matrices for the extra 10 cells.
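To illustrate why this puzzles me, here is a quick throwaway check one could run (probe is just a name I'm using for this sketch; it is not part of my training code below). As far as I can tell, none of the printed weight shapes involve the sequence length, which is part of what confuses me:

from keras.models import Sequential
from keras.layers import LSTM

# throwaway probe (my own sketch, separate from the training code):
# print the shapes of the LSTM layer's trainable weights
probe = Sequential()
probe.add(LSTM(100, return_sequences=True, batch_input_shape=(200, None, 100)))
for w in probe.layers[0].get_weights():
    print(w.shape)  # only (100, 100) and (100,) shapes appear; no timestep dimension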
By the way, I'm using Keras 1.0.8 and TensorFlow GPU 0.9.
Here is my example code:
from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
from keras import backend as K

with K.get_session():
    # create model
    model = Sequential()
    model.add(LSTM(100, return_sequences=True, stateful=True, input_shape=(None, 100),
                   batch_input_shape=(200, None, 100)))
    model.add(LSTM(100))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

    # Generate dummy training data
    x_train = np.random.random((1000, 50, 100))
    x_train_2 = np.random.random((1000, 10, 100))
    y_train = np.random.random((1000, 2))
    y_train_2 = np.random.random((1000, 2))

    # Generate dummy validation data
    x_val = np.random.random((200, 50, 100))
    y_val = np.random.random((200, 2))

    # fit and evaluate the model on both sequence lengths
    model.fit(x_train, y_train, batch_size=200, nb_epoch=1,
              shuffle=False, validation_data=(x_val, y_val), verbose=1)
    model.fit(x_train_2, y_train_2, batch_size=200, nb_epoch=1,
              shuffle=False, validation_data=(x_val, y_val), verbose=1)
    score = model.evaluate(x_val, y_val, batch_size=200, verbose=1)
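For completeness, this is the follow-up experiment I mentioned above, run within the same session: a third fit with a sequence length of 60, which also runs without complaint (y_train_3 is a dummy target I'm sketching here, analogous to the other data):

x_train_3 = np.random.random((1000, 60, 100))
y_train_3 = np.random.random((1000, 2))
# third fit with a longer sequence length (60) than the first two
model.fit(x_train_3, y_train_3, batch_size=200, nb_epoch=1,
          shuffle=False, validation_data=(x_val, y_val), verbose=1)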