
The data are 10 videos; each video is split into 86 frames, and each frame has 28*28 pixels:

video_num = 10
frame_num = 86
pixel_num = 28*28

I want to use Conv2D+LSTM to build the model, feeding the pixel data (=INPUT_SIZE=28*28) into the model at each time step (=frame_num=86). The following is my code for the model:

BATCH_SIZE = 2 (just a trial value)
TIME_STEPS=frame_num (=86)
INPUT_SIZE=pixel_num (=28*28)

from keras.models import Sequential
from keras.layers import (InputLayer, Conv2D, MaxPooling2D, Flatten, Dense,
                          LSTM, TimeDistributed)

model = Sequential()
model.add(InputLayer(batch_input_shape=(BATCH_SIZE, TIME_STEPS, INPUT_SIZE)))
print (model.output_shape)

model.add(TimeDistributed(Conv2D(64, (1,3), strides=(1,1), padding='same',
                                 data_format='channels_last')))  # always the error here
print (model.output_shape)

model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2),padding='same')))
print (model.output_shape)

model.add(TimeDistributed(Conv2D(64, (1,3), strides=(1,1),
                                 data_format='channels_last', padding='same')))
print (model.output_shape)

model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2),padding='same')))
print (model.output_shape)

model.add(TimeDistributed(Flatten()))
print (model.output_shape)

model.add(TimeDistributed(Dense(4096, activation='relu')))
print (model.output_shape)

model.add(LSTM(100, stateful=True, return_sequences=True))
print (model.output_shape)

model.add(Dense(1, activation='sigmoid'))
print (model.output_shape)

The following screenshot shows the error from the command line:

https://i.stack.imgur.com/m4VBx.jpg (the traceback ends with "list index out of range")

I think the error is about the input shape that TimeDistributed() receives from the layer above it (InputLayer()), but I have no idea how to fix it. I have tried removing the InputLayer() and using

TimeDistributed(Conv2D(...), input_shape=(TIME_STEPS, INPUT_SIZE))

as the first layer instead, but I get the same error...

If anyone knows about this error, please share your ideas; I would really appreciate it. Also, I am still not very clear on the difference between batch_input_shape and input_shape. Has anyone used these two before? Thanks.

Edward Chang

1 Answer


A Conv2D layer requires four dimensions, not three:

  • (batch_size, height, width, channels).

And the TimeDistributed wrapper requires an additional dimension:

  • (batch_size, frames, height, width, channels)

So, if you're really going to work with TimeDistributed+Conv2D, you need 5 dimensions: input_shape=(86,28,28,3), or batch_input_shape=(batch_size,86,28,28,3), where I assumed you've got an RGB video (3 color channels).

Usually, you just pass an input shape to the TimeDistributed.

model.add(TimeDistributed(Dense(....), input_shape=(86,28,28,3)))

You will need the batch_input_shape only in the case of using stateful=True LSTM's. Then you just replace the input_shape with the batch_input_shape.
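
For example, a minimal sketch of the two alternatives (my assumption: a grayscale video, so a single channel):

# batch size left free (None); fine for stateless models
model.add(TimeDistributed(Conv2D(64, (1,3)), input_shape=(86,28,28,1)))

# batch size fixed at 2; required once the model contains stateful=True LSTMs
model.add(TimeDistributed(Conv2D(64, (1,3)), batch_input_shape=(2,86,28,28,1)))

Only one of the two goes on the first layer of a given model.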


Notice that only the convolutional 2D layers will see images in terms of height and width. When you add the LSTM's, you will need to reshape the data to bring height, width and channels into a single dimension.

For a shape (frames, h, w, ch):

model.add(Reshape((frames,h*w*ch)))

And you should not use TimeDistributed with these LSTMs, only with the convolutional layers.

Your approach of using model.add(TimeDistributed(Flatten())) is ok instead of the reshape.
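
Putting it together, here is a minimal sketch of a corrected model (my assumptions: grayscale frames, i.e. a single channel, and the same layer sizes as in the question):

from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, Flatten, Dense, LSTM,
                          TimeDistributed)

BATCH_SIZE = 2
TIME_STEPS = 86                        # frames per video
HEIGHT, WIDTH, CHANNELS = 28, 28, 1    # grayscale assumed

model = Sequential()
# 5D input: (batch, frames, height, width, channels);
# batch_input_shape is required because the LSTM below is stateful
model.add(TimeDistributed(
    Conv2D(64, (1,3), strides=(1,1), padding='same'),
    batch_input_shape=(BATCH_SIZE, TIME_STEPS, HEIGHT, WIDTH, CHANNELS)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2), padding='same')))
model.add(TimeDistributed(Conv2D(64, (1,3), strides=(1,1), padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2,2), padding='same')))
model.add(TimeDistributed(Flatten()))          # collapse h, w, ch per frame
model.add(TimeDistributed(Dense(4096, activation='relu')))
model.add(LSTM(100, stateful=True, return_sequences=True))   # no TimeDistributed here
model.add(Dense(1, activation='sigmoid'))
model.summary()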


Notice also that Keras has recently implemented a ConvLSTM2D layer, which might be useful in your case: https://keras.io/layers/recurrent/#convlstm2d
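
A rough sketch of how it could be used (again assuming one grayscale channel; the filter count and kernel size are illustrative, not from the question):

from keras.models import Sequential
from keras.layers import ConvLSTM2D

model = Sequential()
# ConvLSTM2D consumes the 5D input directly: (frames, height, width, channels)
model.add(ConvLSTM2D(64, (3,3), padding='same', return_sequences=True,
                     input_shape=(86,28,28,1)))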

Daniel Möller
  • Here I got another problem: how do I make my input data 5D so that the last dimension is the channel? – Edward Chang Nov 24 '17 at 13:22
  • frame_all_train[video_count][frame_count][pixel_height_count][pixel_width_count] = frame[pixel_height_count, pixel_width_count]. I can only form a 4D array with my data; I am not sure how to add one more dimension because I don't know what value I should put into the 5th dimension... – Edward Chang Nov 24 '17 at 13:30
  • If you have a black and white video, for instance, then you have only one channel. In a numpy array, you can simply reshape it. Suppose you have an array `x_train` with shape `(10,86,28,28)`. Then: `x_train=x_train.reshape((10,86,28,28,1))`. – Daniel Möller Nov 24 '17 at 15:13
  • Oh, I see! Let me try it and check if it works for me!! – Edward Chang Nov 24 '17 at 15:31
  • It works for me again!! Thank you soooo much!!! Do you know what "stateful" means? I saw people discussing it in other posts, and my understanding is that "stateful" means the hidden state in the LSTM is updated once per batch and re-initialized after each batch. But I am not sure: would it be possible to carry the hidden state over from the end of the previous batch to the beginning of the next batch (I don't want to re-initialize the hidden state anymore after training starts)? If that is impossible with the LSTM, is there any other method that makes it possible? – Edward Chang Nov 24 '17 at 16:01
  • See here :) - https://stackoverflow.com/questions/43882796/when-does-keras-reset-an-lstm-state – Daniel Möller Nov 24 '17 at 16:21
  • Do you have any example code? I am trying to use a pretrained ResNet before the LSTM, but I am getting a dimension/shape error. inputs = Input(shape=(frames, img_size, img_size, channels)); x = TimeDistributed(cnn_model)(inputs); x = Reshape((frames, 150528))(x) (h*w*ch = 150528); x = TimeDistributed(Flatten())(x); x = LSTM(256)(x). When training, I receive the following error: ValueError: Error when checking input: expected input_5 to have 5 dimensions, but got array with shape (64, 224, 224, 3) – TheJokerAEZ Aug 04 '19 at 18:07