I have 950 training video samples and 50 testing video samples. Each video sample has 10 frames, and each frame has a shape of (n_row=28, n_col=28, n_channels=1). My inputs (x) and outputs (y) have the same shape.
x_train shape: (950, 10, 28, 28, 1),
y_train shape: (950, 10, 28, 28, 1),
x_test shape: (50, 10, 28, 28, 1),
y_test shape: (50, 10, 28, 28, 1).
I want to feed the input video samples (x) into my model so that it predicts the output video samples (y).
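(To make the shapes concrete, here is a tiny sketch with random placeholder arrays; the real videos are of course not random:)

import numpy as np

# placeholder arrays with the exact shapes listed above
x_train = np.random.rand(950, 10, 28, 28, 1)
y_train = np.random.rand(950, 10, 28, 28, 1)
x_test = np.random.rand(50, 10, 28, 28, 1)
y_test = np.random.rand(50, 10, 28, 28, 1)
print(x_train.shape, x_test.shape)  # (950, 10, 28, 28, 1) (50, 10, 28, 28, 1)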
My model so far is:
from keras.layers import Dense, Dropout, Activation, LSTM
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Reshape
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
import numpy as np
########################################################################################
model = Sequential()
model.add(TimeDistributed(Convolution2D(16, (3, 3), padding='same'), input_shape=(None, 28, 28, 1)))
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.2))
model.add(TimeDistributed(Convolution2D(32, (3, 3), padding='same')))
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.2))
model.add(TimeDistributed(Convolution2D(64, (3, 3), padding='same')))
model.add(Activation('sigmoid'))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(64, return_sequences=True, stateful=False))
model.add(LSTM(64, return_sequences=True, stateful=False))
model.add(Activation('sigmoid'))
model.add(Dense(784, activation='sigmoid'))
model.add(Reshape((-1, 28,28,1)))
model.compile(loss='mean_squared_error', optimizer='rmsprop')
print(model.summary())
The summary of the model is:
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, None, 28, 28, 16) 160
_________________________________________________________________
activation_1 (Activation) (None, None, 28, 28, 16) 0
_________________________________________________________________
time_distributed_2 (TimeDist (None, None, 14, 14, 16) 0
_________________________________________________________________
dropout_1 (Dropout) (None, None, 14, 14, 16) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 14, 14, 32) 4640
_________________________________________________________________
activation_2 (Activation) (None, None, 14, 14, 32) 0
_________________________________________________________________
time_distributed_4 (TimeDist (None, None, 7, 7, 32) 0
_________________________________________________________________
dropout_2 (Dropout) (None, None, 7, 7, 32) 0
_________________________________________________________________
time_distributed_5 (TimeDist (None, None, 7, 7, 64) 18496
_________________________________________________________________
activation_3 (Activation) (None, None, 7, 7, 64) 0
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 3, 3, 64) 0
_________________________________________________________________
time_distributed_7 (TimeDist (None, None, 576) 0
_________________________________________________________________
lstm_1 (LSTM) (None, None, 64) 164096
_________________________________________________________________
lstm_2 (LSTM) (None, None, 64) 33024
_________________________________________________________________
activation_4 (Activation) (None, None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, None, 784) 50960
_________________________________________________________________
reshape_1 (Reshape) (None, None, 28, 28, 1) 0
=================================================================
Total params: 271,376
Trainable params: 271,376
Non-trainable params: 0
I know my model has problems, but I don't know how to fix them. I suspect that model.add(Reshape((-1, 28, 28, 1))) does not work properly. To be honest, I did not know how to deal with the output of model.add(Dense(784, activation='sigmoid')), so I added a Reshape layer to turn it back into frames. Or maybe the LSTM layers cannot capture the temporal correlation correctly with my current design.
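To check my understanding of that part in isolation, here is a small shape-only sketch (the layer sizes are illustrative, copied from the model above): Dense applied to the LSTM sequence output acts on the last axis, so the time dimension is kept, and TimeDistributed(Reshape(...)) turns each 784-vector back into a frame.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Reshape
from keras.layers.wrappers import TimeDistributed

check = Sequential()
check.add(LSTM(64, return_sequences=True, input_shape=(10, 576)))  # -> (batch, 10, 64)
check.add(Dense(784, activation='sigmoid'))                        # per time step -> (batch, 10, 784)
check.add(TimeDistributed(Reshape((28, 28, 1))))                   # -> (batch, 10, 28, 28, 1)
print(check.summary())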
EDIT 1:
I changed all of the Convolution2D activations from sigmoid to relu.
Here is the prediction result of the changed model. As shown, it is still not able to make a reasonable prediction.
EDIT 2:
I changed model.add(Reshape((-1, 28, 28, 1))) to model.add(TimeDistributed(Reshape((28, 28, 1)))), increased the LSTM units to 512, and used two LSTM layers. I also added BatchNormalization and changed the input_shape to (10, 28, 28, 1). With this input shape I can build a many-to-many model.
But the predictions did not change much. I think I am missing something fundamental. Here is the new model:
from keras.layers import Dense, Dropout, Activation, LSTM  # Dense, Dropout and LSTM are used below
from keras.layers.normalization import BatchNormalization
from keras.layers import Lambda, Convolution2D, MaxPooling2D, Flatten, Reshape, Conv2D
from keras.layers.convolutional import Conv3D
from keras.models import Sequential
from keras.layers.wrappers import TimeDistributed
from keras.layers.pooling import GlobalAveragePooling1D
from keras.optimizers import SGD
from keras.utils import np_utils
from keras.models import Model
import keras.backend as K
import numpy as np
import pylab as plt
model = Sequential()
model.add(TimeDistributed(Convolution2D(16, (3, 3), activation='relu', kernel_initializer='glorot_uniform', padding='same'), input_shape=(10, 28, 28, 1)))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Convolution2D(32, (3,3), activation='relu')))
model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1))))
model.add(Dropout(0.3))
# extract features and dropout
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.3))
model.add(Dense(784, activation='linear'))
model.add(TimeDistributed(BatchNormalization()))
# input to LSTM
model.add(LSTM(units=512, activation='tanh', recurrent_activation='hard_sigmoid', kernel_initializer='glorot_uniform', unit_forget_bias=True, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
model.add(LSTM(units=512, activation='tanh', recurrent_activation='hard_sigmoid', kernel_initializer='glorot_uniform', unit_forget_bias=True, dropout=0.3, recurrent_dropout=0.3, return_sequences=True))
# per-time-step linear regression back to the 784 pixel values of each frame
model.add(Dense(784, activation='linear'))
# model.add(TimeDistributed(BatchNormalization()))
model.add(TimeDistributed(Reshape((28,28,1))))
model.compile(loss='mae', optimizer='rmsprop')
print(model.summary())
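For reference, training and prediction look roughly like this (the batch size and epoch count below are placeholders, not tuned values):

# fit on the 950 training videos, validate on the 50 test videos
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    batch_size=16, epochs=50)
y_pred = model.predict(x_test)  # shape: (50, 10, 28, 28, 1)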
EDIT 3: Because ConvLSTM2D does exactly what I wanted, and the purpose of writing this question was to understand ConvLSTM2D, I changed the title of the question so that it better reflects my problem.
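For anyone who lands here: a minimal sketch of the ConvLSTM2D direction, loosely following the Keras conv_lstm example (the filter counts, loss, and optimizer are assumptions, not a tuned solution):

from keras.models import Sequential
from keras.layers.convolutional import Conv3D
from keras.layers.convolutional_recurrent import ConvLSTM2D
from keras.layers.normalization import BatchNormalization

seq = Sequential()
# each ConvLSTM2D layer keeps the time axis (return_sequences=True),
# so the output is again a sequence of 10 feature maps
seq.add(ConvLSTM2D(filters=32, kernel_size=(3, 3), padding='same',
                   return_sequences=True, input_shape=(10, 28, 28, 1)))
seq.add(BatchNormalization())
seq.add(ConvLSTM2D(filters=32, kernel_size=(3, 3), padding='same',
                   return_sequences=True))
seq.add(BatchNormalization())
# collapse the feature maps back to one channel per frame
seq.add(Conv3D(filters=1, kernel_size=(3, 3, 3), activation='sigmoid',
               padding='same', data_format='channels_last'))
seq.compile(loss='mean_squared_error', optimizer='rmsprop')
print(seq.summary())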