How to feed an LSTM network a tensor of shape (2000, 7, 7, 512) in Keras?

Question

The input data (X) has shape (2000, 7, 7, 512).

The network is:

visible = Input(shape=(7,7,512))
Lstm = LSTM(units=22, return_sequences=True)(visible)
Dense_1 = Dense(4096)(Lstm)
Dense_2 = Dense(512 ,activation='sigmoid')(Dense_1)
Dense_3 = Dense(5, activation='sigmoid')(Dense_2)
model = Model(input = visible, output=Dense_3)

The error is: ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4

What should the input shape be for the LSTM and the other layers?

The LSTM input must be 3D. You need to reshape the 4D tensor first. See: https://stackoverflow.com/questions/52936132/4d-input-in-lstm-layer-in-keras – Harshit Mehta, Jan 24 '19 at 20:04

score 1 · Answer 1 · answered Jan 24 '19 at 20:07

From Keras's RNN documentation:

Your input needs to be a 3D tensor with shape (batch_size, timesteps, input_dim).

Your input is a 4D tensor. Whichever dimension of your input represents the number of timesteps should come directly after the batch dimension, that is, it should be the first entry of the shape you pass to the input layer.

Your output shape, with return_sequences, will be a 3D tensor with shape (batch_size, timesteps, units).
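
The shape rule quoted above can be sketched in plain Python. The helper `lstm_output_shape` is hypothetical (it is not part of Keras) and only mirrors the documented rule; the example shape (2000, 49, 512) assumes the 7x7 grid has been flattened into 49 timesteps.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Hypothetical helper that mirrors the documented shape rule (not a Keras API)."""
    if len(input_shape) != 3:
        # Same condition that triggers the error reported in the question.
        raise ValueError(f"expected ndim=3, found ndim={len(input_shape)}")
    batch_size, timesteps, _input_dim = input_shape
    if return_sequences:
        # One output vector per timestep.
        return (batch_size, timesteps, units)
    # Only the output of the last timestep.
    return (batch_size, units)


# Assumed example: 2000 samples, 49 timesteps, 512 features per timestep.
seq_shape = lstm_output_shape((2000, 49, 512), units=22, return_sequences=True)
last_shape = lstm_output_shape((2000, 49, 512), units=22, return_sequences=False)
print(seq_shape)   # (2000, 49, 22)
print(last_shape)  # (2000, 22)
```

Passing the original 4D shape (2000, 7, 7, 512) to this helper raises a ValueError, which is the same kind of mismatch the LSTM layer reports.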

answered Jan 24 '19 at 20:07

Luke DeLuccia


Good catch, I guess 2000 is batch_size. – prosti Jan 24 '19 at 21:46
Since the output of VGG16 is 7x7x512, the feature size (input_dim) is 7x7x512 = 25088. The number of timesteps is up to you. If you want the LSTM to produce one output vector per image, then timesteps is 1, so the input shape becomes 2000x1x25088. In the input shape, however, you do not specify the number of images, only timesteps and input_dim, because the number of images can be anything; each image has a shape of (1x25088). – Aug 16 '20 at 12:47
Suppose that you have 20 videos. Each video has 15 frames (images), and each image has a shape of 7x7x512. That means your input tensor has a shape of 20x15x7x7x512, which flattens to 20x15x25088. Each video then has a shape of 15x25088, which is why you write "visible = Input(shape=(15,25088))" in the code. If return_sequences=True, the LSTM outputs a vector for each frame. If it is False, the LSTM outputs a single vector for each 15-frame video. – Aug 16 '20 at 12:47
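
The two layouts described in these comments can be checked with NumPy alone. This is a minimal sketch: the array names are made up for illustration, zeros stand in for real features, and only 4 images are used instead of 2000 to keep memory small.

```python
import numpy as np

# Stand-in for 20 videos x 15 frames, each frame a 7x7x512 feature map.
videos = np.zeros((20, 15, 7, 7, 512), dtype=np.float32)

# Flatten each frame's feature map into one vector of 7*7*512 = 25088 values.
video_sequences = videos.reshape(20, 15, -1)
print(video_sequences.shape)  # (20, 15, 25088)

# Single images treated as sequences of length 1.
images = np.zeros((4, 7, 7, 512), dtype=np.float32)
image_sequences = images.reshape(len(images), 1, -1)
print(image_sequences.shape)  # (4, 1, 25088)

# The tuple for Input(shape=...) is everything except the leading sample axis.
input_shape = video_sequences.shape[1:]
print(input_shape)  # (15, 25088)
```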

prosti · Accepted Answer · 2019-01-24T21:53:44.073

The input to an LSTM layer must be 3D, with the dimensions:

samples,
time steps, and
features.

Try it like this:

from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
import numpy as np

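# example data: 2000 samples of 7x7x512 features
X = np.random.rand(2000, 7, 7, 512)
# flatten the 7x7 grid into 49 time steps of 512 features each
X = X.reshape(2000, 49, 512)

# define model; the Input shape excludes the number of samples
visible = Input(shape=(49, 512))
Lstm = LSTM(units=22, return_sequences=True)(visible)
Dense_1 = Dense(4096)(Lstm)
Dense_2 = Dense(512, activation='sigmoid')(Dense_1)
Dense_3 = Dense(5, activation='sigmoid')(Dense_2)
# Keras 2 expects the keyword arguments inputs/outputs
model = Model(inputs=visible, outputs=Dense_3)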

The input shape of the LSTM is defined by the shape argument of the Input layer.

It takes a tuple of two values: the number of time steps and the number of features.

The number of samples is not part of this tuple and is assumed to be 1 or more; here, I think 2000 is the number of samples.
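
To see what the 49 time steps mean after this reshape, the sketch below checks with NumPy that each time step is one spatial position of the 7x7 grid, read row by row. It assumes NumPy's default row-major order and uses 4 samples instead of 2000 to keep memory small; the variable names are illustrative only.

```python
import numpy as np

n_samples = 4  # assumption: a few samples instead of 2000
X = np.random.rand(n_samples, 7, 7, 512)
X_seq = X.reshape(n_samples, 49, 512)

# Grid position (row, col) becomes time step row * 7 + col.
row, col = 3, 5
t = row * 7 + col
same = np.array_equal(X[0, row, col], X_seq[0, t])
print(same)  # True

# The tuple passed to Input(shape=...) omits the sample axis.
print(X_seq.shape[1:])  # (49, 512)
```

Whether scanning the grid row by row is a meaningful sequence order depends on the task; the reshape only guarantees that each 512-value feature vector stays intact.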
