
I am trying to boost the performance of an object detection task with sequential information, using a ConvLSTM.

A typical ConvLSTM model takes a 5D tensor with shape (samples, time_steps, channels, rows, cols) as input.

As stated in this post, a long sequence of 500 images needs to be split into smaller fragments for the PyTorch ConvLSTM layer. For example, it could be split into 10 fragments, each with 50 time steps.
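To make the setup concrete, here is a minimal sketch of that split (all tensor sizes are made-up example values):

import torch

video = torch.randn(1, 500, 3, 64, 64)     # (samples, time_steps, channels, rows, cols)
fragments = torch.split(video, 50, dim=1)  # 10 fragments, each (1, 50, 3, 64, 64)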


I have two goals:

  1. I want the network to remember its state across the 10 fragments, i.e. how do I pass the hidden state between the fragments?

  2. I want to feed in the images (of the video) one by one, i.e. the long sequence of 500 images is split into 500 fragments, each containing only one image. The input would then have shape (all_samples, channels, rows, cols). This only makes sense if goal 1 can be achieved; see the sketch after this list.
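To illustrate goal 2, here is a rough sketch of the feeding loop I have in mind. Since PyTorch has no built-in ConvLSTM, the sketch uses a plain nn.LSTM as a stand-in, and all sizes are made-up example values:

import torch
from torch import nn

model = nn.LSTM(input_size=20, hidden_size=32)  # stand-in for a ConvLSTM
state = None                                    # None means zero-initialized (h, c)
for t in range(500):
    frame = torch.randn(1, 1, 20)               # one step: (seq_len=1, batch, input_size)
    out, state = model(frame, state)            # carry the state into the next step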


I found some good answers for TensorFlow, but I am using PyTorch:

TensorFlow: Remember LSTM state for next batch (stateful LSTM)

The best way to pass the LSTM state between batches

What is the best way to implement stateful LSTM/ConvLSTM in Pytorch?

  • Why don't you just do a 3d convolution? – iacob Apr 07 '21 at 09:08
  • I think saving only the current image & hidden state may be more efficient than saving a bunch of images. So instead of concatenating them before feeding them into convLSTM or 3d convolution, I want to feed the images one by one. – zheyuanWang Apr 07 '21 at 09:33

1 Answer


I found that this post has a good example:

import torch
from torch import nn

model = nn.LSTM(input_size=20, hidden_size=32)  # h_size = 32 as an example value
x1, x2 = torch.randn(2, 5, 1, 20)               # two fragments: (seq_len, batch, input_size)
out1, (h1, c1) = model(x1)                      # state starts zero-initialized
out2, (h2, c2) = model(x2, (h1, c1))            # carry the state into the next fragment
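One caveat to add (not from the linked post, but standard practice for truncated backpropagation through time): if you train across fragments, detach the carried state at each fragment boundary, otherwise the autograd graph keeps growing over the whole sequence:

# Detach so gradients stop at the fragment boundary:
out2, (h2, c2) = model(x2, (h1.detach(), c1.detach()))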