14

I have some data that is sampled at a very high rate (on the order of hundreds of times per second). This results in a huge sequence length (~90,000 samples on average) for any given instance. The entire sequence has a single label. I am trying to use an LSTM neural network to classify new sequences as one of these labels (multiclass classification).

However, using an LSTM with such a large sequence length results in a network that is quite large.

What are some methods to effectively 'chunk' these sequences so that I can reduce the sequence length fed to the neural network, yet still retain the information captured in the entire instance?

user

3 Answers

12

When you have very long sequences, RNNs can face the problems of vanishing gradients and exploding gradients.

There are several methods to deal with this. The first thing to understand is why they are needed: backpropagation through time becomes very difficult over long sequences because of the problems mentioned above.

Yes, the introduction of the LSTM reduced this by a very large margin, but when the sequence is this long you can still face such problems.

So one way is clipping the gradients. That means you set an upper bound on the gradient norm. Refer to this Stack Overflow question.
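As a rough illustration of what "set an upper bound" means, here is a minimal numpy sketch of clipping by global norm (this is essentially what framework utilities such as PyTorch's `torch.nn.utils.clip_grad_norm_` do for you). The gradient values and the `max_norm` of 1.0 below are made up for illustration:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so their combined L2 norm is at most max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

# Illustrative gradients for two parameter tensors; global norm is ~4.47.
grads = [np.full(3, 2.0), np.full(2, 2.0)]
clipped = clip_by_global_norm(grads, max_norm=1.0)
```

Gradients whose global norm is already below the bound pass through unchanged; only oversized gradients are rescaled, which preserves their direction.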

Then, regarding the question you asked:

What are some methods to effectively 'chunk' these sequences?

One way is truncated backpropagation through time. There are a number of ways to implement truncated BPTT. The simple ideas are:

  1. Calculate the gradients only for a given number of time steps. That means if your sequence is 200 time steps and you only give 10, the gradient is calculated over those 10 steps, and the memory stored in those 10 steps (the cell state) is passed as the initial state for the next chunk. This is the method TensorFlow uses for truncated BPTT.

  2. Take the full sequence and only backpropagate the gradients for a given number of time steps from a selected time block. This keeps the forward pass continuous.
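The first variant above can be sketched in a few lines of PyTorch: split the long sequence into fixed-size chunks, carry the LSTM state across chunks, but detach it at each boundary so gradients only flow within a chunk. All sizes (sequence length, chunk size, hidden size, number of classes) are illustrative, and applying the single sequence label once per chunk is just one possible design choice:

```python
import torch
import torch.nn as nn

seq_len, chunk, features, hidden = 200, 10, 4, 8
lstm = nn.LSTM(features, hidden, batch_first=True)
head = nn.Linear(hidden, 3)                      # 3 classes, illustrative
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))

x = torch.randn(1, seq_len, features)            # (batch, time, features)
y = torch.tensor([1])                            # single label per sequence
state = None
for t in range(0, seq_len, chunk):
    out, state = lstm(x[:, t:t + chunk], state)
    state = tuple(s.detach() for s in state)     # stop gradients at boundary
    loss = nn.functional.cross_entropy(head(out[:, -1]), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because of the `detach`, each `backward()` only unrolls `chunk` steps, so memory and compute per update stay bounded no matter how long the full sequence is.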

Here is the best article I found explaining these truncated BPTT methods; it is very easy to follow. Refer to Styles of Truncated Backpropagation.

Shamane Siriwardhana
  • Thanks! With these methods, would I still build the network with ~90,000 LSTM layers? That's where my problem lies at the moment, since I don't believe this would be trainable in a reasonable amount of time. – user Jun 14 '17 at 19:44
  • Well, theoretically, yes, you can. But practically it won't be easy. One issue is the training time. The other problem is that with so many long connections, the network will tend to lose memory across time steps. – Shamane Siriwardhana Jun 15 '17 at 04:04
  • I think your answer is valid to deal with general RNN problems. However, my problem in this question was simply concerning a network that, when unrolled, would be very large and take a very long time to train. Instead, I've broken down my sequences into smaller subsequences with individual labels. – user Oct 31 '17 at 21:36
  • BPTT link is dead. Could you post an update or reference it please? – Gulzar Mar 31 '21 at 17:55
7

This post is from some time ago, but I thought I would chime in here. For the specific problem you are working on (a one-dimensional continuous-valued signal with locality, compositionality, and stationarity), I would highly recommend a convolutional neural network (CNN) approach, as opposed to an LSTM.
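To make that concrete, here is a minimal PyTorch sketch of the idea: strided 1-D convolutions and pooling shrink a ~90,000-step signal down to a small feature vector before a classifier head, so nothing is ever unrolled over the full sequence. The channel counts, kernel sizes, strides, and the choice of 5 classes are all illustrative assumptions:

```python
import torch
import torch.nn as nn

# Each strided conv layer shrinks the time axis by roughly 4x.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, stride=4), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=7, stride=4), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=7, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                  # collapse remaining time axis
    nn.Flatten(),
    nn.Linear(64, 5),                         # 5 classes, illustrative
)

x = torch.randn(2, 1, 90_000)                 # (batch, channels, time)
logits = model(x)                             # (batch, classes)
```

Unlike an RNN, the convolutions process all time steps in parallel, so training time does not blow up with sequence length the way an unrolled LSTM does.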

xgaox
  • Thanks @xgaox. This post was indeed from some time ago, and nowadays I would also approach the problem with CNNs. However, that alone would not address the issue of a very long sequence. – user Apr 06 '20 at 22:57
  • @ShanteshwarInde It would also be great if you specified the problem you see with this answer while bothering to post a link to the guidelines. – Gulzar Mar 31 '21 at 17:59
3

Three years later, we have what seems to be the start of a solution to this type of problem: sparse transformers.

See

https://arxiv.org/abs/1904.10509

https://openai.com/blog/sparse-transformer/

user