
I built a character-level LSTM model on text data, but ultimately I'm looking to apply this model to very long text documents (such as a novel) where it's important to understand contextual information, such as where in the novel the current passage is.

For these large-scale NLP tasks, is the data usually cut into smaller pieces and concatenated with metadata - such as position within the document, detected topic, etc. - to be fed into the model? Or are there more elegant techniques?

nwly
  • Possible duplicate of this question: https://stackoverflow.com/questions/44478272/how-to-handle-extremely-long-lstm-sequence-length – Bhaskar Oct 01 '18 at 12:56
  • Thanks Bhaskar for pointing to an alternative question. – nwly Oct 01 '18 at 13:11

2 Answers


Personally, I have not used LSTMs at the level of depth you are trying to attain, but I do have some suggestions.

One solution, which you mentioned yourself, is to split the document into smaller pieces and analyze each piece separately. You'll probably have to be creative about how you split and recombine them; a rough sketch of the idea is below.
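For example (my own illustration, not an established recipe), you could cut the document into fixed-length character chunks and attach a simple positional feature, so each chunk still carries some "where in the novel am I" context:

    def chunk_document(text, chunk_len=200, stride=200):
        """Cut `text` into chunks of `chunk_len` characters and pair each chunk
        with its normalised start position in the document (0.0 to 1.0)."""
        chunks = []
        for start in range(0, max(len(text) - chunk_len + 1, 1), stride):
            piece = text[start:start + chunk_len]
            position = start / max(len(text) - chunk_len, 1)  # "where in the novel" feature
            chunks.append((piece, position))
        return chunks

    # Toy usage on a repeated sentence standing in for a novel
    novel = "Call me Ishmael. " * 500
    pieces = chunk_document(novel)
    print(len(pieces), pieces[0][1], pieces[-1][1])  # chunk count, first/last positions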

Another solution that I think might be of interest to you is to use a Tree-LSTM model to get that level of depth. Here's the link to the paper. Using the tree model, you could feed in individual characters or words at the lowest level and then feed them upward to higher levels of abstraction. Again, I am not completely familiar with the model, so don't take my word on it, but it could be a possible solution.
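To make the idea a bit more concrete, here is a minimal sketch of a child-sum style Tree-LSTM cell in PyTorch. This may or may not be the exact variant in the paper, and the layer sizes and wiring are my own assumptions for illustration only:

    import torch
    import torch.nn as nn

    class ChildSumTreeLSTMCell(nn.Module):
        def __init__(self, in_dim, hid_dim):
            super().__init__()
            self.W_iou = nn.Linear(in_dim, 3 * hid_dim)             # input/output/update gates from x
            self.U_iou = nn.Linear(hid_dim, 3 * hid_dim, bias=False)
            self.W_f = nn.Linear(in_dim, hid_dim)                   # forget gate from x
            self.U_f = nn.Linear(hid_dim, hid_dim, bias=False)      # one forget gate per child

        def forward(self, x, child_h, child_c):
            # x: (in_dim,); child_h, child_c: (num_children, hid_dim)
            h_sum = child_h.sum(dim=0)                              # sum of children's hidden states
            i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(h_sum), 3)
            i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
            f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))      # (num_children, hid_dim)
            c = i * u + (f * child_c).sum(dim=0)
            h = o * torch.tanh(c)
            return h, c

    # Toy usage: merge two child nodes (e.g. word-level states) into one parent node.
    cell = ChildSumTreeLSTMCell(in_dim=8, hid_dim=16)
    h, c = cell(torch.randn(8), torch.zeros(2, 16), torch.zeros(2, 16))
    print(h.shape, c.shape)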

Lukas

Adding a few more ideas to the answer Bhaskar pointed to, which are used to handle this problem.

You can use an attention mechanism, which is designed to deal with long-term dependencies. Over a long sequence an LSTM will certainly forget information, and its next prediction may not depend on all of the sequence information held in its cell state. An attention mechanism helps find reasonable weights for the characters the prediction actually depends on. For more info you can check this link.
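As a rough sketch (my own toy architecture, not taken from the link), here is a simple learned attention pooling over character-level LSTM outputs; the layer sizes and the single-linear score function are arbitrary choices:

    import torch
    import torch.nn as nn

    class AttnCharLSTM(nn.Module):
        def __init__(self, vocab_size, emb_dim=32, hid_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
            self.attn = nn.Linear(hid_dim, 1)          # one score per timestep
            self.out = nn.Linear(hid_dim, vocab_size)  # next-character prediction

        def forward(self, char_ids):
            h, _ = self.lstm(self.embed(char_ids))        # (batch, seq, hid)
            weights = torch.softmax(self.attn(h), dim=1)  # attention over the sequence
            context = (weights * h).sum(dim=1)            # weighted sum of hidden states
            return self.out(context), weights

    model = AttnCharLSTM(vocab_size=100)
    logits, weights = model(torch.randint(0, 100, (4, 500)))  # 4 sequences of 500 chars
    print(logits.shape, weights.shape)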

There is a lot of research on this problem. This is a very recent paper on it.

You can also break the sequence and use a seq2seq model, which encodes the features into a low-dimensional space; the decoder then extracts them. This is a short article on the topic.
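As a toy illustration (not taken from the linked article), a minimal encoder-decoder might look like this: the encoder compresses a character chunk into a low-dimensional state, and the decoder generates from that state. The dimensions here are arbitrary assumptions:

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, vocab_size, emb_dim=32, bottleneck=16):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, bottleneck, batch_first=True)  # low-dim code
            self.decoder = nn.LSTM(emb_dim, bottleneck, batch_first=True)
            self.out = nn.Linear(bottleneck, vocab_size)

        def forward(self, src_ids, tgt_ids):
            _, state = self.encoder(self.embed(src_ids))   # keep only the final (h, c) code
            dec_h, _ = self.decoder(self.embed(tgt_ids), state)
            return self.out(dec_h)                         # per-step character logits

    model = Seq2Seq(vocab_size=100)
    src = torch.randint(0, 100, (2, 300))   # two long input chunks
    tgt = torch.randint(0, 100, (2, 50))    # shorter target sequences
    print(model(src, tgt).shape)            # (2, 50, 100)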

My personal advice is to break the sequence and then train on it, because a sliding window over the complete sequence is usually able to capture the correlations within each piece. A small sketch of this follows.
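For example, a sliding-window split for next-character prediction might look like this (the window length and stride are arbitrary choices on my part):

    def sliding_windows(text, window=100, stride=3):
        """Overlapping character windows as inputs, with the following character
        as the prediction target for each window."""
        inputs, targets = [], []
        for start in range(0, len(text) - window, stride):
            inputs.append(text[start:start + window])   # window of characters
            targets.append(text[start + window])        # next character to predict
        return inputs, targets

    X, y = sliding_windows("the quick brown fox jumps over the lazy dog " * 50)
    print(len(X), repr(X[0][:20]), repr(y[0]))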

Ankish Bansal