
Perhaps a question better posed to Computer Science or Cross Validated?


I'm beginning some work with LSTMs on sequences of arbitrary length, and one problem I'm running into, which I haven't seen addressed anywhere, is that my network seems to have developed a couple of parameters that grow linearly (perhaps as a measure of time?).

The obvious issue with this is that the training data is bounded at sequences of length x, so the network grows this parameter reasonably up until timestep x. Beyond that, however, the values eventually become so extreme that the network produces NaNs.

Has anyone read anything about the normalization or stabilization of states over time?

Any suggestions would be much appreciated.

Aidan Gomez

1 Answer


Idea #1: Gradient clipping is often applied in RNNs. Here is an example implementation: How to effectively apply gradient clipping in TensorFlow?
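A minimal sketch of global-norm clipping in TensorFlow 1.x, in case it helps; the toy loss, the Adam learning rate, and the clip_norm=5.0 threshold are illustrative assumptions, not values from your setup:

```python
import tensorflow as tf

# Toy setup: this `loss` just stands in for your LSTM training loss.
x = tf.placeholder(tf.float32, [None, 10])
w = tf.Variable(tf.random_normal([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)

params = tf.trainable_variables()
gradients = tf.gradients(loss, params)

# Rescale all gradients together so their global norm is at most 5.0;
# this caps the update size without changing the gradient direction.
clipped, _ = tf.clip_by_global_norm(gradients, clip_norm=5.0)
train_op = optimizer.apply_gradients(zip(clipped, params))
```

Note that clipping tames exploding gradients during training; it won't by itself stop states from drifting at inference time, but it usually keeps training from NaNing.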

Idea #2: Use Recurrent Batch Normalization (arXiv).

Here is a TensorFlow implementation of a batch-normalized LSTM cell: https://github.com/OlavHN/bnlstm/blob/master/lstm.py

This implementation is explained in the article Batch normalized LSTM for Tensorflow.
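To give a flavor of the idea, here is a rough single-timestep sketch: the input-to-hidden and hidden-to-hidden pre-activations are normalized separately before the gates, and the cell state is normalized before the output tanh. This is training-mode statistics only; the per-timestep statistics and population estimates from the paper are omitted, and names like bn_lstm_step, W_x, W_h are hypothetical. The linked repository is the more complete reference:

```python
import tensorflow as tf

def batch_norm(t, scope, eps=1e-5):
    # Normalize over the batch axis; the paper suggests initializing
    # gamma near 0.1 to avoid saturating the tanh/sigmoid gates.
    with tf.variable_scope(scope):
        size = t.get_shape()[-1]
        gamma = tf.get_variable('gamma', [size],
                                initializer=tf.constant_initializer(0.1))
        beta = tf.get_variable('beta', [size],
                               initializer=tf.zeros_initializer())
        mean, var = tf.nn.moments(t, axes=[0])
        return gamma * (t - mean) / tf.sqrt(var + eps) + beta

def bn_lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    # Normalize the two linear maps separately, then combine and
    # slice into the four LSTM gates (input, forget, output, candidate).
    pre = (batch_norm(tf.matmul(x_t, W_x), 'bn_x')
           + batch_norm(tf.matmul(h_prev, W_h), 'bn_h') + b)
    i, f, o, g = tf.split(pre, 4, axis=1)
    c_t = tf.sigmoid(f) * c_prev + tf.sigmoid(i) * tf.tanh(g)
    # The cell state is also normalized before producing the output.
    h_t = tf.sigmoid(o) * tf.tanh(batch_norm(c_t, 'bn_c'))
    return h_t, c_t
```

Because the hidden-state contribution is renormalized at every step, the recurrent activations can't drift upward unboundedly the way your raw states do, which is why this is a plausible fit for your length-generalization problem.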

Jules G.M.