So I've used RNNs/LSTMs in three different capacities (sketched in code after the list):
- Many to many: Use the output of the final layer at every time step to make a per-step prediction (e.g. the next element of the sequence). Could be classification or regression.
- Many to one: Use the final hidden state to perform regression or classification.
- One to many: Take a latent-space vector (perhaps the final hidden state of an LSTM encoder) and use it to generate a sequence (I've done this in the form of an autoencoder).
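Here's a minimal sketch of how I wire these three patterns up (layer sizes, batch size, and the single-layer setup are just made-up examples):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)                     # (batch, seq_len, features)

output, (h_n, c_n) = lstm(x)                  # output: (4, 10, 16), h_n: (1, 4, 16)
head = nn.Linear(16, 1)

# 1. Many to many: a prediction from every time step of the last layer
per_step_preds = head(output)                 # (4, 10, 1)

# 2. Many to one: a single prediction from the final hidden state
seq_pred = head(h_n[-1])                      # (4, 1)

# 3. One to many: decode a sequence from a latent vector (e.g. an encoder's h_n)
decoder = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
start = torch.zeros(4, 10, 1)                 # placeholder decoder inputs
decoded, _ = decoder(start, (h_n, torch.zeros_like(h_n)))   # (4, 10, 16)
```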
In none of these cases do I use the intermediate hidden states to generate my final output: only the last layer's outputs in case #1, and only the last layer's hidden state in cases #2 and #3. However, PyTorch's nn.LSTM/nn.RNN
returns a tensor containing the final hidden state of every layer, so I assume those per-layer states have some use.
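For reference, here's roughly what I mean: with a stacked LSTM, `output` only exposes the last layer, while `h_n` stacks the final hidden state of every layer (the sizes below are arbitrary):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=3, batch_first=True)
x = torch.randn(4, 10, 8)

output, (h_n, c_n) = lstm(x)
print(output.shape)   # torch.Size([4, 10, 16])  -- last layer only, all time steps
print(h_n.shape)      # torch.Size([3, 4, 16])   -- final time step, every layer
print(torch.allclose(output[:, -1, :], h_n[-1]))  # True: last layer's final state
```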
What are some use cases for those intermediate layers' hidden states?