6

I was going through a tutorial on sentiment analysis using an LSTM network. The code below is said to "stack up" the LSTM outputs, and I don't understand how it works.

lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)
Koushik J
  • Are you talking about [udacity's tutorial code](https://github.com/udacity/deep-learning-v2-pytorch/blob/master/sentiment-rnn/Sentiment_RNN_Solution.ipynb)? The mentioned comment is a bit misleading. I think what it means is to convert the output of lstm layer (`lstm_out`) into a single vector. – kHarshit Feb 18 '19 at 15:30
  • @kHarshit Yes, I am talking about the Udacity one, and the comment in the code was "stack up LSTM outputs". – Koushik J Feb 18 '19 at 18:06
  • The solution comment and network architecture graph misled me to believe stacking means adding another LSTM layer! After reading the documentation, I finally see the n_layers argument is passed in to define how many layers of LSTM are stacked. – Darkato Feb 18 '21 at 22:29

1 Answer

5

It does indeed stack the output; the comment by kHarshit above is misleading here!

To visualize this, let us review the output of the previous line in the tutorial (accessed May 1st, 2019):

lstm_out, hidden = self.lstm(embeds, hidden)

The output dimension of this will be [sequence_length, batch_size, hidden_size*2], as per the documentation. Here, the factor of two in the last dimension comes from the LSTM being bidirectional. Therefore, the first half of the last dimension is always the forward output, followed by the backward output (I'm not entirely sure about the ordering of the backward part, but it seems to me that it is already in the right direction).
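If you want to see those dimensions for yourself, here is a minimal sketch (the hyperparameters are chosen arbitrarily for illustration and are not the tutorial's):

import torch
import torch.nn as nn

# Toy bidirectional LSTM: hidden_size = 3, so the last output dimension is 2 * 3 = 6.
seq_len, batch_size, input_size, hidden_size = 4, 1, 5, 3
lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)

embeds = torch.randn(seq_len, batch_size, input_size)
lstm_out, hidden = lstm(embeds)

print(lstm_out.shape)  # torch.Size([4, 1, 6]) -> [seq_len, batch_size, 2 * hidden_size]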

Then, the actual line that you are concerned about:

lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

We're ignoring the specifics of .contiguous() here, but you can read up on it in this excellent answer on Stack Overflow. In summary, it makes sure that your torch.Tensor is laid out contiguously in memory, which .view() requires.
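As a small illustration of why this matters (a toy example, not from the tutorial): a transposed tensor is no longer contiguous, and calling .view() on it directly raises an error, which .contiguous() avoids by first creating a contiguous copy:

import torch

x = torch.arange(6).view(2, 3)
y = x.t()                       # transposing returns a non-contiguous view
print(y.is_contiguous())        # False
# y.view(-1) would raise a RuntimeError here;
# .contiguous() copies the data into contiguous memory first.
print(y.contiguous().view(-1))  # tensor([0, 3, 1, 4, 2, 5])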
Lastly, .view() allows you to reshape the resulting tensor in a specific way. Here, we're aiming for a shape with two dimensions (as defined by the number of arguments passed to .view()). Specifically, the second dimension is fixed to size hidden_dim. The -1 for the first dimension simply means that this dimension is inferred from the total number of elements, so that the requirement on the other dimension is satisfied.
So, if you have a vector of, say, length 40, and want to reshape it into a 2D tensor of shape (-1, 10), then the resulting tensor would have shape (4, 10).
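In code, that toy example looks like this (the numbers are arbitrary, just to show how the -1 is inferred):

import torch

x = torch.arange(40)   # a "vector" of length 40
y = x.view(-1, 10)     # -1 is inferred as 40 / 10 = 4
print(y.shape)         # torch.Size([4, 10])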

As said previously, the first half of the vector (length hidden_dim) is the forward output and the latter half is the backward output, so reshaping into a tensor of shape (-1, hidden_dim) results in a tensor of shape (2, hidden_dim), where the first row contains the forward output, "stacked" on top of the second row, which holds the reverse layer's output.

Visual example:

lstm_out, hidden = self.lstm(embeds, hidden)
print(lstm_out)  # imagine a sample output like [1, 0, 2, 0]
                 #                      forward out | backward out

stacked = lstm_out.contiguous().view(-1, hidden_dim)  # hidden_dim = 2

print(stacked)  # torch.Tensor([[1, 0],
                #               [2, 0]])
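If you want to verify the shapes on a real (if tiny) bidirectional LSTM rather than the imagined values above, here is a runnable sketch; the dimensions are made up and not the tutorial's hyperparameters:

import torch
import torch.nn as nn

hidden_dim = 2
lstm = nn.LSTM(input_size=3, hidden_size=hidden_dim, bidirectional=True)

embeds = torch.randn(5, 1, 3)        # [seq_len, batch_size, input_size]
lstm_out, hidden = lstm(embeds)
print(lstm_out.shape)                # torch.Size([5, 1, 4]) -> 2 * hidden_dim in the last dim

stacked = lstm_out.contiguous().view(-1, hidden_dim)
print(stacked.shape)                 # torch.Size([10, 2])
# Rows alternate between the forward and the backward output of each time step.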
dennlinger