
In Keras, if you want to add an LSTM layer with 10 units, you use model.add(LSTM(10)). I've heard that number 10 referred to as the number of hidden units here and as the number of output units (line 863 of the Keras code here).

My question is, are those two things the same? Is the dimensionality of the output the same as the number of hidden units? I've read a few tutorials (like this one and this one), but none of them state this explicitly.
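
For concreteness, here is a minimal sketch of the call in question (the input shape of 5 timesteps and 16 features is made up purely for illustration):

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(10, input_shape=(5, 16)))  # the "10" this question is about
print(model.output_shape)                 # (None, 10)
```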

StatsSorceress

3 Answers


To get a good intuition for why this makes sense, remember that the LSTM's job is to encode a sequence into a vector (maybe a gross oversimplification, but it's all we need). The size of that vector is specified by hidden_units, and the output is:

   seq vector            RNN weights
(1 X input_dim) * (input_dim X hidden_units),

which has shape 1 X hidden_units (a row vector representing the encoding of your input sequence). And thus, in this case the two names are used synonymously.

Of course, RNNs require more than one multiplication, and Keras implements them as a sequence of matrix-matrix multiplications rather than the single vector-matrix product shown above.
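
A quick shape check of that product, using NumPy with made-up sizes (input_dim and hidden_units are arbitrary here):

```python
import numpy as np

input_dim, hidden_units = 16, 10                   # arbitrary example sizes
seq_vector = np.zeros((1, input_dim))              # 1 x input_dim encoding of the sequence
rnn_weights = np.zeros((input_dim, hidden_units))  # input_dim x hidden_units

output = seq_vector @ rnn_weights
print(output.shape)  # (1, 10) -> a row vector of size hidden_units
```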

parsethis
    This seems to be implying that the number of hidden units is the number of output units, but I think those are two separate things - an LSTM can encode a sequence to another sequence, as you say, but it can also go from a sequence to a real number, can't it? So the hidden representation is not necessarily the same as the output. – StatsSorceress May 26 '17 at 17:52

The number of hidden units is not the same as the number of output units.

The number 10 controls the dimension of the output hidden state (the source code for the LSTM constructor can be found here; 10 specifies the units argument). In one of the tutorials you have linked to (colah's blog), the units argument would control the dimension of the vectors h_{t-1}, h_t, and h_{t+1}: RNN image.

If you want to control the number of LSTM blocks in your network, you need to specify this via the input to the LSTM layer. The input shape to the layer is (nb_samples, timesteps, input_dim) (see the Keras documentation); timesteps controls how many LSTM blocks your network contains. Referring to the RNN image in colah's blog again, timesteps would control how many green blocks the network contains.
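
A small sketch of that layout (all sizes are placeholders), showing how units and the timesteps axis appear in the output shape depending on return_sequences:

```python
from keras.models import Sequential
from keras.layers import LSTM

timesteps, input_dim, units = 8, 32, 10   # placeholder sizes

# One output vector per sequence: shape (nb_samples, units)
m1 = Sequential()
m1.add(LSTM(units, input_shape=(timesteps, input_dim)))
print(m1.output_shape)   # (None, 10)

# One output vector per timestep: shape (nb_samples, timesteps, units)
m2 = Sequential()
m2.add(LSTM(units, return_sequences=True, input_shape=(timesteps, input_dim)))
print(m2.output_shape)   # (None, 8, 10)
```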

  • For LSTMs, h is also the output, so the hidden units would be the same as the number of output units. – nuric May 12 '18 at 16:46
  • That is not always the case. Whilst an LSTM can encode a sequence to another sequence, one can also use an LSTM to encode a sequence to a single classification output. For example, you could use an LSTM to represent a sentence, and the output would represent the sentiment for that particular sentence. – Sam - Founder of AceAINow.com May 19 '18 at 07:27
  • Well yes, and that final output will be the vector h from the diagram of the final LSTM cell. So, h in the equation will be the output again. – nuric May 19 '18 at 09:47
  • The output of a single LSTM cell is a vector whose dimension is the size of the hidden state. However, if you have more than a single LSTM cell, as in the case when one encodes a sequence to another sequence, the output of the network is no longer a single vector. In Keras, the output can be for example a 3 dimensional tensor, (batch_size, timesteps, units), where units is the parameter the question is considering. So, are we considering the dimensionality of the output of a single LSTM cell, or the dimensionality of the output of the network? – Sam - Founder of AceAINow.com Jun 02 '18 at 09:00

The other answers seem to refer to multi-layer perceptrons (MLPs), in which the hidden layer can be of a different size, and often is. For LSTMs, the hidden dimension is the same as the output dimension by construction (see the LSTM cell diagram and equations):

Here h is the output for a given timestep, and the cell state c is bound to the hidden size due to element-wise multiplication. The addition of terms when computing the gates requires that both the input kernel W and the recurrent kernel U map to the same dimension. This is certainly the case for the Keras LSTM as well, and is why you provide only a single units argument.
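
One way to see this in Keras (the sizes below, units=10 and input_dim=16, are chosen purely for illustration) is to inspect the layer's weight shapes: the recurrent kernel U must map from the hidden size back to the four gates, so its first dimension equals units, which is also the output size.

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(10, input_shape=(5, 16)))

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape)            # (16, 40) -> W: input_dim x (4 * units), one block per gate
print(recurrent_kernel.shape)  # (10, 40) -> U: units x (4 * units); hidden size == output size
print(bias.shape)              # (40,)
```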

nuric