
I'm building an LSTM for a report and would like to summarize some facts about it. However, I've seen two different ways to build an LSTM in Keras that yield two different values for the number of parameters.

I'd like to understand why the parameters differ in this way.

This question correctly shows why this code

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# Keras 1-style arguments: input_dim = features per timestep, input_length = timesteps
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()

results in 4,457,472 parameters.
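For reference, that number matches the standard LSTM parameter formula, 4 × (input_dim × units + units² + units), where the factor of 4 covers the input, forget, cell, and output gates. A quick sanity check in plain Python:

n, m = 256, 4096                  # units, input features
params = 4 * (m * n + n * n + n)  # 4 gates: input kernel + recurrent kernel + bias
print(params)                     # 4457472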

From what I can tell, the following two LSTMs should be the same

m2 = Sequential()
m2.add(LSTM(1, input_dim=5, input_length=1))
m2.summary()

m3 = Sequential()
m3.add(LSTM(1, batch_input_shape=(None, 5, 1)))
m3.summary()

However, m2 results in 28 parameters, while m3 results in 12. Why?

How is 12 calculated for a 1-unit LSTM with a 5-dimensional input?
I've included the warning messages below in case they are helpful.

Output

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 256)               4457472   
=================================================================
Total params: 4,457,472
Trainable params: 4,457,472
Non-trainable params: 0
_________________________________________________________________

Warning (from warnings module):
  File "difparam.py", line 11
    m2.add(LSTM(1, input_dim=5, input_length=1))
UserWarning: The `input_dim` and `input_length` arguments in recurrent layers are deprecated. Use `input_shape` instead.

Warning (from warnings module):
  File "difparam.py", line 11
    m2.add(LSTM(1, input_dim=5, input_length=1))
UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(1, input_shape=(1, 5))`
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_2 (LSTM)                (None, 1)                 28        
=================================================================
Total params: 28
Trainable params: 28
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_3 (LSTM)                (None, 1)                 12        
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________

m2 was built based on info from the Stack Overflow question, and m3 was built based on this video from YouTube.

VISQL
  • `m2` seems to say 4 x ((1 x 5) + (1^2) + 1) = 4 x (5 + 1 + 1) = 4 x 7 = 28. `m3` makes no sense. – VISQL Oct 09 '19 at 11:42

1 Answer


Because with batch_input_shape=(None, 5, 1), the values are actually input_length = 5 and input_dim = 1, the reverse of what m2 uses.
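Plugging those into the usual formula, 4 × (input_dim × units + units² + units): m3 has input_dim = 1, so 4 × (1 + 1 + 1) = 12, while m2 has input_dim = 5, so 4 × (5 + 1 + 1) = 28.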


It's even written in the warning you received, where the input shape for m2 is different from the one used in m3:

UserWarning: Update your LSTM call to the Keras 2 API: LSTM(1, input_shape=(1, 5))

It's highly recommended that you use the suggestion in the warning.
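As a minimal sketch (assuming the Keras 2 convention input_shape=(timesteps, features); the m2_fixed/m3_fixed names are just for illustration), these two calls reproduce both counts:

from keras.models import Sequential
from keras.layers import LSTM

m2_fixed = Sequential()
m2_fixed.add(LSTM(1, input_shape=(1, 5)))  # length 1, 5 features -> 28 params
m2_fixed.summary()

m3_fixed = Sequential()
m3_fixed.add(LSTM(1, input_shape=(5, 1)))  # length 5, 1 feature -> 12 params
m3_fixed.summary()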

Daniel Möller
  • Okay, `m3.add(LSTM((1),batch_input_shape=(None,1,5)))` gives the same parameters as `m2.add(LSTM(1, input_dim=5, input_length=1))`. The warning was too cryptic for me. – VISQL Oct 09 '19 at 15:48
  • This is bad. You don't gain anything from an LSTM with length = 1. – Daniel Möller Oct 09 '19 at 15:55
  • Right. The point was to understand the param discrepancy. `input_length` is actually optional. In `m3`, the `None` actually represents the input length (i.e. # of observations). It's not a value relevant to parameters. I'm trying to understand param counts beforehand as an admittedly bad way of comparing to other NN types. – VISQL Oct 10 '19 at 07:11
  • No, in `batch_input_shape=(None, 1, 5)` the length is 1 -> `batch_input_shape = (samples, length, features)`. – Daniel Möller Oct 10 '19 at 12:19