I'm building an LSTM for a report and would like to summarize its properties. However, I've seen two different ways to build an LSTM in Keras, and they yield two different values for the number of parameters.
I'd like to understand why the parameters differ in this way.
This question correctly shows why this code
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()
results in 4457472 parameters.
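As a sanity check on that number, here is a minimal sketch of the arithmetic, assuming the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias vector. The helper name lstm_param_count is my own and is not part of Keras.

```python
def lstm_param_count(units, input_dim):
    """Parameter count for a standard LSTM layer with biases.

    Hypothetical helper (not a Keras function). Each of the four gates has:
      - an input kernel of shape (input_dim, units)
      - a recurrent kernel of shape (units, units)
      - a bias vector of shape (units,)
    """
    per_gate = units * input_dim + units * units + units
    return 4 * per_gate


# 256 units reading 4096-dimensional inputs, as in the model above.
print(lstm_param_count(256, 4096))  # 4457472
```

Note that the sequence length (16 here) does not appear in the formula, because the same weights are reused at every timestep.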
From what I can tell, the following two LSTMs should be the same:
m2 = Sequential()
m2.add(LSTM(1, input_dim=5, input_length=1))
m2.summary()
m3 = Sequential()
m3.add(LSTM(1, batch_input_shape=(None, 5, 1)))
m3.summary()
However, m2 results in 28 parameters, but m3 results in 12 parameters. Why?
How is 12 being calculated for a 1 unit LSTM with a 5-dim input?
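One thing I can check is how each shape enters the count. The sketch below assumes that Keras reads batch_input_shape as (batch, timesteps, features) and that only the features value affects the number of weights. The helper lstm_params_from_shape is hypothetical, not a Keras function.

```python
def lstm_params_from_shape(units, batch_input_shape):
    """Parameter count of a standard LSTM given a 3-D input shape.

    Hypothetical helper. Assumes the shape is (batch, timesteps, features);
    batch and timesteps do not change the number of weights.
    """
    _batch, _timesteps, features = batch_input_shape
    # Four gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * features + units * units + units)


# input_dim=5, input_length=1 corresponds to input_shape=(1, 5),
# as the Keras 2 warning for m2 suggests.
print(lstm_params_from_shape(1, (None, 1, 5)))  # 28

# batch_input_shape=(None, 5, 1), as written for m3.
print(lstm_params_from_shape(1, (None, 5, 1)))  # 12
```

Under that assumption, the two calls describe different inputs: the first is 1 timestep of 5 features, the second is 5 timesteps of 1 feature.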
I've included the warning messages in the output below in case they are helpful.
Output
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 256) 4457472
=================================================================
Total params: 4,457,472
Trainable params: 4,457,472
Non-trainable params: 0
_________________________________________________________________
Warning (from warnings module):
File "difparam.py", line 11
m2.add(LSTM(1, input_dim=5, input_length=1))
UserWarning: The `input_dim` and `input_length` arguments in recurrent layers are deprecated. Use `input_shape` instead.
Warning (from warnings module):
File "difparam.py", line 11
m2.add(LSTM(1, input_dim=5, input_length=1))
UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(1, input_shape=(1, 5))`
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_2 (LSTM) (None, 1) 28
=================================================================
Total params: 28
Trainable params: 28
Non-trainable params: 0
_________________________________________________________________
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_3 (LSTM) (None, 1) 12
=================================================================
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
m2 was built based on info from the Stack Overflow question, and m3 was built based on this video from YouTube.