
From the official documentation I don't see any arguments like batch_size and input_shape for LSTM. However, I have seen declarations like model.add(LSTM(batch_size, input_shape=(time_steps, features))) in this medium article, as well as this SO post which uses model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True)).

Q1) Can someone please elucidate how that is possible?
Q2) Another thing I am not able to understand is the concept of batch_size, especially for text analysis. If I want my model to learn from sentences, should I use a batch_size of 1, referring to one sentence per sample? In general, the arguments passed to LSTMs don't seem definitive. Is there a proper guide to using LSTMs correctly?

jar

1 Answer


In your question, you mentioned the use of

model.add(LSTM(batch_size, input_shape=(time_steps, features)))

I think the author misled you, as batch_size is the incorrect term to use here. As you can see in the Keras API spec, the first parameter defines the number of hidden states/units in this layer. I looked at the medium post you linked, and I believe what likely happened is that the batch_size happened to equal the number of units, and the author was lazy/uninformed and decided to use the same constant for both. There has been no change to Keras since May of this year (when the post was written) that could explain their mistake.
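For reference, a minimal non-stateful version of that layer would be written like this, where the first positional argument is the number of units, not the batch size (the 50, 10 and 8 below are just illustrative values):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

time_steps, features = 10, 8   # illustrative sequence length and feature size

model = Sequential()
# first positional argument = number of hidden units, NOT batch_size
model.add(LSTM(50, input_shape=(time_steps, features)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
```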

As for the SO post, batch_input_shape is only applicable for stateful LSTM layers. From the documentation:

You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches.

To enable statefulness:
- specify stateful=True in the layer constructor.
- specify a fixed batch size for your model, by passing: if sequential model, batch_input_shape=(...) to the first layer in your model; else, for a functional model with 1 or more Input layers, batch_shape=(...) to all the first layers in your model. This is the expected shape of your inputs including the batch size. It should be a tuple of integers, e.g. (32, 10, 100).
- specify shuffle=False when calling fit().
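Putting those three requirements together, a minimal stateful sketch (with illustrative sizes, along the lines of the SO post you linked) could look like this:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, look_back = 32, 10   # illustrative values

model = Sequential()
# stateful layers need the batch size fixed up front via batch_input_shape
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# shuffle must be off so sample i of one batch follows sample i of the previous batch
# model.fit(X, y, batch_size=batch_size, epochs=10, shuffle=False)
```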

As for your question regarding the nature of batch_size in text analysis, it simply refers to the number of samples propagated through the network per gradient update. Therefore, if you wanted to pass only a single sentence at a time, you could set it to 1. The problem with this is that your gradient estimates will be noisier and less accurate. If you can (memory constraints come into play here), you should use a larger batch_size.
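Concretely, batch_size is just an argument you pass to fit(). Reusing the non-stateful model sketched earlier, and with dummy data standing in for your padded/vectorized sentences, it would look something like this (the 32 is only a common default, not a rule):

```python
import numpy as np

# dummy stand-in for 100 padded/vectorized sentences,
# each 10 time steps long with 8 features per step
X = np.random.random((100, 10, 8))
y = np.random.random((100, 1))

# 32 sentences are propagated per gradient update;
# batch_size=1 would update after every single sentence (noisier gradients, slower)
model.fit(X, y, epochs=5, batch_size=32)
```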

mic
  • Probably a noob question, but according to the documentation you quoted, if there isn't a one-to-one mapping between samples in successive batches, then using a larger batch size won't help me, right? To give you a better idea, in my dataset each row corresponds to one sample, and each sample can be one or a few sentences long. However, since I am vectorizing it using word2vec and padding later so that all vectors end up the same length, should the batch size still be left as 1, or set to the length of the vectors, maybe like 5 or 9? – jar Oct 08 '18 at 07:27
  • Or would the ```units``` in this case be 1 or equal to the length of the vectors? – jar Oct 08 '18 at 07:33