In your question, you mentioned the use of `model.add(LSTM(batch_size, input_shape=(time_steps, features)))`. I think the author misled you, as `batch_size` is the incorrect term to use here. As you can see in the Keras API spec, the first parameter defines the number of hidden states/units in this layer. I looked at the Medium post you linked, and I believe what likely happened is that the batch size was equal to the number of units, and the author was lazy/uninformed and decided to use the same constant for both. There has been no change to Keras since May of this year (when the post was written) that could explain their mistake.
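To make the distinction concrete, here is a minimal sketch (the values for `units`, `time_steps`, and `features` are hypothetical) showing that the first positional argument sets the layer's width, while the batch size is deliberately left unspecified until training:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

time_steps, features = 10, 100  # hypothetical input shape
units = 32                      # number of hidden states/units in the LSTM layer

model = Sequential()
model.add(LSTM(units, input_shape=(time_steps, features)))  # no batch size here
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# The batch size is only chosen later, e.g. model.fit(x, y, batch_size=64)
```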
As for the SO post, `batch_input_shape` is only applicable for stateful LSTM layers. From the documentation:
> You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch. This assumes a one-to-one mapping between samples in different successive batches.
>
> To enable statefulness:
>
> - specify `stateful=True` in the layer constructor.
> - specify a fixed batch size for your model, by passing: if sequential model, `batch_input_shape=(...)` to the first layer in your model; else, for a functional model with 1 or more Input layers, `batch_shape=(...)` to all the first layers in your model. This is the expected shape of your inputs including the batch size. It should be a tuple of integers, e.g. `(32, 10, 100)`.
> - specify `shuffle=False` when calling `fit()`.
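Putting those three requirements together, a stateful model might look like the following sketch (shapes and data are hypothetical; note the sample count is a multiple of the fixed batch size):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Fixed batch size of 32, 10 time steps, 100 features per step
model = Sequential()
model.add(LSTM(64, stateful=True, batch_input_shape=(32, 10, 100)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# 320 samples = exactly 10 batches of 32, so every batch is full
x = np.random.random((320, 10, 100))
y = np.random.random((320, 1))

# shuffle=False preserves the one-to-one mapping between samples
# in successive batches that statefulness relies on
model.fit(x, y, batch_size=32, shuffle=False, epochs=1)
```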
As for your question regarding the nature of `batch_size` in text analysis, it simply refers to the number of samples propagated through the network at once. Therefore, if you wanted to pass only a single sentence at a time, you could set it to 1. The problem with this is that your gradient estimate will be noisier and less accurate. If you can (memory constraints come into play here), you should use a larger `batch_size`.
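In terms of code, this trade-off is just the `batch_size` argument to `fit()`; a rough sketch with made-up data:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(10, 100)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

x = np.random.random((128, 10, 100))  # hypothetical data: 128 "sentences"
y = np.random.random((128, 1))

model.fit(x, y, batch_size=1, epochs=1)   # one sentence per update: noisy gradients
model.fit(x, y, batch_size=64, epochs=1)  # gradient averaged over 64 samples per update
```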