
Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?

In a simple example, I couldn't observe any difference between:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, batch_input_shape=(None, 5, 1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))

and

model = Sequential()
model.add(LSTM(1, input_shape=(5, 1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))

However, when I fixed the batch size to 12 with batch_input_shape=(12,5,1) and then used batch_size=10 when fitting the model, I got an error.
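
Roughly what I ran (the data arrays are just random placeholders for illustration); the fit call then fails with the error below:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(1, batch_input_shape=(12, 5, 1), return_sequences=True))
model.add(LSTM(1, return_sequences=False))
model.compile(loss='mse', optimizer='adam')

x = np.random.random((120, 5, 1))  # dummy inputs: 120 samples, 5 timesteps, 1 feature
y = np.random.random((120, 1))     # dummy targets

model.fit(x, y, batch_size=10)  # batch size 10 != 12 declared in the model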

ValueError: Cannot feed value of shape (10, 5, 1) for Tensor 'lstm_96_input:0', which has shape '(12, 5, 1)'

That obviously makes sense. However, I can see no point in restricting the batch size at the model level.

Am I missing something?

Andrzej Gis

1 Answer


Is it just a different way of setting the same thing or do they actually have different meanings? Does it have anything to do with network configuration?

Yes, they are practically equivalent; your experiments confirm it (see also this discussion).
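
A quick sketch of such a check (the model names m1/m2 are mine, and the single LSTM unit is just illustrative):

from keras.models import Sequential
from keras.layers import LSTM

m1 = Sequential()
m1.add(LSTM(1, batch_input_shape=(None, 5, 1), return_sequences=True))
m1.add(LSTM(1, return_sequences=False))

m2 = Sequential()
m2.add(LSTM(1, input_shape=(5, 1), return_sequences=True))
m2.add(LSTM(1, return_sequences=False))

# Both summaries report the same output shapes: (None, 5, 1) followed by (None, 1)
m1.summary()
m2.summary()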

However I can see no point in restricting the batch size on model level.

Restricting the batch size is sometimes necessary. The example that comes to my mind is a stateful LSTM: the last cell state of a batch is remembered and used to initialize the cell state for the next batch, so fixing the batch size at the model level ensures the client won't feed batches of different sizes into the network. Example code:

from keras.models import Sequential
from keras.layers import LSTM

batch_size, timesteps, data_dim = 12, 5, 1  # illustrative values

# Expected input batch shape: (batch_size, timesteps, data_dim).
# Note that we have to provide the full batch_input_shape since the network is stateful:
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
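
As a usage note (the optimizer/loss and the x_train/y_train arrays below are placeholders, not part of the original example), a stateful model is typically trained with shuffle=False so that batch k really follows batch k-1, and its state is reset manually between epochs:

model.compile(loss='mse', optimizer='adam')

for epoch in range(10):
    model.fit(x_train, y_train, batch_size=batch_size,
              epochs=1, shuffle=False)
    model.reset_states()  # clear the carried-over cell state between epochs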
Maxim