
I was reading through the Keras documentation on their site (https://keras.io/getting-started/faq/), and I noticed that in their definition of a batch they say each sample within a batch is run in parallel. For almost any type of neural network this would be completely acceptable, but if I'm running an RNN with `stateful` set to the default of `False`, does this imply that the hidden state is being reset for each and every one of my samples?

I was under the impression that each batch was run sequentially before the weights were updated, and that therefore the only loss of hidden state happened when the batch changed (since I have `stateful` set to `False`).

Am I wrong in my understanding?

a1letterword

1 Answer

Every sample is an individual sequence, and a state (the condition a sequence is in at the current timestep) only makes sense for each sequence individually.

One sequence cannot affect the state of another sequence.

So, there is a parallel state for each sequence in the batch.

In a stateful layer, these parallel states will be kept from one batch to the next (the sequences have not ended until you say so).
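
A minimal sketch of the difference (the shapes, layer sizes, and random data here are invented for illustration, not taken from the question):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM

# A made-up batch of 4 independent sequences, each 10 timesteps
# long with 3 features per step.
batch = np.random.random((4, 10, 3))

# stateful=False (the default): while a batch is processed, Keras keeps
# one state per sequence in the batch, advancing each of them timestep
# by timestep; all of these states are reset before the next batch.
model = Sequential()
model.add(LSTM(8, input_shape=(10, 3)))
model.predict(batch)  # shape (4, 8): one output per sequence

# stateful=True: the per-sequence states survive between batches, so the
# next batch is treated as the continuation of the same 4 sequences.
# This requires a fixed batch size.
stateful_model = Sequential()
stateful_model.add(LSTM(8, batch_input_shape=(4, 10, 3), stateful=True))
stateful_model.predict(batch, batch_size=4)  # states kept afterwards...
stateful_model.reset_states()                # ...until you reset them
```

So with `stateful=False` the reset is automatic and invisible: within a batch, each sequence still gets its own state for the whole length of its timesteps.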

Here is another related question: When does keras reset an LSTM state?

Daniel Möller
  • If I don't define a timestep (window), then how would Keras treat that batch? – a1letterword Sep 05 '17 at 19:17
  • I'm not sure what you call a window. The data must be shaped as `(BatchSize, TimeSteps, FeaturesPerStep)`. The states are updated by timesteps. – Daniel Möller Sep 05 '17 at 19:31
  • `model.add(LSTM(32, input_shape=(588425, 26), return_sequences=True)); model.add(Dense(1)); model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy']); filepath = "model_check-{epoch:02d}-{loss:.4f}.hdf5"; checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min'); callbacks_list = [checkpoint]; model.fit(df_matrix, y_matrix, epochs=5, batch_size=100000, verbose=2)` – a1letterword Sep 05 '17 at 19:34
  • hmm, formatting isn't working there, but basically I put the batch size at the very end in `model.fit`, not earlier when defining the model itself. It seems to run. The timesteps are the whole data stream, which I'm cutting up into 100k intervals. – a1letterword Sep 05 '17 at 19:36
  • What is the shape of the input data? If it's `(1, 588425, 26)` you have only one sample. Your batch size will not take effect in this case, because it will always be 1. You have 1 sample with 588425 time steps. – Daniel Möller Sep 05 '17 at 19:40
  • OK, I see, so in this case it will run the whole stream in sequence before resetting the state. – a1letterword Sep 05 '17 at 19:46
  • Yes. If you want to separate the timesteps, you will have to follow exactly the example on the page you linked: `batch_size = 1` in the first layer (your batch has only one sequence) and `input_shape=(100000, 26)` (you divide the sequence into parts of 100k time steps). --- It seems you will have to train batch by batch manually, as in the example. And I don't know what would happen in the last batch, where you don't have exactly 100k steps. – Daniel Möller Sep 05 '17 at 19:57
  • If I wanted to split it into sequences of 200 timesteps each, then the input size for the first layer would be `(500k/200, 200, 26)`, and then if I set the batch size at the bottom to 10,000, would it run 10,000 sequences at once before updating the weights? – a1letterword Sep 05 '17 at 20:05
  • I don't know, I never tested stateful layers. But it's worth trying; it seems it will work. In the worst case, make a loop using `train_on_batch`, just like the example in your link (a sketch of such a loop is below). – Daniel Möller Sep 05 '17 at 21:00
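
For reference, a minimal sketch of the batch-by-batch loop discussed above. The 588425-step stream, the 26 features, and the 200-step window come from the comments; the random arrays, the layer sizes, and the epoch count are made-up stand-ins, not the asker's actual data or code:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Stand-in for the single long stream: 588425 steps are not evenly
# divisible by 200, so trim the tail (what to do with the remainder
# is exactly the open question from the comments).
window = 200
stream_x = np.random.random((588425, 26))[:588400]
stream_y = np.random.random((588425, 1))[:588400]
n_windows = stream_x.shape[0] // window  # 2942 windows of 200 steps

# Reshape to (samples, timesteps, features): each window becomes one
# "sample", but they are all slices of the same continuous sequence.
x = stream_x.reshape(n_windows, window, 26)
y = stream_y.reshape(n_windows, window, 1)

# Batch size of 1 because there is only one underlying sequence;
# stateful=True so the state carries over from one window to the next.
model = Sequential()
model.add(LSTM(32, batch_input_shape=(1, window, 26),
               stateful=True, return_sequences=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

# Train window by window, in order, resetting the state only at the
# end of the stream (once per epoch), as in the Keras FAQ example.
for epoch in range(5):
    for i in range(n_windows):
        model.train_on_batch(x[i:i+1], y[i:i+1])
    model.reset_states()
```

The manual loop matters because `model.fit` shuffles samples by default: the windows must be seen in their original order for the carried-over state to make sense.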