I read that the internal state of LSTMs flows as follows:
- it is always passed within a batch, so from the last timestep of the i-th sample to the first timestep of the (i+1)-th sample
- if the LSTM is stateful, then the state is also passed between batches, so the memory at the last timestep of batch_k[i] is passed to the first timestep of batch_{k+1}[i], for all indices i.
For me, this raises several questions (please correct me if my understanding is wrong):
- Does this mean that the first timestep of the (i+1)-th sample needs to be the successor of the last timestep of the i-th sample (for all i)?
- Along the same lines, does the first timestep of the i-th sample in batch k+1 have to be the successor of the last timestep of the i-th sample in batch k?
- If the first two conclusions are correct, then for stateful LSTMs we can NEVER shuffle anything and for the non-stateful ones we can at most shuffle the batches, but not the samples within batches, correct?
- Why do we split the batch into samples of more than one timestep anyway? If the above is correct, then the procedure 'within a sample' is the same as 'within a batch', so we might as well use samples of one timestep each.
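
To make the stateful case concrete, here is a minimal NumPy sketch of the data layout that I think the batch_k[i] to batch_{k+1}[i] rule would require. It only arranges data and does not touch any LSTM library. It assumes a single long 1-D series, and `make_stateful_batches`, `batch_size`, and `timesteps` are names I made up for illustration.

```python
import numpy as np

def make_stateful_batches(series, batch_size, timesteps):
    """Arrange one long 1-D series so that sample i of batch k+1
    starts right where sample i of batch k ended (hypothetical helper)."""
    series = np.asarray(series)
    # Keep only as many points as fill a whole number of batches.
    n_batches = len(series) // (batch_size * timesteps)
    usable = n_batches * batch_size * timesteps
    # Row i is one contiguous stream of the series.
    streams = series[:usable].reshape(batch_size, n_batches * timesteps)
    # Cut every stream into windows of `timesteps`;
    # batch k is made of window k from each stream.
    windows = streams.reshape(batch_size, n_batches, timesteps)
    return windows.transpose(1, 0, 2)  # (n_batches, batch_size, timesteps)

batches = make_stateful_batches(np.arange(24), batch_size=2, timesteps=3)
print(batches[0])  # first batch
print(batches[1])  # second batch: each row continues the same row of batches[0]
```

With this layout, row i of consecutive batches forms one unbroken sequence, while rows i and i+1 inside the same batch come from different parts of the series. If my first bullet were right, that would not be allowed, which is part of what I am asking.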