"Unroll" is just a mechanism to process the LSTMs in a way that makes them faster by occupying more memory. (The details are unknown for me... but it certainly has no influence in steps, shapes, etc.)
When you say "2000 points split in 40 time steps", I have absolutely no idea what is going on.
The data must be meaningfully structured, and saying "2000 data points" is really lacking a lot of information.
Data structured for LSTMs is:
- I have a certain number of individual sequences (data evolving with time)
- Each sequence has a number of time steps (measures in time)
- At each step we measured a number of different variables with different meanings (features)
Example:
- 2000 users in a website
- They used the site for 40 days
- In each day I measured the number of times they clicked a button
I can plot how this data evolves with time daily (each day is a step)
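That structure can be sketched as an array (the numbers here are made up just to match the example: 2000 users, 40 days, 1 measurement per day):

```python
import numpy as np

# Hypothetical click counts: 2000 users (sequences), 40 days (steps),
# 1 measured variable per day (features)
rng = np.random.default_rng(0)
clicks = rng.integers(0, 10, size=(2000, 40, 1))

print(clicks.shape)  # (2000, 40, 1) -> (sequences, steps, features)
```

This `(sequences, steps, features)` layout is exactly what Keras expects as LSTM input.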
So, if you have 2000 sequences (also called "samples" in Keras), each with a length of 40 steps and a single feature per step, this will happen:
Dimensions
- Batch size is defined as 32 by default in the `fit` method. The model will process batches containing 32 sequences/users until it reaches all 2000 sequences/users.
- `input_shape` will be required to be `(40, 1)` (the batch size is left free, to be chosen in `fit`).
Steps
Your LSTMs will try to understand how clicks vary in time, step by step. That's why they're recurrent, they calculate things for a step and feed these things into the next step, until all 40 steps are processed. (You won't see this processing, though, it's internal)
- With `return_sequences=True`, you will get the output for all steps.
- Without it, you will get only the output for the last step.
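A toy NumPy loop (not a real LSTM, just an assumed sketch of the recurrence with made-up weights) makes the shape difference visible:

```python
import numpy as np

def toy_rnn(x, units, return_sequences=False):
    """Minimal recurrent loop: each step's output feeds the next step."""
    steps, feats = x.shape
    rng = np.random.default_rng(0)      # fixed weights, for illustration only
    Wx = rng.normal(size=(feats, units))
    Ws = rng.normal(size=(units, units))
    state = np.zeros(units)
    outputs = []
    for t in range(steps):
        state = np.tanh(x[t] @ Wx + state @ Ws)  # current input + previous state
        outputs.append(state)
    # all steps vs. only the last step:
    return np.stack(outputs) if return_sequences else outputs[-1]

seq = np.ones((40, 1))  # one sequence: 40 steps, 1 feature
print(toy_rnn(seq, 100, return_sequences=True).shape)  # (40, 100)
print(toy_rnn(seq, 100).shape)                         # (100,)
```

The "last step only" result is just the final row of the full sequence of outputs.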
The model
The model will process 32 parallel (and independent) sequences/users together in each batch.
- The first `LSTM` layer will process the entire sequence in recurrent steps and return a final result. (The sequence is killed; there are no steps left because you didn't use `return_sequences=True`.)
- Output shape = `(batch, 100)`
- You create a new sequence with `RepeatVector`, but this sequence is constant in time.
- Output shape = `(batch, 40, 100)`
- The next `LSTM` layer processes this constant sequence and produces an output sequence, with all 40 steps.
- Output shape = `(batch, 40, 100)`
- The `TimeDistributed(Dense)` will process each of these steps, but independently (in parallel), not recurrently as the LSTMs would do.
- Output shape = `(batch, 40, n_features)`
- The output will be the total group of 2000 sequences (processed in groups of 32), each with 40 steps and `n_features` output features.
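The whole chain of shapes can be checked in a short sketch (assuming `n_timesteps_in = 40` and `n_features = 1` as in the example above; layer sizes follow the model being discussed):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_timesteps_in, n_features = 40, 1  # assumed sizes from the example

model = Sequential()
model.add(LSTM(100, input_shape=(n_timesteps_in, n_features)))  # -> (batch, 100)
model.add(RepeatVector(n_timesteps_in))                         # -> (batch, 40, 100)
model.add(LSTM(100, return_sequences=True))                     # -> (batch, 40, 100)
model.add(TimeDistributed(Dense(n_features)))                   # -> (batch, 40, 1)

print(model.output_shape)  # (None, 40, 1) -- None is the free batch size
```

The `None` in the first position is the batch dimension, left free until `fit` (or `predict`) is called.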
Cells, features, units
Everything is independent.
Input features are one thing, output features are another. There is no requirement for `Dense` to use the same number of features used in `input_shape`, unless that's what you want.
When you use 100 units in the LSTM layer, it will produce an output sequence of 100 features, shape `(batch, 40, 100)`. If you use 200 units, it will produce an output sequence with 200 features, shape `(batch, 40, 200)`. This is computing power. More neurons = more intelligence in the model.
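Comparing the two directly (a minimal sketch, assuming the same `(40, 1)` input as above):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

# Same input shape, different number of units -> different output features
m100 = Sequential([LSTM(100, return_sequences=True, input_shape=(40, 1))])
m200 = Sequential([LSTM(200, return_sequences=True, input_shape=(40, 1))])

print(m100.output_shape)  # (None, 40, 100)
print(m200.output_shape)  # (None, 40, 200)
```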
Something strange in the model:
You should replace:
model.add(LSTM(100, input_shape=(n_timesteps_in, n_features)))
model.add(RepeatVector(n_timesteps_in))
With only:
model.add(LSTM(100, return_sequences=True, input_shape=(n_timesteps_in, n_features)))
Not returning sequences in the first layer and then creating a constant sequence with `RepeatVector` sort of destroys the work of your first LSTM.
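With that replacement, a real (varying) sequence flows through the whole model instead of a repeated constant (a sketch with the same assumed sizes as before):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, TimeDistributed, Dense

n_timesteps_in, n_features = 40, 1  # assumed sizes from the example

model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape=(n_timesteps_in, n_features)))
model.add(LSTM(100, return_sequences=True))   # receives a true sequence, not a repeated vector
model.add(TimeDistributed(Dense(n_features)))

print(model.output_shape)  # (None, 40, 1) -- same final shape, no information discarded
```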