I have a more theoretical question I wasn't able to find an answer to. Let's say I have as input a list of numbers:
input = [0,5,0,4,3,3,2,1]
And let's say the first hidden layer consists of 3 LSTM nodes. How is the list then presented to the LSTM (with timesteps=8)?
My first idea is:
Input timestep 1:
node 1 = 0, node 2 = 0, node 3 = 0
Input timestep 2:
node 1 = 5, node 2 = 5, node 3 = 5
...
So each node sees the same input at every timestep.
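To make the first idea concrete, here is a minimal Python sketch that only builds the per-timestep table described above; it does not run an LSTM and is not how any particular library feeds data. The names `sequence`, `NUM_NODES` and `idea1_inputs` are hypothetical, and the list is called `sequence` instead of `input` to avoid shadowing Python's built-in `input`.

```python
# Sketch of idea 1: at every timestep, all nodes receive the same number.
sequence = [0, 5, 0, 4, 3, 3, 2, 1]
NUM_NODES = 3  # size of the first hidden layer in the question


def idea1_inputs(seq, num_nodes):
    """Return one row per timestep; row[i] is what node i+1 receives."""
    return [[value] * num_nodes for value in seq]


rows = idea1_inputs(sequence, NUM_NODES)
for t, row in enumerate(rows, start=1):
    print(f"timestep {t}: " + ", ".join(f"node {i} = {v}" for i, v in enumerate(row, start=1)))
```

This prints 8 lines, one per timestep, starting with `timestep 1: node 1 = 0, node 2 = 0, node 3 = 0`.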
My second idea is:
Input timestep 1:
node 1 = 0, node 2 = 5, node 3 = 0
Input timestep 2:
node 1 = 5, node 2 = 0, node 3 = 4
...
Input timestep 8:
node 1 = 1, node 2 = -, node 3 = -
At each timestep each node gets a different input: the input is like a sliding window moving from left to right over the list. In this case the numbers in the list are not all presented to the LSTM equally often.
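The second idea can be sketched the same way: a window of width 3 that starts at every position of the list (stride 1). This is only an illustration of the layout described above; `idea2_inputs` and `appearances` are hypothetical helpers, and `None` is an assumed stand-in for the "-" (no value) entries past the end of the list.

```python
# Sketch of idea 2: a width-3 window sliding one step per timestep.
sequence = [0, 5, 0, 4, 3, 3, 2, 1]
NUM_NODES = 3


def idea2_inputs(seq, num_nodes):
    """One row per start position; None marks positions past the end ('-')."""
    rows = []
    for start in range(len(seq)):
        window = seq[start:start + num_nodes]  # slicing copies, seq is unchanged
        window += [None] * (num_nodes - len(window))
        rows.append(window)
    return rows


def appearances(seq, num_nodes):
    """How many windows (timesteps) contain each position of the list."""
    return [
        sum(1 for start in range(len(seq)) if start <= i < start + num_nodes)
        for i in range(len(seq))
    ]


rows = idea2_inputs(sequence, NUM_NODES)
print(rows[0], rows[1], rows[-1])
print(appearances(sequence, NUM_NODES))
```

With 8 timesteps the first number is shown once, the second twice and every other number three times (`[1, 2, 3, 3, 3, 3, 3, 3]`), which is the unequal coverage mentioned above.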
My last idea is:
Input timestep 1:
node 1 = 0, node 2 = 5, node 3 = 0
next timestep:
node 1 = 4, node 2 = 3, node 3 = 3
last timestep:
node 1 = 2, node 2 = 1, node 3 = -
So again each node gets a different input, but this time the window doesn't slide over the list; it jumps. In this case each number is presented to the LSTM only once (and there are only 3 timesteps instead of 8).
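For comparison, a minimal sketch of the third idea: the window jumps by its own width (stride 3), so the list is cut into non-overlapping chunks. As before, `idea3_inputs` is a hypothetical helper and `None` is an assumed stand-in for "-".

```python
# Sketch of idea 3: non-overlapping chunks of width 3 (the window "jumps").
sequence = [0, 5, 0, 4, 3, 3, 2, 1]
NUM_NODES = 3


def idea3_inputs(seq, num_nodes):
    """One row per chunk; the last chunk is padded with None ('-')."""
    rows = []
    for start in range(0, len(seq), num_nodes):
        window = seq[start:start + num_nodes]
        window += [None] * (num_nodes - len(window))
        rows.append(window)
    return rows


rows = idea3_inputs(sequence, NUM_NODES)
for row in rows:
    print(row)
```

This gives three rows, `[0, 5, 0]`, `[4, 3, 3]` and `[2, 1, None]`, so every number appears exactly once but the sequence shrinks to 3 timesteps.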
I would guess that the first idea is how it works, but I'm not sure. Or is it something completely different?