I am assuming that you will need to share the parameters for each array that you stack.
If you were stacking entirely new features, then there wouldn't be an associated target with each one.
If you were stacking completely different examples, then you would not be using 3D arrays, and would just be appending them to the end like normal.
Solution
To solve this problem, I would leverage the TimeDistributed wrapper from Keras.
LSTM layers expect an input shape of (j, k), where j is the number of time steps and k is the number of features. Since you want to keep your array as 3D for the input and output, you will want to stack on a different dimension than the feature dimension.
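For reference, a batch of such inputs has shape (batch, j, k); a minimal NumPy illustration with hypothetical data:

```python
import numpy as np

# One univariate series: j = 40 time steps, k = 1 feature
x = np.arange(40, dtype="float32").reshape(40, 1)

# A batch of 8 such series, as an LSTM layer receives it: (batch, j, k)
batch = np.stack([x] * 8)
assert batch.shape == (8, 40, 1)
```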
Quick side note:
I think it’s important to note the difference between the approaches. Stacking on the feature dimension gives you multiple features for the same time steps; in that case you would feed them to ordinary LSTM layers directly and not go this route. Because you want a 3D input and a 3D output, I am proposing that you create a new dimension to stack on, which allows you to apply the same LSTM layers independently to each array.
TimeDistributed:
This wrapper applies the same layer to each slice along axis 1. By stacking your X1 and X2 arrays on axis 1 and using the TimeDistributed wrapper, you apply the LSTM layers independently to each array that you stack. Notice below that the original and updated model summaries have exactly the same number of parameters.
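Conceptually, TimeDistributed applies the wrapped layer to each slice along axis 1 and re-stacks the results. A NumPy sketch of those semantics (not Keras's actual implementation — the stand-in "layer" below is a hypothetical placeholder):

```python
import numpy as np

# A batch of 8 samples, each containing 2 stacked series of shape (40, 1)
x = np.zeros((8, 2, 40, 1))

# TimeDistributed(layer) conceptually applies the same layer to x[:, i]
# for every i along axis 1, then re-stacks the results on axis 1
def time_distributed(layer, x):
    return np.stack([layer(x[:, i]) for i in range(x.shape[1])], axis=1)

def fake_lstm(batch):
    # Stand-in mapping (batch, 40, 1) -> (batch, 6), like the
    # return_sequences=False LSTM in the model below
    return np.tile(batch.mean(axis=(1, 2))[:, None], (1, 6))

out = time_distributed(fake_lstm, x)
assert out.shape == (8, 2, 6)  # one (batch, 6) result per stacked array
```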
Implementation Steps:
The first step is to reshape the input of (40, 2) into (2, 40, 1). This gives you the equivalent of two (40, 1) array inputs. You can either do this in the model like I’ve done, or when building your dataset, and update the input shape accordingly.
- By adding the extra dimension (..., 1) to the end, we keep the data in a format that the LSTM would understand if it were looking at just one of the stacked arrays at a time. Notice that your original input_shape is (40, 1), for instance.
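One thing worth checking with this step: a plain reshape preserves flat element order, so (40, 2) → (2, 40, 1) only separates the two series if each series is laid out contiguously; if they were stacked on the feature axis, you would want to transpose first. A quick NumPy sanity check with hypothetical arrays:

```python
import numpy as np

# Two series stacked on the last (feature) axis: shape (40, 2)
a = np.stack([np.zeros(40), np.ones(40)], axis=-1)

# A plain reshape keeps flat element order, so it interleaves the series
r = a.reshape(2, 40, 1)

# Transposing first keeps each series contiguous
t = a.T.reshape(2, 40, 1)

assert not np.array_equal(r, t)
assert np.array_equal(t[0].ravel(), np.zeros(40))
assert np.array_equal(t[1].ravel(), np.ones(40))
```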
Then wrap each layer in the TimeDistributed wrapper.
And finally, reshape the y output to match your data by swapping (2, 10) to (10, 2).
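As mentioned above, you can also do the reshaping when building the dataset instead of in the model. A sketch with hypothetical random data (the sample count of 8 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 8 samples, two series of 40 time steps each
X1 = rng.random((8, 40))
X2 = rng.random((8, 40))

# Stack on a new axis 1 and add the trailing feature dimension:
# (8, 2, 40, 1); with this layout you can drop the in-model Reshape
# and use InputLayer(input_shape=(2, 40, 1)) instead
X = np.stack([X1, X2], axis=1)[..., np.newaxis]
assert X.shape == (8, 2, 40, 1)

# Matching targets, built the same way the model reshapes its output:
# stack per series to (8, 2, 10), then reshape to (8, 10, 2) so the
# element order matches the model's final Reshape layer
y1 = rng.random((8, 10))
y2 = rng.random((8, 10))
y = np.stack([y1, y2], axis=1).reshape(8, 10, 2)
assert y.shape == (8, 10, 2)
```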
Code
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, InputLayer, Reshape
# Original Model
model = Sequential()
model.add(LSTM(12, input_shape=(40, 1), return_sequences=True))
model.add(LSTM(12, return_sequences=True))
model.add(LSTM(6, return_sequences=False))
model.add(Dense(10))
model.summary()
Original Model Summary
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 40, 12)            672
_________________________________________________________________
lstm_1 (LSTM)                (None, 40, 12)            1200
_________________________________________________________________
lstm_2 (LSTM)                (None, 6)                 456
_________________________________________________________________
dense (Dense)                (None, 10)                70
=================================================================
Total params: 2,398
Trainable params: 2,398
Non-trainable params: 0
_________________________________________________________________
Apply TimeDistributed Wrapper
model = Sequential()
model.add(InputLayer(input_shape=(40, 2)))
model.add(Reshape(target_shape=(2, 40, 1)))
model.add(TimeDistributed(LSTM(12, return_sequences=True)))
model.add(TimeDistributed(LSTM(12, return_sequences=True)))
model.add(TimeDistributed(LSTM(6, return_sequences=False)))
model.add(TimeDistributed(Dense(10)))
model.add(Reshape(target_shape=(10, 2)))
model.summary()
Updated Model Summary
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
reshape (Reshape)            (None, 2, 40, 1)          0
_________________________________________________________________
time_distributed (TimeDistri (None, 2, 40, 12)         672
_________________________________________________________________
time_distributed_1 (TimeDist (None, 2, 40, 12)         1200
_________________________________________________________________
time_distributed_2 (TimeDist (None, 2, 6)              456
_________________________________________________________________
time_distributed_3 (TimeDist (None, 2, 10)             70
_________________________________________________________________
reshape_1 (Reshape)          (None, 10, 2)             0
=================================================================
Total params: 2,398
Trainable params: 2,398
Non-trainable params: 0
_________________________________________________________________