Multi Step Forecast LSTM model

Question

I am trying to implement a multi-step forecasting LSTM model in Keras. The shapes of the data are as follows:

X : (5831, 48, 1)
y : (5831, 1, 12)

The model that I am trying to use is:

power_in = Input(shape=(X.shape[1], X.shape[2]))


power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563, kernel_initializer=power_lstm_init, return_sequences=True)(power_in)

main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)

While trying to train the model like this:

hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)

I am getting the following error:

ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)

How to fix this?

you must provide `48` timesteps in the `y`, you have only 1. — Giacomo Alzetta, Sep 19 '18 at 13:46
yeah. I haven't worked with `TimeDistributed` much. How the data must be prepared.? data i have is like t-48, t-47, t-46, ..... , t-1 as the past data and t+1, t+2, ......, t+12 as the values that I want to forecast — Sreeram TP, Sep 19 '18 at 13:48

today · Accepted Answer · 2018-09-19T16:18:15.337

According to your comment:

[The] data i have is like t-48, t-47, t-46, ..... , t-1 as the past data and t+1, t+2, ......, t+12 as the values that I want to forecast

you may not need a TimeDistributed layer at all. First, remove the return_sequences=True argument of the LSTM layer. The LSTM layer then encodes the past input timeseries into a vector of shape (50,), which you can feed directly to a Dense layer with 12 units:

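import numpy as np
from keras.layers import Input, LSTM, Dense

# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))

# X.shape[1:] is already a tuple, here (48, 1)
power_in = Input(shape=X.shape[1:])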
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
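For reference, arrays in the shapes this model expects can be built with a sliding window over the raw series. The sketch below is only an illustration, not the asker's actual preprocessing: the helper name `make_windows`, the synthetic series, and the choice that the 12 targets start immediately after the 48 past values are all assumptions.

```python
import numpy as np

def make_windows(series, n_past=48, n_future=12):
    """Hypothetical helper: slice a 1-D series into (past, future) pairs."""
    X, y = [], []
    last_start = len(series) - n_past - n_future
    for start in range(last_start + 1):
        split = start + n_past
        X.append(series[start:split])             # 48 past values
        y.append(series[split:split + n_future])  # next 12 values
    X = np.asarray(X)[..., np.newaxis]  # (num_samples, n_past, 1)
    y = np.asarray(y)                   # (num_samples, n_future)
    return X, y

series = np.sin(np.linspace(0, 20, 200))  # synthetic stand-in for real data
X_demo, y_demo = make_windows(series)
print(X_demo.shape, y_demo.shape)
```

With labels already in shape (num_samples, 12), the reshape step above becomes a no-op.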

Alternatively, if you would like to use a TimeDistributed layer, and considering that the output is itself a sequence, you can explicitly enforce this temporal dependency in the model by adding another LSTM layer before the Dense layer. A RepeatVector layer after the first LSTM layer turns its output into a timeseries of length 12, i.e. the same length as the output timeseries:

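import numpy as np
from keras.layers import Input, LSTM, Dense, RepeatVector, TimeDistributed
from keras.models import Model

# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))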

power_in = Input(shape=(48,1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)

model = Model(power_in, main_out)
model.summary()

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 48, 1)             0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                10400     
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50)            0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 12, 32)            10624     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1)             33        
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
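The parameter counts in the summary can be checked by hand. The sketch below assumes a standard LSTM layer with biases (four gates, each with an input kernel, a recurrent kernel and a bias vector); the helper names `lstm_params` and `dense_params` are made up for this illustration. RepeatVector has no weights, so it contributes nothing.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with input kernel, recurrent kernel and bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    # kernel plus bias
    return input_dim * units + units

encoder = lstm_params(50, 1)   # first LSTM, input has 1 feature
decoder = lstm_params(32, 50)  # second LSTM, input is the repeated (50,) vector
head = dense_params(1, 32)     # Dense(1) applied to each of the 12 timesteps
total = encoder + decoder + head
print(encoder, decoder, head, total)  # 10400 10624 33 21057
```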

Of course, in both models you may need to tune the hyper-parameters (e.g. number of LSTM layers, the dimension of LSTM layers, etc.) to be able to accurately compare them and achieve good results.

Side note: in your scenario you don't actually need the TimeDistributed layer at all, because (currently) a Dense layer is applied on the last axis of its input. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent here.
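The equivalence can be illustrated without Keras. The following NumPy sketch is a stand-in for what the two layers compute, not the Keras implementation itself: it applies one set of dense weights to a 3D array in a single product over the last axis, then applies the same weights timestep by timestep, and compares the results. The array sizes and variable names are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features, units = 4, 12, 32, 1

x = rng.normal(size=(batch, timesteps, features))
kernel = rng.normal(size=(features, units))
bias = rng.normal(size=(units,))

# Dense-style: one matrix product over the last axis
dense_out = x @ kernel + bias

# TimeDistributed-style: the same weights applied to each timestep separately
per_step = np.stack(
    [x[:, t, :] @ kernel + bias for t in range(timesteps)], axis=1
)

print(dense_out.shape)
print(np.allclose(dense_out, per_step))
```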

edited Sep 19 '18 at 16:18

answered Sep 19 '18 at 14:16


Great answer. Can you explain how I can extend this solution to a multivariate series? – Sreeram TP Sep 19 '18 at 16:03
I am trying to forecast next 12 timesteps of a series given last 48 timesteps. Shouldn't I be going with `TimeDistributed` .? – Sreeram TP Sep 19 '18 at 16:04
@SreeramTP For the multi-variate case, in the second model just change the number of units of last Dense layer to the number of features of the timeseries, i.e. if the output shape is `(?, 12, num_feats)`, then we would have `TimeDistributed(Dense(num_feats))(out_lstm)`. And let me tell you something: you don't need that `TimeDistributed` layer at all because (currently) [Dense layer is applied on the last axis], so it is redundant. Just use `Dense(num_feats)(out_lstm)`. – today Sep 19 '18 at 16:12
Thanks for the info. Then which cases TimeDistributed is used.? – Sreeram TP Sep 19 '18 at 16:13
by multivariate I meant multi variate features and forecasting just one variable – Sreeram TP Sep 19 '18 at 16:15
That is a new info you provided in the edit. Thanks. – Sreeram TP Sep 19 '18 at 16:19
While trying to forecast using a model similar to the one you have shown that uses Dense, I am getting bad results for the farther timesteps. There is a strong persistence effect. I was trying out different models to solve this. Can you suggest something to try in this situation? – Sreeram TP Sep 19 '18 at 16:21
@SreeramTP One example for using `TimeDistributed` layer is when the model get multiple frames of a video as input and you want to apply a conv layer on each frame. In that case you would wrap the conv layer inside a `TimeDistributed` layer. As for your other question, the input of the model is `(?, 48, num_feats)`, so what is the output? By one variable you mean the output is `(?, 1)`? – today Sep 19 '18 at 16:21
I meant, The input will be 48 past values for 2 variables and the forecast is supposed to be 12 timesteps of 1 variable – Sreeram TP Sep 19 '18 at 16:23
@SreeramTP Then nothing needs to change except the input shape, i.e. `Input(shape=(48,2))`. – today Sep 19 '18 at 16:25
Cool. Can you suggest something to try to deal with the persistence problem.? – Sreeram TP Sep 19 '18 at 16:31
@SreeramTP Would you clarify what you mean exactly by "persistence effect"? – today Sep 19 '18 at 16:32
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/180369/discussion-between-sreeram-tp-and-today). – Sreeram TP Sep 19 '18 at 16:33
can you take a look at the data prep once again.? While plotting `y[:, 10]` and `y[:, 11]` they are almost the same – Sreeram TP Sep 21 '18 at 10:50
@SreeramTP Of course it must be that way, because `y[:,11]` is equivalent to one-step shift of `y[:,10]`. – today Sep 21 '18 at 10:54
yes, but they are same in many places. there should be a shift every timestamp right.? – Sreeram TP Sep 21 '18 at 11:30
@SreeramTP Are they exactly the same? Yeah, there must be a one-step shift. I have checked the data prep function multiple times. I even checked with another data prep function (the one which is in the notebook I sent you) and the result was the same. – today Sep 21 '18 at 11:33
for most of the timestamps the values are the same – Sreeram TP Sep 21 '18 at 11:53
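The one-step shift between neighbouring label columns discussed in the comments above can be checked on a toy series. This is a sketch under assumptions: the series is synthetic, and the windowing uses NumPy's `sliding_window_view` rather than the data prep function from the discussion, which is not shown in this thread.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(100, dtype=float)  # synthetic series, value == time index
n_past, n_future = 48, 12

# Each row is one window of 48 past values followed by 12 future values
windows = sliding_window_view(series, n_past + n_future)
y_demo = windows[:, n_past:]  # labels, shape (num_samples, 12)

# Column 11 of sample i should equal column 10 of sample i + 1
shift_ok = np.array_equal(y_demo[:-1, 11], y_demo[1:, 10])
print(y_demo.shape, shift_ok)
```

If the same check holds on the real labels, the windowing is consistent. On a slowly varying series the two columns may still look nearly identical when plotted, because consecutive values are close to each other.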
