
I am trying to implement a multi-step forecasting LSTM model in Keras. The shapes of the data are:

X : (5831, 48, 1)
y : (5831, 1, 12)

The model that I am trying to use is:

power_in = Input(shape=(X.shape[1], X.shape[2]))

power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563, kernel_initializer=power_lstm_init,
                  return_sequences=True)(power_in)

main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)

While trying to train the model like this:

hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)

I am getting the following error:

ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)

How can I fix this?

  • You must provide `48` timesteps in `y`; you have only 1. – Giacomo Alzetta Sep 19 '18 at 13:46
  • Yeah, I haven't worked with `TimeDistributed` much. How should the data be prepared? The data I have is t-48, t-47, t-46, ..., t-1 as the past data and t+1, t+2, ..., t+12 as the values that I want to forecast. – Sreeram TP Sep 19 '18 at 13:48

1 Answer


According to your comment:

[The] data I have is t-48, t-47, t-46, ..., t-1 as the past data and t+1, t+2, ..., t+12 as the values that I want to forecast

you may not need a TimeDistributed layer at all: just remove the return_sequences=True argument from the LSTM layer. The LSTM layer will then encode the input timeseries of the past into a vector of shape (50,), which you can feed directly to a Dense layer with 12 units:

import numpy as np

# make sure the labels are of shape (num_samples, 12)
y = np.reshape(y, (-1, 12))

power_in = Input(shape=X.shape[1:])
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
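
For completeness, here is a minimal sketch of how this first model could be compiled and trained (the `adam` optimizer and `mse` loss are placeholder choices, and `power_lstm_init` is assumed to be the initializer from your question):

from keras.models import Model

forecaster = Model(power_in, main_out)
forecaster.compile(optimizer='adam', loss='mse')

# the validation labels need the same reshape as the training labels
y_valid = np.reshape(y_valid, (-1, 12))
hist = forecaster.fit(X, y, epochs=325, batch_size=16,
                      validation_data=(X_valid, y_valid),
                      verbose=1, shuffle=False)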

Alternatively, if you would like to use a TimeDistributed layer, then considering that the output is itself a sequence, we can explicitly enforce this temporal dependency in the model by using another LSTM layer before the Dense layer (plus a RepeatVector layer after the first LSTM layer, to turn its output into a timeseries of length 12, i.e. the same length as the output timeseries):

# make sure the labels are of shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))

power_in = Input(shape=(48,1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)

rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)

model = Model(power_in, main_out)
model.summary()

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         (None, 48, 1)             0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 50)                10400     
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50)            0         
_________________________________________________________________
lstm_4 (LSTM)                (None, 12, 32)            10624     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1)             33        
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
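
Training this second model would look almost the same; only the target shape differs (again, the optimizer and loss are placeholder choices):

# labels were reshaped to (num_samples, 12, 1) above
model.compile(optimizer='adam', loss='mse')
hist = model.fit(X, y, epochs=325, batch_size=16,
                 validation_data=(X_valid, np.reshape(y_valid, (-1, 12, 1))),
                 verbose=1, shuffle=False)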

Of course, in both models you may need to tune the hyper-parameters (e.g. the number of LSTM layers, the number of units in each LSTM layer, etc.) to be able to compare them fairly and achieve good results.


Side note: actually, in your scenario, you don't need a TimeDistributed layer at all, because (currently) the Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent.
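
You can verify this equivalence with a toy snippet (independent of your data):

from keras import backend as K
from keras.layers import Input, Dense, TimeDistributed

inp = Input(shape=(12, 32))
print(K.int_shape(Dense(1)(inp)))                   # (None, 12, 1)
print(K.int_shape(TimeDistributed(Dense(1))(inp)))  # (None, 12, 1)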

  • Great answer. Can you explain how I can extend this solution to multivariate series? – Sreeram TP Sep 19 '18 at 16:03
  • I am trying to forecast the next 12 timesteps of a series given the last 48 timesteps. Shouldn't I be going with `TimeDistributed`? – Sreeram TP Sep 19 '18 at 16:04
  • @SreeramTP For the multivariate case, in the second model just change the number of units of the last Dense layer to the number of features of the timeseries, i.e. if the output shape is `(?, 12, num_feats)`, then we would have `TimeDistributed(Dense(num_feats))(out_lstm)`. And let me tell you something: you don't need that `TimeDistributed` layer at all because (currently) the Dense layer is applied on the last axis, so it is redundant. Just use `Dense(num_feats)(out_lstm)`. – today Sep 19 '18 at 16:12
  • Thanks for the info. Then in which cases is TimeDistributed used? – Sreeram TP Sep 19 '18 at 16:13
  • By multivariate I meant multivariate features and forecasting just one variable. – Sreeram TP Sep 19 '18 at 16:15
  • That is a new info you provided in the edit. Thanks. – Sreeram TP Sep 19 '18 at 16:19
  • While trying to forecast using a model similar to the Dense-based one you have shown, I am getting bad results for the farther timesteps. There is a strong persistence effect. I was trying out different models to solve this. Can you suggest something to try in this situation? – Sreeram TP Sep 19 '18 at 16:21
  • @SreeramTP One example of using a `TimeDistributed` layer is when the model gets multiple frames of a video as input and you want to apply a conv layer on each frame. In that case you would wrap the conv layer inside a `TimeDistributed` layer. As for your other question, the input of the model is `(?, 48, num_feats)`, so what is the output? By one variable do you mean the output is `(?, 1)`? – today Sep 19 '18 at 16:21
  • I meant, The input will be 48 past values for 2 variables and the forecast is supposed to be 12 timesteps of 1 variable – Sreeram TP Sep 19 '18 at 16:23
  • @SreeramTP Then nothing needs to change except the input shape, i.e. `Input(shape=(48, 2))` (see the sketch after these comments). – today Sep 19 '18 at 16:25
  • Cool. Can you suggest something to try to deal with the persistence problem? – Sreeram TP Sep 19 '18 at 16:31
  • @SreeramTP Would you clarify what you mean exactly by "persistence effect"? – today Sep 19 '18 at 16:32
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/180369/discussion-between-sreeram-tp-and-today). – Sreeram TP Sep 19 '18 at 16:33
  • Can you take a look at the data prep once again? While plotting `y[:, 10]` and `y[:, 11]`, they are almost the same. – Sreeram TP Sep 21 '18 at 10:50
  • @SreeramTP Of course it must be that way, because `y[:, 11]` is equivalent to a one-step shift of `y[:, 10]`. – today Sep 21 '18 at 10:54
  • Yes, but they are the same in many places. There should be a shift at every timestamp, right? – Sreeram TP Sep 21 '18 at 11:30
  • @SreeramTP Are they exactly the same? Yes, there must be a one-step shift. I have checked the data prep function multiple times. I even checked with another data prep function (the one in the notebook I sent you) and the result was the same. – today Sep 21 '18 at 11:33
  • for most of the timestamps the values are the same – Sreeram TP Sep 21 '18 at 11:53
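
To make the multivariate variant from the comments concrete, here is a minimal sketch (dropout rates and initializers are omitted for brevity): 48 past timesteps of 2 input variables, forecasting 12 timesteps of 1 variable.

from keras.layers import Input, LSTM, RepeatVector, Dense
from keras.models import Model

power_in = Input(shape=(48, 2))                  # 2 input features instead of 1
power_lstm = LSTM(50)(power_in)                  # encode the past into a vector
rep = RepeatVector(12)(power_lstm)               # one copy per forecast step
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = Dense(1)(out_lstm)                    # Dense acts on the last axis

model = Model(power_in, main_out)                # output shape: (None, 12, 1)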