
I am trying to do multi-step time series forecasting using a multivariate LSTM in Keras. Specifically, I have two variables (var1 and var2) for each time step originally. Having followed the online tutorial here, I decided to use data at times (t-2) and (t-1) to predict the value of var2 at time step t. As the sample data table below shows, I am using the first 4 columns as input and Y as output. The code I have developed can be seen here, but I have three questions.

   var1(t-2)  var2(t-2)  var1(t-1)  var2(t-1)  var2(t)
2        1.5       -0.8        0.9       -0.5     -0.2
3        0.9       -0.5       -0.1       -0.2      0.2
4       -0.1       -0.2       -0.3        0.2      0.4
5       -0.3        0.2       -0.7        0.4      0.6
6       -0.7        0.4        0.2        0.6      0.7
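
For reference, a lagged table like this can be built from the raw series with pandas shift (a minimal sketch; raw is a placeholder DataFrame holding the original var1 and var2 columns, not the code linked above):

import pandas as pd

# raw: DataFrame with columns 'var1' and 'var2', one row per time step
supervised = pd.concat(
    [raw.shift(2).add_suffix('(t-2)'),    # var1(t-2), var2(t-2)
     raw.shift(1).add_suffix('(t-1)'),    # var1(t-1), var2(t-1)
     raw[['var2']].add_suffix('(t)')],    # target: var2(t)
    axis=1
).dropna()                                # the first two rows have no lagged values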
  1. Q1: I have trained an LSTM model with the data above. This model does well in predicting the value of var2 at time step t. However, what if I want to predict var2 at time step t+1? I feel it is hard because the model cannot tell me the value of var1 at time step t. If I want to do it, how should I modify the code to build the model?
  2. Q2: I have seen this question asked a lot, but I am still confused. In my example, what should the correct time step in [samples, time steps, features] be: 1 or 2?
  3. Q3: I just started studying LSTMs. I have read here that one of the biggest advantages of an LSTM is that it learns the temporal dependence/sliding window size by itself; if so, why must we always convert time series data into a format like the table above?

Update: LSTM result (blue line is the training sequence, orange line is the ground truth, green line is the prediction).

  • Are var1 and var2 independent from each other? Do you want to predict only var 2? Don't you want to predict var 1 as well? – Daniel Möller Oct 24 '17 at 13:03
  • They are independent. Just think of them as precipitation and soil moisture. Yes, I only want to predict var1. – Yongyao Jiang Oct 24 '17 at 14:48
  • 1
    Soil moisture is not independent from precipitation... do you have a complete sequence of precipitation values to input? – Daniel Möller Oct 24 '17 at 15:05
  • Yeah, I know there is some correlation, maybe a bad example. Just wanted to simplify the case. There was a typo in my previous comment, I only want to predict var2. And yes, I have a complete sequence of monthly data here: https://github.com/Yongyao/enso-forcasting/blob/master/preprocessed/indice_olr_excluded.csv – Yongyao Jiang Oct 24 '17 at 15:12
  • But var 2 depends on var 1, right? (If so, you have to predict var 1 too). – Daniel Möller Oct 24 '17 at 15:26
  • So I should predict var1 explicitly by adding it to the output (2 nodes), and use those two values to roll forward? Is there a better way to do this? I thought LSTM could handle this kind of problem implicitly. Also, do you have any thoughts on Q2 above? – Yongyao Jiang Oct 24 '17 at 15:47

2 Answers


Question 1:

From your table, I see you have a sliding window over a single sequence, making many smaller sequences with 2 steps.

  • For predicting t, you take the first line of your table as input.
  • For predicting t+1, you take the second line as input.

If you're not using the table: see question 3

Question 2:

Assuming you're using that table as input, where it's clearly a sliding window case taking two time steps as input, your timeSteps is 2.

You should probably work as if var1 and var2 were features in the same sequence:

  • input_shape = (2,2) - Two time steps and two features/vars.
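
A minimal sketch of that setup (layer size, epochs, and batch size below are arbitrary placeholders; X holds the first four table columns as a NumPy array, y the last column):

from keras.models import Sequential
from keras.layers import LSTM, Dense

# X columns: var1(t-2), var2(t-2), var1(t-1), var2(t-1)
# reshape to [samples, time steps, features] = [samples, 2, 2]
X = X.reshape(X.shape[0], 2, 2)

model = Sequential()
model.add(LSTM(50, input_shape=(2, 2)))   # 50 units chosen arbitrarily
model.add(Dense(1))                       # predict var2(t)
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=50, batch_size=8)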

Question 3:

We do not need to make tables like that or build a sliding window case. That is one possible approach.

Your model is actually capable of learning things and deciding the size of this window itself.

If, on one hand, your model is capable of learning long time dependencies, allowing you not to use windows, on the other hand, it may learn to identify different behaviors at the beginning and in the middle of a sequence. In this case, if you want to predict using sequences that start from the middle (not including the beginning), your model may work as if it were the beginning and predict a different behavior. Using windows eliminates this very long influence. Which is better may depend on testing, I guess.

Not using windows:

If your data has 800 steps, feed all the 800 steps at once for training.

Here, we will need two separate models, one for training and another for predicting. In training, we will take advantage of the parameter return_sequences=True. This means that for each input step, we will get an output step.

For predicting later, we will want only one output, so we will use return_sequences=False. And in case we are going to use the predicted outputs as inputs for the following steps, we are going to use a stateful=True layer.

Training:

Have your input data shaped as (1, 799, 2), 1 sequence, taking the steps from 1 to 799. Both vars in the same sequence (2 features).

Have your target data (Y) shaped also as (1, 799, 2), taking the same steps shifted, from 2 to 800.
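
In code, that shift could look like this (a sketch assuming data is the full scaled sequence as an array of shape (800, 2)):

# data: array of shape (800, 2) -- the whole sequence, both vars as features
X = data[:-1].reshape(1, 799, 2)   # steps 1 to 799
Y = data[1:].reshape(1, 799, 2)    # the same steps shifted, 2 to 800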

Build a model with return_sequences=True. You may use timeSteps=799, but you may also use None (allowing variable amount of steps).

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(units, input_shape=(None, 2), return_sequences=True))
model.add(LSTM(2, return_sequences=True))  # it could be a Dense(2) too
....
model.fit(X, Y, ....)

Predicting:

For predicting, create a similar model, now with return_sequences=False.
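
A sketch of that prediction model, with the same layers as the training model above (units is the same placeholder as before):

newModel = Sequential()
newModel.add(LSTM(units, input_shape=(None, 2), return_sequences=True))
newModel.add(LSTM(2, return_sequences=False))   # outputs only the last step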

Copy the weights:

newModel.set_weights(model.get_weights())

You can make an input with length 800, for instance (shape: (1,800,2)) and predict just the next step:

step801 = newModel.predict(X)

If you want to predict more, we are going to use the stateful=True layers. Use the same model again, now with return_sequences=False (only in the last LSTM, the others keep True) and stateful=True (all of them). Replace input_shape with batch_input_shape=(1,None,2).
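
A sketch of that stateful prediction model, mirroring the same layers:

statefulModel = Sequential()
statefulModel.add(LSTM(units, batch_input_shape=(1, None, 2),
                       return_sequences=True, stateful=True))
statefulModel.add(LSTM(2, return_sequences=False, stateful=True))
statefulModel.set_weights(model.get_weights())   # reuse the trained weights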

# with stateful=True, your model will never think that the sequence ended
# each new batch will be seen as new steps instead of new sequences
# because of this, we need to call this when we want a sequence starting from zero:
statefulModel.reset_states()

# predicting
X = steps1to800  # input
step801 = statefulModel.predict(X).reshape(1,1,2)
step802 = statefulModel.predict(step801).reshape(1,1,2)
step803 = statefulModel.predict(step802).reshape(1,1,2)
# the reshape is because return_sequences=False eliminates the step dimension

Actually, you could do everything with a single stateful=True and return_sequences=True model, taking care of two things:

  • When training, reset_states() for every epoch. (Train with a manual loop and epochs=1)
  • When predicting from more than one step, take only the last step of the output as the desired result.
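
A rough sketch of that single-model variant (n_epochs and units are placeholders; data, X, and Y are the arrays from the training section above):

model = Sequential()
model.add(LSTM(units, batch_input_shape=(1, None, 2),
               return_sequences=True, stateful=True))
model.add(LSTM(2, return_sequences=True, stateful=True))
model.compile(loss='mse', optimizer='adam')

# manual training loop: one pass per call, states reset between epochs
for epoch in range(n_epochs):
    model.reset_states()
    model.train_on_batch(X, Y)

# predicting: feed the known 800 steps, keep only the last output step
model.reset_states()
X_full = data.reshape(1, 800, 2)                # full known sequence
step801 = model.predict(X_full)[:, -1:, :]      # prediction for step 801
step802 = model.predict(step801)[:, -1:, :]     # roll forward, state is kept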
Daniel Möller
  • Thanks! This helps a lot. (1) For Q1 and Q2, if I use a sliding window and in this case the input_shape = (2,2), does that mean I am telling the LSTM that step t is only related to the previous two steps - t-1 and t-2, which is known as the classical sliding window effect? (2) If I take your last suggestion of training with a manual loop, can I just call model.fit() repeatedly? – Yongyao Jiang Oct 26 '17 at 16:45
  • Yes... if using a sliding window with 2 steps like that, your LSTM will only be able to learn 2 steps and nothing else. --- In the last suggestion, yes, `model.fit(longSequence,longPrediction, epochs=1)`, but it would be better to use `model.train_on_batch()`, don't forget to reset the states for every loop. – Daniel Möller Oct 26 '17 at 16:59
  • I just started using LSTM. If the memory is still determined by the window size, that means I cannot have both long and short memory at the same time, but LSTM is short for long short-term memory, isn't that weird? – Yongyao Jiang Oct 26 '17 at 17:42
  • .... wait.... what?? Hahaha.... I don't like the sliding window case... I hardly ever use it. I like the approaches like Q3. They do exploit the LSTM capabilities. – Daniel Möller Oct 26 '17 at 17:46
  • Just tried what you suggested, 1) it turns out input_shape=(None,2) is not supported in Keras. Some people say variable input is only supported within TensorFlow. 2) another thing is that, if I understand correctly, stateful=True doesn't affect the prediction (each new prediction would not be seen as new steps), right? – Yongyao Jiang Oct 27 '17 at 14:34
  • Everything I wrote in the answer was tested and works. Keras does support `(None,2)` and variable lengths. What is your keras version? – Daniel Möller Oct 27 '17 at 15:15
  • In `stateful=False`, every "batch" (including calling `fit` or `predict` again) is considered to be "a whole new input sequence". On the other hand, `stateful=True` sees every "batch" as "new input steps for the batch that was input before", until you manually call `reset_states()`. – Daniel Möller Oct 27 '17 at 15:46
  • In `stateful=True`, you must pass `batch_input_shape=(1,None,2)`. While in `stateful=False`, you can pass `input_shape=(None,2)`. – Daniel Möller Oct 27 '17 at 15:47
  • Just figured it out, it was because I didn't set the dimensions of Y right, but there are still 2 problems right now: 1) it only supports batch_size = 1, which makes the training really slow; 2) the result is really bad... my code is here https://github.com/Yongyao/enso-forcasting/blob/master/LSTM_seq2seq.py#L87 can you diagnose what is going wrong? I can create a new thread if you want – Yongyao Jiang Oct 27 '17 at 20:37
  • Please see the result plot at the end of the updated question – Yongyao Jiang Oct 27 '17 at 20:41
  • By the looks of your sequence, I'd say you need way more cells and maybe layers to achieve something. I'm having trouble here in the same case as you, but simulating sine functions. I can see the model handles it, but the quality is still bad (whenever I increase my cells, the result gets better). One thing that is very important is to check whether your training data is contained between -1 and 1 (including Y). – Daniel Möller Oct 27 '17 at 20:51
  • Your graph shows data from about -2 to +2. Ideally, you should normalize your data to fit in -1 to +1 and use a `tanh` activation at the end. (I've noticed my LSTM cases only work like this). – Daniel Möller Oct 27 '17 at 20:53
  • @Daniel Möller I am trying to use your code here https://github.com/danmoller/TestRepo/blob/master/TestBookLSTM.ipynb. While training the model in the output layer you have specified model.add(LSTM(2,return_sequences=True)). In my case I have 20 variables but I want the model to predict only the first variable. How do I change the code accordingly ? – user6016731 Jun 26 '19 at 08:27

Actually you can't just feed in the raw time series data, as the network won't fit to it naturally. The current state of RNNs still requires you to input multiple 'features' (manually or automatically derived) for it to properly learn something useful.

Usually the prior steps needed are:

  1. Detrend
  2. Deseasonalize
  3. Scale (normalize)
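
A minimal sketch of those steps (the differencing choices and the scaler are illustrative, assuming series is a pandas Series of monthly raw values; they are not part of the original answer):

from sklearn.preprocessing import MinMaxScaler

# 1. detrend: first-order differencing removes a linear trend
detrended = series.diff().dropna()

# 2. deseasonalize: difference against the same month one year earlier
deseasonalized = detrended.diff(12).dropna()

# 3. scale: squash into [-1, 1], which also plays well with tanh outputs
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(deseasonalized.values.reshape(-1, 1))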

A great source of information is this post from a Microsoft researcher who won a time series forecasting competition by means of an LSTM network.

Also this post: CNTK - Time series Prediction