python forecasting building LSTM

Question

I came across these two pages, page 1 and page 2, which use LSTM for forecasting. What confused me is how, or whether, they use past values of the Y variable to predict future values of Y, for example Y at times 1, 2 and 3 to predict Y at times 4, 5 and 6.

Currently these models seem to use consecutive data points of the x variables to predict the Y variable in the future, for example the x variables at times 1, 2 and 3 to predict y at times 4, 5 and 6. Would it be okay to use the Y variable along with the x variables? For example, Y at times 1, 2 and 3, together with the x variables from the same period, to predict y at times 4, 5 and 6. I could do this just by adding Y as a new x variable in the data. The rest of the process (the function custom_ts_multi_data_prep) that prepares the data for modelling would remain exactly the same.

Please suggest a better link, if there is one, that uses similar LSTMs, and clarify the questions from paragraphs 1 and 2.

Hi, I'm not sure I understood the question. Check out [this tutorial](https://github.com/scarrazza/DL2022/blob/main/Lecture_6/solutions/exercise2.py) and tell me if it answers your question in any way. - Sala, Jul 16 '22 at 17:32

score 1 · Accepted Answer · answered Jul 19 '22 at 12:52

It is completely sensible to use y[t-1], or y[t-n] for some n > 0, to predict y[t]. You shouldn't, however, use y[t] to predict y[t], since you probably don't know ahead of time the value you are trying to predict.
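To make the "y[t-n] but never y[t]" rule concrete, here is a minimal sketch in plain NumPy. The helper name `make_lag_features` is hypothetical (it is not from page 1 or page 2); it builds one row of lagged values per time step and pairs it with the current value as the label.

```python
import numpy as np

def make_lag_features(y, n_lags):
    """Hypothetical helper: row t holds y[t-1], ..., y[t-n_lags]; label is y[t].

    The current value y[t] is never part of its own input row, so the model
    only sees values that would already be known before y[t] is observed.
    """
    y = np.asarray(y, dtype=float)
    rows, targets = [], []
    for t in range(n_lags, len(y)):
        # Most recent lag first: y[t-1], y[t-2], ..., y[t-n_lags]
        rows.append(y[t - n_lags:t][::-1])
        targets.append(y[t])
    return np.array(rows), np.array(targets)

X, target = make_lag_features([10, 11, 12, 13, 14], n_lags=2)
print(X.tolist())       # [[11.0, 10.0], [12.0, 11.0], [13.0, 12.0]]
print(target.tolist())  # [12.0, 13.0, 14.0]
```

The first `n_lags` values have no complete history, so they only ever appear as inputs, never as labels.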

In fact, in the example you gave (page 2), the variable traffic_volume, which is the one being predicted, also appears in the input sequence, so if I understand you correctly, that example is exactly what you are looking for. The function custom_ts_multi_data_prep() puts, for each time step, the data from previous time steps into X and the data from the following time steps into y.(*) That information is also implicitly encoded in the activations of the LSTM itself: an LSTM is a recurrent network that carries an encoding of the data it has seen so far into the next step of the prediction process. However, it can still be very sensible to feed true data from previous time steps into the prediction process, for a few reasons:
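A minimal sketch of that kind of windowing, with past y values added as an extra input column. The function `make_windows` and the toy data are hypothetical stand-ins, not the page 2 code; here the targets start at step i, directly after the input window ends at i-1.

```python
import numpy as np

def make_windows(features, target, window, horizon):
    """Hypothetical stand-in for a function like custom_ts_multi_data_prep.

    X[k] holds rows i-window .. i-1 of `features`;
    Y[k] holds target[i] .. target[i+horizon-1], so no step is skipped.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, Y = [], []
    for i in range(window, len(features) - horizon + 1):
        X.append(features[i - window:i])
        Y.append(target[i:i + horizon])
    return np.array(X), np.array(Y)

# Assumed toy data: one exogenous feature x and the target series itself.
x = np.arange(8) * 10.0           # 0, 10, ..., 70
y_series = np.arange(8) + 100.0   # 100, 101, ..., 107
# Past values of y are fed in as an extra input column next to x.
features = np.column_stack([x, y_series])

X, Y = make_windows(features, y_series, window=3, horizon=2)
print(X.shape, Y.shape)  # (4, 3, 2) (4, 2)
print(Y[0].tolist())     # [103.0, 104.0]
```

The resulting X has the (samples, timesteps, features) layout that a Keras LSTM layer expects, and each window's y column only contains values from before the first target step.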

The model's state that is passed on to the next prediction step is only a partial view of the true state, and knowing the actual progression in the "real world" may be critical for predicting the next step.
Similar to the rationale behind residual skip connections in CNNs, adding the "raw" value of the previous time step may help the model by letting it focus only on the residual problem (how to get from y[t-1] to y[t] using x[t], or x[t-1] depending on your specific problem) rather than making the jump from x[t] to y[t] with no true data from previous time steps.
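One simple way to act on the residual idea at the data level, rather than inside the network, is to train on the change y[t] - y[t-1] and add it back onto the last true value. This is a hedged sketch with hypothetical helper names; a network-level variant would instead add y[t-1] to the model's output.

```python
import numpy as np

def to_residual_targets(y):
    """Hypothetical helper: pair each previous value y[t-1] with delta[t] = y[t] - y[t-1]."""
    y = np.asarray(y, dtype=float)
    previous = y[:-1]   # y[t-1]
    delta = np.diff(y)  # y[t] - y[t-1], the "residual" the model would learn
    return previous, delta

def from_residual(previous, predicted_delta):
    """Forecast = last known true value + predicted change."""
    return previous + predicted_delta

y = np.array([5.0, 7.0, 6.0, 9.0])
prev, delta = to_residual_targets(y)
print(delta.tolist())                       # [2.0, -1.0, 3.0]
print(from_residual(prev, delta).tolist())  # [7.0, 6.0, 9.0]
```

With perfect delta predictions the original series is recovered exactly, which is why the model only has to learn the step from y[t-1] to y[t].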

Having said that, adding almost any feature from the system will likely make your model "better" on the training data and also more prone to overfitting, so choose which features to add wisely.
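A hedged sketch of one way to keep an eye on overfitting when y is added as an input: split the data chronologically (no shuffling) and compare the model's validation error with the naive persistence forecast y[t] = y[t-1]. The data and helper names are assumptions for illustration; a model that sees past y values but cannot beat this baseline on validation data is probably not learning much beyond copying the last value.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so validation data lies strictly after training data."""
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

def persistence_mae(y_prev, y_true):
    """Mean absolute error of the naive forecast y[t] = y[t-1]."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_prev))))

# Assumed toy data: column 0 of X holds y[t-1], the label is y[t].
series = np.array([1.0, 2.0, 4.0, 4.0, 5.0, 7.0, 8.0, 8.0, 9.0, 11.0, 12.0])
X = series[:-1].reshape(-1, 1)
y = series[1:]

X_train, X_val, y_train, y_val = chronological_split(X, y)
baseline = persistence_mae(X_val[:, 0], y_val)
print(len(X_train), len(X_val), baseline)  # 8 2 1.5
```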

(*) A small remark: in this specific example they leave a gap of one time step that appears in neither X nor y, and I am not sure whether it is intentional or a mistake. X contains data from i-window to i-1, while y contains data from i+1 to i+horizon, so step i is included in neither. This might be a misunderstanding by the author of how range() works, or I might be missing something.
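The gap follows directly from range(start, stop) excluding stop. The snippet below reproduces the index ranges as described in the remark (not the page 2 code verbatim) for one sample i, next to a gap-free alternative.

```python
# Index ranges as described in the remark, for a single sample i.
window, horizon, i = 3, 2, 5

x_idx = list(range(i - window, i))               # inputs: i-window .. i-1
y_idx_gap = list(range(i + 1, i + horizon + 1))  # targets starting at i+1
y_idx_no_gap = list(range(i, i + horizon))       # targets starting at i

print(x_idx)         # [2, 3, 4]
print(y_idx_gap)     # [6, 7]  (index 5 is in neither list)
print(y_idx_no_gap)  # [5, 6]  (continues directly after the inputs)
```

If the gap is intentional, it amounts to forecasting with a one-step lead time, i.e. the model never predicts the step immediately after its input window.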

score 0 · Answer 2 · answered Jul 17 '22 at 13:18

Correct me if I am wrong, but if you are asking to include Y_data in X_data and then train the model on X_data (inclusive of Y_data) alongside Y_data, completing the process as described in page 2, then the model will be biased, because the predicted column is already available in the training data.
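This kind of bias is usually called target leakage, and whether it occurs depends on whether an input window contains any of the time steps it is asked to predict. A minimal check, assuming hypothetical index ranges like those discussed in this thread:

```python
def windows_overlap(input_indices, target_indices):
    """Return True if any time step is used both as an input and as a target."""
    return bool(set(input_indices) & set(target_indices))

window, horizon, n_steps = 3, 3, 10
for i in range(window, n_steps - horizon + 1):
    inputs = range(i - window, i)    # past steps, may include past y values
    targets = range(i, i + horizon)  # future steps to be predicted
    assert not windows_overlap(inputs, targets)

# A window that also uses the first target step as input would leak the answer:
print(windows_overlap(range(2, 6), range(5, 8)))  # True
```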

score 0 · Answer 3 · answered Jul 18 '22 at 05:32

Actually, they are using past Y values to predict future Y values. In page 2, the Y variable in the example is "traffic_volume", and it is included in X_data. I also found a more detailed example, which I hope can help you. If you have any questions, feel free to comment.
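A small pandas sketch of what "traffic_volume is included in X_data" means. Only the column name traffic_volume comes from page 2; the numbers and the "temp" column are made up for illustration.

```python
import pandas as pd

# Assumed toy frame; "temp" is a hypothetical exogenous feature.
df = pd.DataFrame({
    "temp": [20.0, 21.0, 19.5, 22.0],
    "traffic_volume": [100, 120, 130, 110],
})

target_col = "traffic_volume"
X_data = df[["temp", target_col]].to_numpy()  # target kept as an input feature
Y_data = df[target_col].to_numpy()             # same column used as the label

print(X_data.shape, Y_data.shape)  # (4, 2) (4,)
```

Both arrays contain the same column; it is the windowing step that decides which rows of it count as past inputs and which as future labels.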

Albin Sidås · Answer 4 · 2022-07-19T08:39:01.820

I built an LSTM model that used previous Y values as inputs for predicting future Y values.

What you should consider is whether you want to use the label values for the training set, or a split model, in the sense that predicted Y values are fed as input together with the other X features. The difference is that you can obtain a self-adjusting model if you train the dual model and backpropagate through both models. Feeding predicted Y values into the second model gives you a model that learns to adjust over time, rather than being corrected with the true labels in the Y column. This is a more advanced implementation than what you described. Somewhat related: How to include future values in a time series prediction of a RNN in Keras
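The contrast between training on true labels and feeding predictions back in can be sketched without any deep-learning library. Below, a one-coefficient linear model y[t] ≈ a * y[t-1] stands in for the LSTM (an assumption for illustration; this is not the dual-model backpropagation setup described above). It is fitted on true previous values, then used recursively, so at inference time each prediction becomes the next input.

```python
import numpy as np

def fit_ar1(y):
    """Least-squares fit of y[t] ~ a * y[t-1] using the TRUE previous values as inputs."""
    y = np.asarray(y, dtype=float)
    prev, curr = y[:-1], y[1:]
    return float(np.dot(prev, curr) / np.dot(prev, prev))

def recursive_forecast(a, last_value, steps):
    """Multi-step forecast where each PREDICTED value is fed back as the next input."""
    preds = []
    current = last_value
    for _ in range(steps):
        current = a * current
        preds.append(current)
    return preds

history = [1.0, 2.0, 4.0, 8.0, 16.0]
a = fit_ar1(history)
print(a)                                      # 2.0
print(recursive_forecast(a, history[-1], 3))  # [32.0, 64.0, 128.0]
```

On real data the fit is never exact, so errors in early predictions propagate into later ones; that mismatch between training (true inputs) and inference (predicted inputs) is what training with predicted Y inputs tries to address.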

As you mention, you can simply add the labels as another column in the time series passed to the custom_ts_multi_data_prep function.

An interesting next step would be to use an explainable-AI framework such as SHAP to examine which inputs have the most impact on the model's predictions. With SHAP you could see, for example, whether the label has by far the largest impact, in which case you may want to look at other options.
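The SHAP API is not shown here. As a simpler, swapped-in technique, permutation importance answers a similar question: shuffle one input column and measure how much the error grows. The sketch below uses synthetic data and a linear least-squares model standing in for the trained LSTM; all names and data are assumptions for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in mean squared error when each column of X is shuffled.
    A larger increase means the model relies more on that input."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    scores = []
    for col in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the column's link to the target but keeps its distribution.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - base_mse)
        scores.append(float(np.mean(increases)))
    return scores

# Assumed synthetic data: column 0 plays the role of a lagged label y[t-1],
# column 1 a weakly related exogenous feature.
rng = np.random.default_rng(1)
n = 200
y_lag = rng.normal(size=n)
x_other = rng.normal(size=n)
target = 3.0 * y_lag + 0.1 * x_other + 0.01 * rng.normal(size=n)
X = np.column_stack([y_lag, x_other])

# A linear least-squares model stands in for the trained LSTM.
coef, *_ = np.linalg.lstsq(X, target, rcond=None)

def predict(features):
    return features @ coef

scores = permutation_importance(predict, X, target)
print(scores)  # column 0 (the lagged label) should dominate
```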

