
I currently have an RNN model for time series prediction. It uses the three input features "value", "temperature" and "hour of the day" of the last 96 time steps to predict the next 96 time steps of the feature "value".

Here you can see a schema of it:

[figure: Current Prediction]

and here you have the current code:

#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from tensorflow import keras

# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validationData = 0.90
randomSeedNumber = 50

#Read dataset
df = pd.read_csv('C:/Users/Desktop/TestData.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])

# standardize data

data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)


scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)


# Prepare the input data for the RNN

series_reshaped_X =  np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y =  np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])


timeslot_x_train_end = int(len(series_reshaped_X) * split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X) * split_fraction_validationData)

X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards] 
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards] 
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards] 

   
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:] 
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:] 
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]                                
   
   
# Build the model and train it

np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)

model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.SimpleRNN(10, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(1))
])

model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))

#Predict the test data
Y_pred = model.predict(X_test)


# Inverse the scaling (traInv: transformation inversed)

data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed =  np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTransformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
# StandardScaler expects 2D input, so flatten the 3D sequence arrays before inverting the scaling
Y_pred_traInv = scaler_standardized_Y.inverse_transform(Y_pred.reshape(-1, 1)).reshape(Y_pred.shape)
Y_test_traInv = scaler_standardized_Y.inverse_transform(Y_test.reshape(-1, 1)).reshape(Y_test.shape)


# Calculate errors for every time slot of the multiple predictions

abs_diff = np.abs(Y_pred_traInv - Y_test_traInv)
abs_diff_perPredictedSequence = np.zeros(len(Y_test_traInv))
average_LoadValue_testData_perPredictedSequence = np.zeros(len(Y_test_traInv))
abs_diff_perPredictedTimeslot_ForEachSequence = np.zeros(len(Y_test_traInv))
absoluteError_Load_Ratio_allPredictedSequence = np.zeros(len(Y_test_traInv))
absoluteError_Load_Ratio_allPredictedTimeslots = np.zeros(len(Y_test_traInv))

mse_perPredictedSequence = np.zeros(len(Y_test_traInv))
rmse_perPredictedSequence = np.zeros(len(Y_test_traInv))

for i in range(0, len(Y_test_traInv)):
    for j in range(0, len(Y_test_traInv[0])):
        abs_diff_perPredictedSequence[i] = abs_diff_perPredictedSequence[i] + abs_diff[i][j]
    mse_perPredictedSequence[i] = mean_squared_error(Y_pred_traInv[i], Y_test_traInv[i])
    rmse_perPredictedSequence[i] = np.sqrt(mse_perPredictedSequence[i])
    abs_diff_perPredictedTimeslot_ForEachSequence[i] = abs_diff_perPredictedSequence[i] / len(Y_test_traInv[0])
    average_LoadValue_testData_perPredictedSequence[i] = np.mean(Y_test_traInv[i])
    absoluteError_Load_Ratio_allPredictedSequence[i] = abs_diff_perPredictedSequence[i] / average_LoadValue_testData_perPredictedSequence[i]
    absoluteError_Load_Ratio_allPredictedTimeslots[i] = abs_diff_perPredictedTimeslot_ForEachSequence[i] / average_LoadValue_testData_perPredictedSequence[i]

rmse_average_allPredictedSequences = np.mean(rmse_perPredictedSequence)
absoluteAverageError_Load_Ratio_allPredictedSequence = np.mean(absoluteError_Load_Ratio_allPredictedSequence)
absoluteAverageError_Load_Ratio_allPredictedTimeslots = np.mean(absoluteError_Load_Ratio_allPredictedTimeslots)
absoluteAverageError_allPredictedSequences = np.mean(abs_diff_perPredictedSequence)
absoluteAverageError_allPredictedTimeslots = np.mean(abs_diff_perPredictedTimeslot_ForEachSequence)
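For reference, the per-sequence quantities from the loop above can also be computed with vectorized NumPy; this is just an equivalent sketch operating on the same arrays:

# Vectorized equivalents of the metric loop above (same arrays, same results)
abs_diff_perPredictedSequence = abs_diff.sum(axis=(1, 2))
mse_perPredictedSequence = ((Y_pred_traInv - Y_test_traInv) ** 2).mean(axis=(1, 2))
rmse_perPredictedSequence = np.sqrt(mse_perPredictedSequence)
average_LoadValue_testData_perPredictedSequence = Y_test_traInv.mean(axis=(1, 2))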
                            

Here you have some test data: Download Test Data

So now I would like to include not only past values of the features in the prediction but also future values of the features "temperature" and "hour of the day". The future values of the feature "temperature" can, for example, be taken from an external weather forecasting service, and for the feature "hour of the day" the future values are known in advance (in the test data I have included a "forecast" of the temperature that is not a real forecast; I just randomly changed the values).

For several applications and datasets, I would assume that this could improve the forecast.

In a schema it would look like this: [figure: Desired Prediction]

Can anyone tell me how I can do that in Keras with an RNN (or LSTM)? One way could be to include the future values as independent input features, as sketched below. But I would like the model to know that the future values of a feature are connected to its past values.
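For illustration, the naive variant would look roughly like this with the variables from the code above (just a sketch; the model input would then have 5 features per time step, i.e. input_shape=[None, 5]):

# Naive variant (sketch): append the known future values of "temperature" and
# "hour of the day" (columns 1 and 2 of data_X) as extra input features, so each
# past time step also carries the exogenous values steps_forward steps ahead
future_exog = data_X.iloc[:, 1:3].shift(-steps_forward)
data_X_augmented = pd.concat([data_X, future_exog], axis=1).dropna()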

Reminder: Does anybody have an idea how to do this? I'd highly appreciate every comment.

  • Are you predicting all 96 future time steps in one forward pass? – igodfried Dec 16 '21 at 07:58
  • @igodfried: What do you mean by "predicting all 96 future time steps in one forward pass"? I am using the last 96 time steps to predict the future 96 steps. As I use `return_sequences=True`, there is an output of 96 predictions for every time step in the time sequence, so `Y_pred` is a 3-dimensional array. You can test the code. It is a minimal reproducible example and I provided test data for it. – PeterBe Dec 16 '21 at 10:51
  • Well you could train your model to just forecast one time step at a time then concat the outputted time step and feed it back in 96 times with the real outputs of the other values that would solve your problem of wanting to use the real outputs. You don't need to forecast your entire desired forecast length at once. – igodfried Dec 16 '21 at 19:04
  • @igodfried: Thanks for your comment. As said before, I am forecasting 96 steps at once and not just single values. As far as I can see, my question is independent of that. I would like to know how to include future values of the used past features in the prediction (whether it is just 1 time step at a time as you described, or more, as I prefer). – PeterBe Dec 17 '21 at 08:13

2 Answers


The standard approach is to use an encoder-decoder architecture (see 1 and 2 for instance):

  • The encoder takes as input the past values of the features and of the target and returns an output representation.
  • The decoder takes as input the encoder output and the future values of the features and returns the predicted values of the target.

[figure: encoder-decoder schema]

You can use any architecture for the encoder and the decoder, and you can also consider different approaches for passing the encoder output to the decoder (e.g. adding or concatenating it to the decoder input features, adding or concatenating it to the output of some intermediate decoder layer, or adding it to the final decoder output). The code below is just an example.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Input, Dense, LSTM, TimeDistributed, Concatenate, Add
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# define the inputs
target = ['value']
features = ['temperatures', 'hour of the day']
sequence_length = 96

# import the data
df = pd.read_csv('TestData.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime': [0]}, index_col=['datetime'])

# scale the data
target_scaler = StandardScaler().fit(df[target])
features_scaler = StandardScaler().fit(df[features])

df[target] = target_scaler.transform(df[target])
df[features] = features_scaler.transform(df[features])

# extract the input and output sequences
X_encoder = []  # past features and target values
X_decoder = []  # future features values
y = []          # future target values

for i in range(sequence_length, df.shape[0] - sequence_length):
    X_encoder.append(df[features + target].iloc[i - sequence_length: i])
    X_decoder.append(df[features].iloc[i: i + sequence_length])
    y.append(df[target].iloc[i: i + sequence_length])

X_encoder = np.array(X_encoder)
X_decoder = np.array(X_decoder)
y = np.array(y)

# define the encoder and decoder
def encoder(encoder_features):
    y = LSTM(units=100, return_sequences=True)(encoder_features)
    y = TimeDistributed(Dense(units=1))(y)
    return y

def decoder(decoder_features, encoder_outputs):
    x = Concatenate(axis=-1)([decoder_features, encoder_outputs])
    # x = Add()([decoder_features, encoder_outputs]) 
    y = TimeDistributed(Dense(units=100, activation='relu'))(x)
    y = TimeDistributed(Dense(units=1))(y)
    return y

# build the model
encoder_features = Input(shape=X_encoder.shape[1:])
decoder_features = Input(shape=X_decoder.shape[1:])
encoder_outputs = encoder(encoder_features)
decoder_outputs = decoder(decoder_features, encoder_outputs)
model = Model([encoder_features, decoder_features], decoder_outputs)

# train the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
model.fit([X_encoder, X_decoder], y, epochs=100, batch_size=128)

# extract the last predicted sequence
y_true = target_scaler.inverse_transform(y[-1, :])
y_pred = target_scaler.inverse_transform(model.predict([X_encoder, X_decoder])[-1, :])

# plot the last predicted sequence
plt.plot(y_true.flatten(), label='actual')
plt.plot(y_pred.flatten(), label='predicted')
plt.legend()
plt.show()
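As one concrete instance of the alternatives mentioned above, the decoder could also run a recurrent layer over the concatenated inputs; this is just a sketch with illustrative layer sizes, not the only valid choice:

def decoder_lstm(decoder_features, encoder_outputs):
    # alternative decoder (sketch): LSTM over the concatenated decoder inputs
    x = Concatenate(axis=-1)([decoder_features, encoder_outputs])
    y = LSTM(units=100, return_sequences=True)(x)
    y = TimeDistributed(Dense(units=1))(y)
    return y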

In the example above the model takes two inputs, X_encoder and X_decoder, so in your case, when generating the forecasts, you can use the past observed temperatures in X_encoder and the future temperature forecasts in X_decoder.
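For instance, a single out-of-sample forecast could be generated like this (a sketch; `temperature_forecast_df` is a hypothetical DataFrame holding the next sequence_length rows of unscaled future feature values, e.g. from a weather forecasting service):

# Sketch: forecast the next 96 steps from the end of the observed data
x_enc = df[features + target].iloc[-sequence_length:].values[np.newaxis, ...]  # already scaled above
x_dec = features_scaler.transform(temperature_forecast_df[features])[np.newaxis, ...]  # hypothetical forecast input
y_new = target_scaler.inverse_transform(model.predict([x_enc, x_dec])[0])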

  • Thanks for your answer Flavia. I have a couple of questions: 1) I want to use an LSTM for forecasting, so in the decoder there should be an LSTM (instead of the Concatenate layer that you use), right? 2) I don't understand how you build the model with the line `model = Model([encoder_features, decoder_features], decoder_outputs)`. Could you elaborate a little more on that? 3) You wrote "In the example above the model takes two inputs, X_encoder and X_decoder" --> when I want to have more input features, do I have to use more X_encoders (e.g. `X_encoder_1`, `X_encoder_2`) or X_decoders? – PeterBe Jan 12 '22 at 10:47
  • 4) Why are you using only `y` in the encoder and not `x` and `y` as illustrated in your figure? And in the decoder you use `x` and `y`. 5) Why do you need the InputLayer in the features `Input(shape=X_encoder.shape[1:])`? What is it actually doing? – PeterBe Jan 12 '22 at 10:53
  • As I already mentioned in my answer, you can design the encoder and decoder architecture in whatever way you think is appropriate, so you can add an LSTM layer to the decoder if you want. However, recurrent layers are not the only option for time series forecasting, see [this paper](https://arxiv.org/pdf/1803.01271.pdf) for a discussion. – Flavia Giammarino Jan 12 '22 at 12:12
  • I think that explaining the details of Keras functional API and of multiple input models is beyond the scope of this question. I would recommend that you take a look at the existing answers, such as [this one](https://stackoverflow.com/a/55233566/11989081), or ask a new question. – Flavia Giammarino Jan 12 '22 at 12:13
  • Thanks Flavia for your comment. But why are you using only `y` in the encoder and not `x` and `y` as illustrated in your figure? And in the decoder you use `x` and `y`? And strangely, in your prediction you use `model.predict([X_encoder, X_decoder])`, so you use the same data for predictions as for training. Why are you doing that? Normally you subdivide your data into training, validation and test sets. – PeterBe Jan 12 '22 at 17:01
  • As explained in my answer the encoder takes as input the past values of the features and of the target. As you can see from my code, `X_encoder` is constructed using `df[features + target].iloc[i - sequence_length: i]`, i.e. both `x` (the `features`) and `y` (the `target`). – Flavia Giammarino Jan 12 '22 at 18:14
  • `X_decoder` is instead constructed using only the future values of `x` (the `features`) using `df[features].iloc[i: i + sequence_length]` as we don't know the future values of `y` (the `target`), which is indeed what we want to predict. – Flavia Giammarino Jan 12 '22 at 18:18
  • I included the prediction code `model.predict([X_encoder, X_decoder])` simply to show how it works with the two inputs. In any case I would recommend that you use [time series cross validation](https://otexts.com/fpp3/tscv.html) (or backtesting) for evaluating time series forecasting models rather than a fixed training / validation / test split. – Flavia Giammarino Jan 12 '22 at 18:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/241028/discussion-between-peterbe-and-flavia-giammarino). – PeterBe Jan 13 '22 at 11:10
  • Thanks Flavia for your answers and effort. I really appreciate it. I have a follow up question to your suggested code in the chat. Would you mind having a look at it? – PeterBe Jan 27 '22 at 09:43
  • @FlaviaGiammarino I have gone through this code a couple of times to understand it clearly. I tried to understand it by running it with my own data. At the end you have shown one last predicted sequence of the trained model. My doubt is whether new future input data should be prepared and fed (through encoders, decoders, Y) while building the model itself, and whether the lengths of the input features and the future input data won't conflict. – Poka Jul 04 '22 at 21:14
  • @FlaviaGiammarino For example, out of 1000 (X, Y) pairs, I want to train the model with all the past pairs, and I only have another 100 future feature values (X). So my confusion is how to prepare my input data set and how to pass it to the trained model for prediction. Will there be no data length mismatch between encoder and decoder? I hope my question is understandable. – Poka Jul 04 '22 at 21:24
  • Hello @Flavia. Could you please reply to this other example? https://stackoverflow.com/questions/75203081/how-to-add-in-a-keras-time-series-prediction-model-the-current-value-of-a-variab – skan Jan 22 '23 at 19:15
  • What is the purpose of using TimeDistributed here? – skan Mar 18 '23 at 19:53

This is PyTorch code for time series prediction with a known external/exogenous regressor over the forecasted period. Hope it helps! Have a marvellous day!

The input format is a 3D tensor and the output a 1D array (MISO: Multiple Inputs, Single Output).

import torch
import numpy as np

# Iteratively forecast one step at a time: each predicted step is concatenated with
# the known future regressor value and fed back into the model as the next input
# (assumes a global `device` is defined)
def CNN_Attention_Bidirectional_LSTM_Encoder_Decoder_predictions(model, data, regressors, extrapolations_length):
    n_input = extrapolations_length
    pred_list = []
    batch = data[-n_input:]
    model = model.train()
    pred_list.append(torch.cat((model(batch)[-1], torch.FloatTensor(regressors.iloc[1, [1]]).to(device).unsqueeze(0)), 1))
    batch = torch.cat((batch[n_input - 1].unsqueeze(0), pred_list[-1].unsqueeze(0)), 1)
    batch = batch[:, 1:, :]
    for i in range(n_input - 1):
        model = model.eval()
        pred_list.append(torch.cat((model(batch).squeeze(0), torch.FloatTensor(regressors.iloc[i + 1, [1]]).to(device).unsqueeze(0)), 1))
        batch = torch.cat((batch, pred_list[-1].unsqueeze(0)), 1)
        batch = batch[:, 1:, :]
        model = model.train()
    return np.array([pred_list[j].cpu().detach().numpy() for j in range(n_input)])[:, :, 0]
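A hypothetical usage sketch (`trained_model`, `history` and `future_regressors` are placeholder names; `history` is the scaled 3D input tensor and the second column of `future_regressors` holds the known future regressor values):

# Sketch: iteratively forecast 96 steps with the known future regressor
preds = CNN_Attention_Bidirectional_LSTM_Encoder_Decoder_predictions(
    trained_model, history, future_regressors, extrapolations_length=96)
# preds holds the 96 iteratively forecasted target values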