
My data frame is on an hourly basis (the dates are the index of my df), and I want to predict y.

> df.head()

          Date           y             
    2019-10-03 00:00:00 343   
    2019-10-03 01:00:00 101  
    2019-10-03 02:00:00 70  
    2019-10-03 03:00:00 67  
    2019-10-03 04:00:00 122  

I will now import the libraries and train the model:

  import numpy as np
  from keras.models import Sequential
  from keras.layers import Dense
  from keras.layers import LSTM
  from sklearn.preprocessing import MinMaxScaler
  min_max_scaler = MinMaxScaler()
  prediction_hours = 24
  df_train= df[:len(df)-prediction_hours]
  df_test= df[len(df)-prediction_hours:]
  print(df_train.head())
  print('/////////////////////////////////////////')
  print(df_test.head())
  training_set = df_train.values
  training_set = min_max_scaler.fit_transform(training_set)

  x_train = training_set[0:len(training_set)-1]
  y_train = training_set[1:len(training_set)]
  x_train = np.reshape(x_train, (len(x_train), 1, 1))
  num_units = 2
  activation_function = 'sigmoid'
  optimizer = 'adam'
  loss_function = 'mean_squared_error'
  batch_size = 10
  num_epochs = 100
  regressor = Sequential()
  regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1)))
  regressor.add(Dense(units = 1))
  regressor.compile(optimizer = optimizer, loss = loss_function)
  regressor.fit(x_train, y_train, batch_size = batch_size, epochs = num_epochs)

And after training, I can actually use it on my test data:

 test_set = df_test.values
 inputs = np.reshape(test_set, (len(test_set), 1))
 inputs = min_max_scaler.transform(inputs)
 inputs = np.reshape(inputs, (len(inputs), 1, 1))
 predicted_y = regressor.predict(inputs)
 predicted_y = min_max_scaler.inverse_transform(predicted_y)

This is the prediction I got:

[plot of the 24-hour forecast vs. the actual values]

The forecast is actually pretty good: is it too good to be true? Am I doing anything wrong? I followed a GitHub implementation step by step.

I want to add some exogenous variables, namely v1, v2, v3. If my dataset now looks like this with new variables,

df.head()

          Date           y   v1   v2   v3          
    2019-10-03 00:00:00 343  4     6    10  
    2019-10-03 01:00:00 101  3     2    24
    2019-10-03 02:00:00 70   0     0    50  
    2019-10-03 03:00:00 67   0     4    54
    2019-10-03 04:00:00 122  3     3    23

How can I include these variables v1,v2 and v3 in my LSTM model? The implementation of the multivariate LSTM is very confusing to me.

Edit to address Yoan's suggestion:

For a dataframe with the date as index and with the columns y, v1, v2 and v3, I've done the following as suggested:

  import numpy as np
  from keras.models import Sequential
  from keras.layers import Dense
  from keras.layers import LSTM
  from sklearn.preprocessing import MinMaxScaler
  min_max_scaler = MinMaxScaler()
  prediction_hours = 24
  df_train= df[:len(df)-prediction_hours]
  df_test= df[len(df)-prediction_hours:]
  print(df_train.head())
  print('/////////////////////////////////////////')
  print (df_test.head())
  training_set = df_train.values
  training_set = min_max_scaler.fit_transform(training_set)

  x_train = np.reshape(x_train, (len(x_train), 1, 4))
  y_train = training_set[0:len(training_set), 1]  # I've tried both 0:len(...) and 1:len(...)
  
  num_units = 2
  activation_function = 'sigmoid'
  optimizer = 'adam'
  loss_function = 'mean_squared_error'
  batch_size = 10
  num_epochs = 100
  regressor = Sequential()
  regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1, 4)))
  regressor.add(Dense(units = 1))
  regressor.compile(optimizer = optimizer, loss = loss_function)
  regressor.fit(x_train, y_train, batch_size = batch_size, epochs = num_epochs)

But I get the following error:

 only integer scalar arrays can be converted to a scalar index
Numbermind
  • I wouldn't know where to start considering that your model builds on a lot of misunderstanding on how to prepare data for RNN. The main point of using LSTM is to learn from **sequences** of data. So you need to build these sequences somehow given your input data. My advice would be to read through https://www.tensorflow.org/tutorials/structured_data/time_series#recurrent_neural_network or checking LSTM-related notebooks on Kaggle.com – Sura-da Dec 19 '20 at 13:29
  • To give a hint at your question specifically, there should be no difference whether you feed a single or multiple variable as input since you can use your whole DataFrame just as you did in your code – Sura-da Dec 19 '20 at 13:31
  • Your output is not too good to be true: it shows error in all points _except_ two. It seems you have very little test data (just 24 samples), and you're using an LSTM with 2 units on a sequence of just one point, which is not really a sequence (and is a waste, since you're not using the LSTM memory and forget gates at all) – Iñigo González Dec 21 '20 at 11:47
  • is it normal that you reshape `x_train` twice ? It might be why you get an extra dimension – Yoan B. M.Sc Dec 22 '20 at 15:04
  • @YoanB.M.Sc I only reshaped it once. I was wrong writing here. – Numbermind Dec 22 '20 at 15:11
  • I've tested the code and `x_train` does return the appropriate shape. `(batch, seq, features)`. in your reshape i guess `x_train` is `trainning set` ? – Yoan B. M.Sc Dec 22 '20 at 15:17
  • There are multiple ways you can do this. General practice is to add these auxiliary features either to the embeddings or to the output of the LSTMs (while setting it to return the sequence of hidden states). I have written a detailed answer with code examples and architecture decisions that can be taken while working with such data. – Akshay Sehgal Dec 22 '20 at 15:45
  • @AkshaySehgal, why are you using embedding layer ? This make sense on text processing, but on the OP with regular time series it's not "general practice" to add embedding layer. Good post though, very detail. – Yoan B. M.Sc Dec 22 '20 at 15:58
  • You use the embedding layer for label encoded sequences with a definite vocabulary, which is the case when working with sequential categorical or text type data, as I mentioned in the post (check scenario 1). Of course, for continuous features you would not need that at all. And thanks, glad you liked it :) – Akshay Sehgal Dec 22 '20 at 16:23

2 Answers


Combining auxiliary features with sequences

There are multiple ways of handling auxiliary features with LSTMs and all of these are inspired by what your data contains and how you want to model these features. I'll discuss 4 different scenarios and strategies for your reference below with some dummy code.

  1. Scenario 1: If you have simple continuous features, simply pass them into an LSTM!
  2. Scenario 2: If you have multiple label encoded sequences, embed and encode them separately in LSTMs, then concatenate them for your downstream predictions
  3. If you have a label encoded sequence and some auxiliary features, you can -
    • Scenario 3: Append these after embedding them and then pass them into the LSTMs
    • Scenario 4: Append them to the output of the LSTM and choose to pass them to another set of LSTMs

Scenario 1:

Let's say you have 4 sequential features and all of those are continuous (not label encoded as in text or categorical). In this case, LSTMs are well equipped to handle these features directly. An LSTM layer expects a shape of (batch, sequence, features) and therefore such a scenario fits nicely without any modifications.

Features --> LSTM --> Process --> Predict

Code

import numpy as np
from tensorflow.keras import layers, Model, utils

#Four continuous features
X = np.random.random((100,10,4))
Y = np.random.random((100,))

###Define model###
inp = layers.Input((10,4))

#LSTMs
x = layers.LSTM(8, return_sequences=True)(inp)
x = layers.LSTM(8)(x)
out = layers.Dense(1)(x)

model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)

[Scenario 1 model diagram from plot_model]
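The snippets in this answer only define the architectures. As a quick usage sketch (the optimizer, loss, batch size and epoch count below are illustrative choices of mine, not part of the original answer), the Scenario 1 model could be trained on the dummy data like this:

#Hypothetical training call for the Scenario 1 model defined above;
#optimizer, loss, batch size and epochs are illustrative choices only
model.compile(optimizer='adam', loss='mse')
model.fit(X, Y, batch_size=16, epochs=5, validation_split=0.1)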

Scenario 2:

Next, let's assume another simple case. You have 2 label-encoded sequences (say text). As one would think, all you would want to do is separately create sequential features by building LSTMs for each of them, and then concatenate them at the end before your downstream prediction task.

Sequence --> Embed --> LSTM -->|
                               * --> Append --> Process --> Predict
Sequence --> Embed --> LSTM -->|

Code

import numpy as np
from tensorflow.keras import layers, Model, utils

#Two sequential, label encoded features
X = np.random.random((100,10,2))
Y = np.random.random((100,))

###Define model###
inp = layers.Input((10,2))
feature1 = layers.Lambda(lambda x: x[...,0])(inp)
feature2 = layers.Lambda(lambda x: x[...,1])(inp)

#Append embeddings features
x1 = layers.Embedding(1000, 5)(feature1)
x2 = layers.Embedding(1200, 7)(feature2)

#LSTMs
x1 = layers.LSTM(8, return_sequences=True)(x1)
x1 = layers.LSTM(8)(x1)

x2 = layers.LSTM(8, return_sequences=True)(x2)
x2 = layers.LSTM(8)(x2)

#Combine LSTM final states
x = layers.concatenate([x1,x2])
out = layers.Dense(1)(x)

model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)

[Scenario 2 model diagram from plot_model]

Scenario 3:

Next scenario: let's assume you are working with one feature which is a label encoded sequence (say text). Before you pass this feature to the LSTMs, you will have to encode it into an n-dimensional vector using an embedding layer. This will result in a (batch, sequence, embedding_dim) shaped input for the LSTMs, which is no problem at all. Let's say, however, you also have 3 auxiliary features which are continuous (and properly normalized). One simple thing you could do is append these to the output of the Embedding layer to get a (batch, sequence, embedding_dims + auxiliary) input, which the LSTM can handle as well!

Sequence --> Embed ----->|
                         *--> Append --> LSTM -> Process --> Predict
Auxiliary --> Process -->|

Code

import numpy as np
from tensorflow.keras import layers, Model, utils

#One sequential, label encoded feature & 3 auxiliary features for each timestep
X = np.random.random((100,10,4))
Y = np.random.random((100,))

###Define model###
inp = layers.Input((10,4))
feature1 = layers.Lambda(lambda x: x[...,0])(inp)
feature2 = layers.Lambda(lambda x: x[...,1:4])(inp)

#Append embeddings features
x = layers.Embedding(1000, 5)(feature1)
x = layers.concatenate([x, feature2])

#LSTMs
x = layers.LSTM(8, return_sequences=True)(x)
x = layers.LSTM(8)(x)
out = layers.Dense(1)(x)

model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)

[Scenario 3 model diagram from plot_model]

In the above example, after the label encoded input is embedded into the 5-dimensional vector, the 3 auxiliary inputs are appended and then the (10,8) dimensional sequence is passed to the LSTMs for doing their magic.

Scenario 4:

Let's say you have the same scenario as above, but you want the sequential feature to be a richer representation before you append the auxiliary inputs. Here you could simply pass the sequential feature to an LSTM, append the auxiliary inputs to the OUTPUT of that LSTM, and then decide to pass the result into another LSTM if needed. This requires return_sequences=True on the first LSTM so that you get a sequence of the same length, which can be appended to the auxiliary features for that set of time steps.

Sequence --> Embed --> LSTM(seq) -->|
                                    *--> Append --> Process --> Predict
Auxiliary --> Process ------------->|

Code

import numpy as np
from tensorflow.keras import layers, Model, utils

#One sequential, label encoded feature and 3 auxiliary continuous features
X = np.random.random((100,10,4))
Y = np.random.random((100,))

###Define model###
inp = layers.Input((10,4))
feature1 = layers.Lambda(lambda x: x[...,0])(inp)
feature2 = layers.Lambda(lambda x: x[...,1:4])(inp)
#feature2 = layers.Reshape((-1,1))(feature2)

#Append embeddings features
x = layers.Embedding(1000, 5)(feature1)

#LSTMs
x = layers.LSTM(8, return_sequences=True)(x)
x = layers.concatenate([x, feature2])
x = layers.LSTM(8)(x)

#Combine LSTM final states
out = layers.Dense(1)(x)

model = Model(inp, out)
utils.plot_model(model, show_layer_names=False, show_shapes=True)

[Scenario 4 model diagram from plot_model]

There are also architectures that append a single auxiliary feature to the output of an LSTM, encode the result in another LSTM, then append the next feature, and so on, instead of adding all of them together. That is a design choice and will have to be tested against your specific data. A rough sketch of this pattern follows.
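As an illustration only (the layer sizes, loop structure and dummy data below are my own assumptions, mirroring the scenarios above, and are not taken from any specific reference architecture):

import numpy as np
from tensorflow.keras import layers, Model

#Same dummy setup as above: 1 label encoded sequence + 3 auxiliary features
X = np.random.random((100,10,4))
Y = np.random.random((100,))

inp = layers.Input((10,4))
seq = layers.Lambda(lambda t: t[...,0])(inp)
aux = [layers.Lambda(lambda t, i=i: t[...,i:i+1])(inp) for i in range(1,4)]

x = layers.Embedding(1000, 5)(seq)

#Append one auxiliary feature at a time and re-encode with an LSTM
for feat in aux:
    x = layers.concatenate([x, feat])
    x = layers.LSTM(8, return_sequences=True)(x)

x = layers.LSTM(8)(x)
out = layers.Dense(1)(x)
model = Model(inp, out)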

Hope this clarifies your question.

Akshay Sehgal
  • in the context of forecasting, let's say you have an input sequence of n time steps with features (power_kwh, tempC, humidity, ...). As you can see, some features are related to weather, for which you can also retrieve a forecast for the next time steps you are predicting. How would you suggest including those? Would you concatenate them to the output of the LSTM, or would you include them in the LSTM input? – Gavello Aug 04 '21 at 21:08
  • In the last case, would you add them as additional features or by shifting the existing weather features? (in this case you would lose the temporal correlation in the input sequence) – Gavello Aug 04 '21 at 21:14
  • What if you have two groups of variables: a) variables whose values you can use only up to a certain time to predict the following values, and b) variables with no such limit, whose values you can use right up to the predicted period (for example the hour of the day)? – skan Jan 23 '23 at 00:00

Keras' default LSTM implementation expects an input shape of (batch, sequence, features).

So when reshaping x_train, instead of doing:

x_train = np.reshape(x_train, (len(x_train), 1, 1))

You simply have:

x_train = np.reshape(x_train, (len(x_train), 1, num_features))

It's not clear from your post whether you also want to predict these new features (multivariate prediction) or if you still want to predict y only.

In the first case, you'll need to modify your Dense layer to account for the new dimension of the target:

regressor.add(Dense(units = num_features))

In the second case, you'll need to reshape y_train to take only y:

y_train = training_set[1:len(training_set),1] # (assuming Date is not the index)

Finally, your LSTM input_shape must be updated to accept the newly reshaped x_train:

regressor.add(LSTM(units = num_units, activation = activation_function, input_shape=(None, 1, num_features)))
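Putting the pieces together for the y-only case, a minimal end-to-end sketch might look like the following (my own assembly, not quoted from the answer above). It assumes Date is the DataFrame index, so y is column 0 and v1, v2, v3 are columns 1-3, and it keeps the question's one-step lag and hyperparameters; note that Keras' input_shape is specified without the batch dimension, i.e. as (timesteps, features):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler

num_features = 4                              # y, v1, v2, v3
min_max_scaler = MinMaxScaler()
training_set = min_max_scaler.fit_transform(df_train.values)

# inputs at time t, target y at time t+1 (same one-step lag as the original code)
x_train = training_set[:-1]                   # shape (n-1, 4)
y_train = training_set[1:, 0]                 # y is column 0 when Date is the index

# Keras LSTMs expect (batch, timesteps, features); input_shape omits the batch dim
x_train = np.reshape(x_train, (len(x_train), 1, num_features))

regressor = Sequential()
regressor.add(LSTM(units=2, activation='sigmoid', input_shape=(1, num_features)))
regressor.add(Dense(units=1))
regressor.compile(optimizer='adam', loss='mean_squared_error')
regressor.fit(x_train, y_train, batch_size=10, epochs=100)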
Yoan B. M.Sc
  • My goal is to predict only y. Imagine v1, v2 and v3 are weather variables. If I want to predict tomorrow's y and I know the v1, v2, v3 estimates for tomorrow (from weather services), I only need to predict y. – Numbermind Dec 22 '20 at 14:04
  • @Numbermind, so you can keep your `Dense` layer as is. And reshape the rest of the data and the `input_shape` of your `LSTM` according to the answer. – Yoan B. M.Sc Dec 22 '20 at 14:08
  • I've done what you suggested and I now get an error saying: Input 0 of layer lstm_51 is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, None, 5]. (my data has the following variables y, v1, v2, v3, v4 and datetime as index). – Numbermind Dec 22 '20 at 14:43
  • @Numbermind could you edit your OP with your modification so I can see where it's coming from? It seems your input has too many dimensions. – Yoan B. M.Sc Dec 22 '20 at 14:46