Keras Recurrent Neural Networks For Multivariate Time Series

Question

I have been reading about Keras RNN models (LSTMs and GRUs), and authors seem to largely focus on language data or univariate time series that use training instances composed of previous time steps. The data I have is a bit different.

I have 20 variables measured every year for 10 years for 100,000 persons as input data, and the 20 variables measured for year 11 as output data. What I would like to do is predict the value of one of the variables (not the other 19) for the 11th year.

I have my data structured as X.shape = [persons, years, variables] = [100000, 10, 20] and Y.shape = [persons, variable] = [100000, 1]. Below is my Python code for a LSTM model.

## LSTM model.

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(128, activation = 'tanh', 
     input_shape = (X.shape[1], X.shape[2])))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X, Y, epochs = 25, batch_size = 128)

I have four (related) questions, please:

Have I coded the Keras model correctly for the data structure I have? The performance I get from a fully-connected network (using flattened data) and from LSTM, GRU, and 1D CNN models are nearly identical, and I don't know if I have made an error in Keras or if a recurrent model is simply not helpful in this case.
Should I have Y as a series with shape Y.shape = [persons, years] = [100000, 11], rather than including the variable in X, which would then have shape X.shape = [persons, years, variables] = [100000, 10, 19]? If so, how can I get the RNN to output the predicted sequence? When I use return_sequences = True, Keras returns an error.
Is this the best way to predict with the data I have? Are there better option choices available in the Keras RNN models, or even other models?
How could I simulate data resembling the data structure I have so that a RNN model would outperform a fully-connected network?

UPDATE:

I have tried a simulation, with what I hope is a very simple case where an RNN should be expected to outperform a FNN.

While the LSTM tends to outperform the FNN when both have less hidden layers (4), the performance becomes identical with more hidden layers (8+). Can anyone think of a better simulation where a RNN would be expected to outperform a FNN with a similar data structure?

from keras import models
from keras import layers

from keras.layers import Dense, LSTM

import numpy as np
import matplotlib.pyplot as plt

The code below simulates data for 10,000 instances, 10 time steps, and 2 variables. If the second variable has a 0 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 3. If the second variable has a 1 in the very first time step, then Y is the value of the first variable for the very last time step multiplied by 9.

My hope was that the RNN would keep the value of second variable at the very first time step in memory and use that to know which value (3 or 9) to multiply the the first variable for the very last time step.

## Simulate data.

instances = 10000

sequences = 10

X = np.zeros((instances, sequences * 2))

X[:int(instances / 2), 1] = 1

for i in range(instances):

    for j in range(0, sequences * 2, 2):

        X[i, j] = np.random.random()

Y = np.zeros((instances, 1))

for i in range(len(Y)):

    if X[i, 1] == 0:

        Y[i] = X[i, -2] * 3

    if X[i, 1] == 1:

        Y[i] = X[i, -2] * 9

Below is code for a FNN:

## Densely connected model.

# Define model.

network_dense = models.Sequential()
network_dense.add(layers.Dense(4, activation = 'relu', 
     input_shape = (X.shape[1],)))
network_dense.add(Dense(1, activation = None))

# Compile model.

network_dense.compile(optimizer = 'rmsprop', loss = 'mean_absolute_error')

# Fit model.

history_dense = network_dense.fit(X, Y, epochs = 100, batch_size = 256, verbose = False)

plt.scatter(Y[X[:, 1] == 0, :], network_dense.predict(X[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

plt.scatter(Y[X[:, 1] == 1, :], network_dense.predict(X[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

Below is code for a LSTM:

## Structure X data for LSTM.

X_lstm = X.reshape(X.shape[0], X.shape[1] // 2, 2)

X_lstm.shape

## LSTM model.

# Define model.

network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(4, activation = 'relu', 
     input_shape = (X_lstm.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))

# Compile model.

network_lstm.compile(optimizer = 'rmsprop', loss = 'mean_squared_error')

# Fit model.

history_lstm = network_lstm.fit(X_lstm, Y, epochs = 100, batch_size = 256, verbose = False)

plt.scatter(Y[X[:, 1] == 0, :], network_lstm.predict(X_lstm[X[:, 1] == 0, :]), alpha = 0.1)
plt.plot([0, 3], [0, 3], color = 'black', linewidth = 2)
plt.title('LSTM, FNN, Second Variable has a 0 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

plt.scatter(Y[X[:, 1] == 1, :], network_lstm.predict(X_lstm[X[:, 1] == 1, :]), alpha = 0.1)
plt.plot([0, 9], [0, 9], color = 'black', linewidth = 2)
plt.title('LSTM, FNN, Second Variable has a 1 in the Very First Time Step')
plt.xlabel('Actual')
plt.ylabel('Predicted')

plt.show()

score 5 · Accepted Answer · answered Aug 22 '18 at 23:15

Yes the code used is correct for what you are trying to do. 10 years is the time window used to predict the following year so that should be the number of inputs into your model for each of the 20 variables. The sample size of 100,000 observations is not relevant to the input shape of your model.
The way that you had originally shaped the dependent variable Y is correct. You are predicting a window of 1 year for 1 variable and you have 100,000 observations. The key word argument return_sequences=True will cause an error to be thrown because you only have a single LSTM layer. Set this parameter to True if you are implementing multiple LSTM layers and the layer in question is followed by another LSTM layer.

I wish I could offer some guidance to 3 but without actually having your dataset I don't know if it's possible to answer this with any sort of certainty.

I will say that LSTM's were designed to address what is know as the the long term dependency problem present in regular RNN's. What this problem boils down to is that as the gap between when the relevant information was observed to the point where that information would be useful grows, the standard RNN will have a harder time learning the relationship between them. Think of predicting a stock price based on 3 days of activity vs an entire year.

This leads into number 4. If I use the term 'resembling' loosely and stretch your time window further out to say 50 years as opposed to 10, the advantages gained from using an LSTM would become more apparent. Although I'm sure that someone more experienced will be able to offer a better answer and I look forward to seeing it.

I found this page helpful for understanding LSTM's:

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

1. I am glad I am using the code correctly. I included the number of records just for the sake of illustration. — from keras import michael, Aug 23 '18 at 16:58
2. The difficulty I think I am having is with how to define a many-to-one or a many-to-many RNN in Keras. I have seen diagrams where each time step of Y is evaluated, and then the activation is passed to the next time step. I want to be sure I am setting this up correctly. — from keras import michael, Aug 23 '18 at 17:02
3. This is related to my difficulty from 2 above. If I remove the variable from X and instead include it as a sequence in Y, does Keras automatically use the information from Y at one time step to inform the prediction of Y at the next time step; or is it better to keep the variable in X and just predict the value of that variable at time step 11? Or is it a case of trying both and seeing which performs better? — from keras import michael, Aug 23 '18 at 17:07
4. By 'resembling', I meant the same structure as the data I have. I would like to simulate data where an RNN would be expected to outperform a fully-connected network, even just the simplest case. I have tried a few simulations and in each case a fully-connected network and a RNN perform the same. I'll keep trying and post if I find something helpful. — from keras import michael, Aug 23 '18 at 17:11

Keras Recurrent Neural Networks For Multivariate Time Series

1 Answers1

Linked