How to use Multivariate time-series prediction with Keras, when multiple samples are used

Question

As the title states, I am doing multivariate time-series prediction. I have some experience with this situation and was able to successfully set up and train a working model in TF Keras.

However, I did not know the 'proper' way to handle having multiple unrelated time-series samples. I have about 8000 unique sample 'blocks' with anywhere from 800 time steps to 30,000 time steps per sample. Of course I couldn't concatenate them all into one single time series because the first points of sample 2 are not related in time with the last points of sample 1.

Thus my solution was to fit each sample individually in a loop (at great inefficiency).

My new idea: can (or should) I pad the start of each sample with empty time steps, equal in number to the RNN's look-back, and then concatenate the padded samples into one time series? The first time step of each sample would then have look-back data of mostly 0's, which sounds like another 'hack' for my problem and not the right way to do it.

OverLordGoldDragon · Accepted Answer · 2019-12-03T19:47:00.200

The main challenge is the spread between 800 and 30,000 timesteps, but it's nothing you can't handle.

Model design: group sequences of similar length into batches and pad each batch to a common length - for example, 30 sequences of 800-to-900 timesteps, then 60 sequences of 900-to-1000, etc. The length ranges don't have to be contiguous (e.g. the next can be 1200-to-1500)
Input shape: (samples, timesteps, channels) - or equivalently, (sequences, timesteps, features)
Layers: Conv1D and/or RNNs - e.g. GRU, LSTM. Each can handle variable timesteps
Concatenation: don't do it. If each of your sequences is independent, then each must be fed along dimension 0 in Keras - the batch or samples dimension. If they are dependent, e.g. multivariate timeseries, like many channels in a signal - then feed them along the channels dimension (dim 2). But never concatenate along the timesteps dimension (dim 1), as it implies causal continuity where none exists.
Stateful RNNs: can help in processing long sequences, since they carry the hidden state over from one batch to the next, so a long sequence can be fed as consecutive shorter chunks
RNN capability: is limited w.r.t. long sequences, and 800 is already in the danger zone even for LSTMs; I'd suggest dimensionality reduction via either autoencoders or CNNs w/ strides > 1 at the input, then feeding their outputs to RNNs.
RNN training: is difficult. Long train times, hyperparameter sensitivity, vanishing gradients - but, with proper regularization, they can be powerful.
Zero-padding: before/after/both - debatable, and worth reading about, but probably steer clear of "both", as learning to ignore paddings is easier when they sit in one place; I personally use "before"
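RNN variant: use the cuDNN-backed implementations whenever possible, as they are about 10x faster - CuDNNLSTM or CuDNNGRU in TF 1.x; in TF 2.x, the standard LSTM and GRU layers pick the cuDNN kernel automatically on a GPU when left at their cuDNN-compatible default arguments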
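
The grouping and "before" padding described above could be sketched roughly as follows. This is only an illustration with NumPy, not the answerer's own code: the helper names `pad_before` and `make_buckets`, the `bucket_width` of 100 timesteps, and the dummy sequence lengths are all assumptions made for the example.

```python
import numpy as np

def pad_before(seq, length):
    """Zero-pad a (timesteps, channels) array at the start, up to `length`."""
    pad = length - seq.shape[0]
    return np.pad(seq, ((pad, 0), (0, 0)))  # zeros before, nothing after

def make_buckets(sequences, bucket_width=100):
    """Group sequences by length range and pad each group to its longest member.

    Returns {range_start: array of shape (samples, timesteps, channels)}.
    """
    groups = {}
    for seq in sequences:
        start = (seq.shape[0] // bucket_width) * bucket_width
        groups.setdefault(start, []).append(seq)

    batches = {}
    for start, seqs in groups.items():
        longest = max(s.shape[0] for s in seqs)
        # Independent sequences are stacked along dim 0, never along time
        batches[start] = np.stack([pad_before(s, longest) for s in seqs])
    return batches

# Dummy data: six independent sequences with 16 channels each (hypothetical lengths)
rng = np.random.default_rng(0)
lengths = [810, 850, 899, 920, 990, 1250]
sequences = [rng.standard_normal((n, 16)) for n in lengths]

batches = make_buckets(sequences)
for start, batch in sorted(batches.items()):
    print(start, batch.shape)
```

Each resulting array has the `(samples, timesteps, channels)` layout, so it could be passed to something like `train_on_batch` one bucket at a time, assuming the model was built with an unspecified timesteps dimension so that batches of different lengths are accepted.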

Note: "samples" above, and in machine learning, refers to independent examples / observations, rather than measured signal datapoints (which would be referred to as timesteps).

Below is a minimal example of what a timeseries-suited model would look like:

from tensorflow.keras.layers import Input, Conv1D, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import numpy as np

def make_data(batch_shape):  # dummy data
    return (np.random.randn(*batch_shape),
            np.random.randint(0, 2, (batch_shape[0], 1)))

def make_model(batch_shape):  # example model
    ipt = Input(batch_shape=batch_shape)
    x   = Conv1D(filters=16, kernel_size=10, strides=2, padding='valid')(ipt)
    x   = LSTM(units=16)(x)
    out = Dense(1, activation='sigmoid')(x)  # assuming binary classification

    model = Model(ipt, out)
    model.compile(Adam(learning_rate=1e-3), 'binary_crossentropy')
    return model

batch_shape = (32, 100, 16)  # 32 samples, 100 timesteps, 16 channels
x, y  = make_data(batch_shape)
model = make_model(batch_shape)

model.train_on_batch(x, y)
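
To get a feel for how much the strided convolution shortens the sequence before it reaches the LSTM, the output length for `padding='valid'` can be worked out by hand. The sketch below assumes no dilation; `conv1d_valid_length` is a hypothetical helper written for this illustration, not a Keras function.

```python
def conv1d_valid_length(timesteps, kernel_size, strides):
    """Output timesteps of a 1D convolution with 'valid' padding, no dilation."""
    return (timesteps - kernel_size) // strides + 1

# Same kernel_size and strides as the Conv1D layer in the example model
for t in (100, 800, 30000):
    print(t, "->", conv1d_valid_length(t, kernel_size=10, strides=2))
# 100 -> 46
# 800 -> 396
# 30000 -> 14996
```

With `strides=2` the LSTM sees roughly half as many timesteps; larger strides or several stacked convolutions would reduce the length further, at the cost of temporal resolution.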

Wow, I think I had a big misunderstanding about dimension 0 of the LSTM in Keras. I thought the time relation was along dimension 0, but it's really dimension 1, if I'm now understanding correctly. - Graylien, Dec 03 '19 at 19:53
@Graylien That is correct; in _all_ Keras instances, dim0 is reserved for samples. For Conv1D and RNNs, dim1 is timesteps. If you only have 1 sequence, you'd feed it as `(1, timesteps, channels)` - and if sequences are univariate, they'd have shape `(samples, timesteps, 1)` (or `(1, timesteps, 1)`) - OverLordGoldDragon, Dec 03 '19 at 19:57
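
The shapes mentioned in the comments above can be checked quickly with NumPy. This is a small sketch on dummy data; the variable names and sizes are made up for illustration.

```python
import numpy as np

timesteps = 100

# One univariate sequence -> (1, timesteps, 1)
single_univariate = np.arange(timesteps, dtype="float32")
as_batch = single_univariate.reshape(1, timesteps, 1)

# Three dependent signals measured together -> stack along channels (dim 2)
signals = [np.random.randn(timesteps) for _ in range(3)]
multivariate = np.stack(signals, axis=-1)[np.newaxis]  # (1, timesteps, 3)

# Four independent multivariate sequences -> stack along samples (dim 0)
independent = [np.random.randn(timesteps, 3) for _ in range(4)]
batch = np.stack(independent, axis=0)  # (4, timesteps, 3)

print(as_batch.shape, multivariate.shape, batch.shape)
```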
