The main challenge lies in handling 800 vs. 30,000 timesteps, but it's nothing you can't do.
- Model design: group sequences into chunks, for example 30 sequences of 800-to-900 timesteps (padded to a common length), then 60 sequences of 900-to-1000, etc. Chunks don't have to be contiguous (e.g. the next can be 1200-to-1500).
- Input shape: `(samples, timesteps, channels)`, or equivalently `(sequences, timesteps, features)`
- Layers: `Conv1D` and/or RNNs, e.g. `GRU`, `LSTM`. Each can handle variable timesteps (pass `None` as the timesteps dimension of the input shape).
- Concatenation: don't do it. If your sequences are independent, each must be fed along dimension 0 in Keras, the batch (samples) dimension. If they are dependent, e.g. a multivariate timeseries such as many channels of one signal, feed them along the `channels` dimension (dim 2). But never concatenate along the timesteps dimension, as that implies causal continuity where none exists.
- Stateful RNNs: can help in processing long sequences - info on how they work here
- RNN capability: limited with respect to long sequences; 800 timesteps is already in the danger zone, even for LSTMs. I'd suggest reducing dimensionality first, via either autoencoders or CNNs with `strides > 1` at the input, then feeding their outputs to the RNNs.
- RNN training: difficult. Long training times, hyperparameter sensitivity, vanishing gradients; but with proper regularization, RNNs can be powerful. More info here
- Zero-padding: before / after / both is debatable and worth reading about, but probably stay clear of "both", since learning to ignore padding is easier when it sits in one location; I personally use "before".
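- RNN variant: use `CuDNNLSTM` or `CuDNNGRU` whenever possible, as they are 10x faster. (In TF 2.x these are no longer in `tf.keras.layers`; plain `LSTM` / `GRU` select the cuDNN kernel automatically on a GPU when left at their default arguments.)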
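The bucketing and "before" zero-padding described in the list above can be sketched in plain NumPy. This is a minimal sketch, not a library API: the helper `bucket_and_pad` and the bucket edges are hypothetical, and sequences falling outside every bucket are simply dropped. (Keras's `pad_sequences` utility also pre-pads by default, but its default `dtype` is `'int32'`, which would truncate float signals unless you override it.)

```python
import numpy as np

def bucket_and_pad(sequences, edges):
    """Hypothetical helper: group variable-length sequences into length
    buckets [edges[i], edges[i+1]) and zero-pad each bucket "before"
    (at the start) up to its longest member.

    sequences: list of arrays shaped (timesteps, channels)
    Returns {(lo, hi): array shaped (samples, max_timesteps, channels)}.
    """
    buckets = {}
    for seq in sequences:
        length = seq.shape[0]
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= length < hi:
                buckets.setdefault((lo, hi), []).append(seq)
                break
        # sequences outside every bucket are dropped in this sketch

    padded = {}
    for key, seqs in buckets.items():
        max_len = max(s.shape[0] for s in seqs)
        out = np.zeros((len(seqs), max_len, seqs[0].shape[1]), dtype=seqs[0].dtype)
        for i, s in enumerate(seqs):
            out[i, max_len - s.shape[0]:] = s  # zeros first, then the signal
        padded[key] = out
    return padded

# Usage: six random 4-channel sequences with lengths spread over buckets
rng = np.random.default_rng(0)
lengths = [810, 850, 899, 920, 990, 1250]
seqs = [rng.standard_normal((n, 4)).astype(np.float32) for n in lengths]
batches = bucket_and_pad(seqs, edges=[800, 900, 1000, 1200, 1500])
for (lo, hi), arr in sorted(batches.items()):
    print(f"{lo}-{hi}: {arr.shape}")
```

Each bucket then becomes its own batch (or several) with its own timestep count. The empty 1000-to-1200 range produces no bucket at all, matching the point that chunks need not be contiguous.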
Note: "samples" above, and in machine learning generally, refers to independent examples / observations, rather than measured signal datapoints (which would be referred to as `timesteps`).
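To make the axis conventions concrete, here is a small NumPy illustration of where independent sequences, dependent channels, and the concatenation anti-pattern end up in a `(samples, timesteps, channels)` array. The data is random and the variable names are hypothetical.

```python
import numpy as np

timesteps = 1000
rng = np.random.default_rng(0)

# Case 1: three independent univariate recordings. Each one is a separate
# sample, so they are stacked along axis 0 (batch / samples).
recordings = [rng.standard_normal((timesteps, 1)) for _ in range(3)]
independent = np.stack(recordings, axis=0)  # (samples, timesteps, channels)

# Case 2: one multivariate signal with three channels measured at the same
# timesteps (dependent). They are stacked along axis 2 (channels / features).
channels = [rng.standard_normal(timesteps) for _ in range(3)]
multivariate = np.stack(channels, axis=-1)[np.newaxis]  # (1, timesteps, 3)

# Anti-pattern: joining independent recordings end-to-end along the time axis
# fabricates one 3000-step sequence with false continuity at the seams.
wrong = np.concatenate(recordings, axis=0)[np.newaxis]  # (1, 3000, 1)

print(independent.shape, multivariate.shape, wrong.shape)
```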
Below is minimal code for what a timeseries-suited model could look like:
```python
from tensorflow.keras.layers import Input, Conv1D, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
import numpy as np

def make_data(batch_shape):  # dummy data
    return (np.random.randn(*batch_shape),
            np.random.randint(0, 2, (batch_shape[0], 1)))

def make_model(batch_shape):  # example model
    ipt = Input(batch_shape=batch_shape)
    # strided conv roughly halves the timesteps before the LSTM sees them
    x = Conv1D(filters=16, kernel_size=10, strides=2, padding='valid')(ipt)
    x = LSTM(units=16)(x)
    out = Dense(1, activation='sigmoid')(x)  # assuming binary classification
    model = Model(ipt, out)
    # `learning_rate` replaces the deprecated `lr` argument
    model.compile(optimizer=Adam(learning_rate=1e-3), loss='binary_crossentropy')
    return model

batch_shape = (32, 100, 16)  # 32 samples, 100 timesteps, 16 channels
x, y = make_data(batch_shape)
model = make_model(batch_shape)
model.train_on_batch(x, y)
```
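The example above fixes `batch_shape`, so every batch must have exactly 100 timesteps. A variable-length variant can be sketched by setting the timesteps dimension to `None` and training on differently padded buckets in turn. The bucket sizes below are hypothetical, and the helper `conv1d_valid_len` (also hypothetical) just spells out the standard output-length formula for a `'valid'`, strided `Conv1D`, to show how strides shrink long sequences before the LSTM.

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv1D, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def conv1d_valid_len(timesteps, kernel_size, strides):
    # output length of a 'valid' Conv1D: floor((T - k) / s) + 1
    return (timesteps - kernel_size) // strides + 1

def make_variable_length_model(channels):
    # timesteps=None lets the same weights process any sequence length
    ipt = Input(shape=(None, channels))
    conv = Conv1D(filters=16, kernel_size=10, strides=2, padding='valid')
    x = conv(ipt)
    x = LSTM(units=16)(x)
    out = Dense(1, activation='sigmoid')(x)  # assuming binary classification
    model = Model(ipt, out)
    model.compile(optimizer=Adam(learning_rate=1e-3), loss='binary_crossentropy')
    return model, conv

channels = 16
model, conv = make_variable_length_model(channels)

rng = np.random.default_rng(0)
# hypothetical buckets: (samples, padded timesteps), one length per batch
for n_samples, timesteps in [(8, 120), (4, 300), (2, 900)]:
    x = rng.standard_normal((n_samples, timesteps, channels)).astype('float32')
    y = rng.integers(0, 2, (n_samples, 1)).astype('float32')
    loss = model.train_on_batch(x, y)

preds = model.predict(x, verbose=0)  # last bucket: 2 samples of 900 steps
```

With `kernel_size=10, strides=2`, a 30,000-step sequence reaches the LSTM as 14,996 steps, so a single strided layer is rarely enough; stacking a few such layers, or raising `strides`, is how the reduction suggested above brings long inputs into a range RNNs handle better.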