
I am trying to build a simple NN for time-series analysis. So far I have added only Dense layers (but feel free to comment about LSTM etc. if that is what you prefer).

My input is in the usual format {samples, time steps, features}, let's say {1000, 100, 3}, and I want a single-step output. So far I cannot work out whether I should flatten the data, and if so, where.

The results change if I don't flatten at all, if I flatten before the last layer, and if I flatten before the first layer, but I have no way to tell yet which of these, if any, is correct.

A good discussion can be found under this question. However, please note that I am specifically interested in time series. So I wonder if flattening before the first layer might in some way remove the information needed to capture time dependence...

Helen

2 Answers


Since your data is temporal, I would recommend using a model specifically intended for processing temporal data. As you mention, LSTM is quite popular, but Keras also has an implementation of GRU, and you can also try Temporal Convolutional Networks (TCNs), which use simple causal convolutions, avoid the complicated memory/gating structures of LSTM and GRU, and have been shown to be more effective on some problems in this paper.

You will be looking for a many-to-one temporal structure, since you are taking an input sequence and predicting the next timestep. See this post for help on implementing that with LSTMs. A key takeaway is that the Keras temporal layers have a return_sequences argument, which in your case should be set to False. The temporal models process the time dimension for you and, in the case of LSTMs, capture temporal dependencies by maintaining an internal memory. TCNs achieve similar behavior by performing 1-D convolutions, but causally, in the sense that information from the past cannot leak into the future.
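For illustration, here is a minimal many-to-one sketch with Keras (the layer width and training settings are assumptions on my part, not tuned values):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy data shaped like in the question: 1000 samples, 100 timesteps, 3 features.
X = np.random.rand(1000, 100, 3)
y = np.random.rand(1000, 1)  # single-step target

model = keras.Sequential([
    # return_sequences=False -> only the last hidden state is returned (many-to-one)
    layers.LSTM(64, input_shape=(100, 3), return_sequences=False),
    layers.Dense(1),  # single-step prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32)
```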

I would recommend starting with LSTM, as you will find the most resources on blogs and SO questions about using them, and then you can try other models if you're not getting the results you want. I do not recommend using only dense layers, as they will not handle temporal relations properly, and I would also disagree with @Solvalou regarding 2D convolutions, because mixing the temporal and spatial dimensions is more likely to just confuse your network. If you do convolutions, the causal 1-D convolutions of a TCN should give you what you want.
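If you want to try the TCN route, a very rough sketch with causal, dilated 1-D convolutions might look like this (a real TCN also adds residual blocks; the widths and dilation rates here are just illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # padding="causal" ensures each output only sees current and past timesteps
    layers.Conv1D(32, kernel_size=3, padding="causal", dilation_rate=1,
                  activation="relu", input_shape=(100, 3)),
    layers.Conv1D(32, kernel_size=3, padding="causal", dilation_rate=2,
                  activation="relu"),
    layers.Conv1D(32, kernel_size=3, padding="causal", dilation_rate=4,
                  activation="relu"),
    # collapse the time axis for a many-to-one output
    # (a full TCN would typically read only the last timestep instead)
    layers.GlobalAveragePooling1D(),
    layers.Dense(1),  # single-step prediction
])
model.compile(optimizer="adam", loss="mse")
```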

adamconkey
  • Can you please explain why you think 2D convolutions might confuse the network? If you do a 1D convolution, the convolution is performed for each of the three features independently. But since the features might be correlated, a 2D convolution might make sense, I think. It depends on the problem, of course. If you want to make predictions on sequential data, an RNN is the better choice. But a CNN used on fixed-duration inputs might perform better than an RNN. @Helen what exactly are you trying to achieve? – Solvalou Dec 01 '19 at 18:16
    I agree it depends on the problem (such is deep learning advice), but in general I think what you are suggesting is highly non-standard. The reason 2D convolutions work well for images is because there is typically very high correlation between neighboring pixels in a grid structure. You would have to have some strong prior knowledge on your features and their temporal relationships to think the same correlations should hold for your timeseries data. Of course, it may work well on a particular dataset, so it could be worth trying, but I definitely wouldn't recommend it as a starting point. – adamconkey Dec 01 '19 at 19:21
    @Solvalou for the sake of example this is quite similar to stock market analysis, but we also want to find out whether there are correlations between the input variables; if there are, the network would ideally take advantage of these as well. (So probably both suggestions should be tried out...) Thanks to you both for the useful answers! – Helen Dec 02 '19 at 08:41

In your case, you have input data of shape (?, 100, 3), where ? represents the batch size. If you apply a Dense layer to this input, it will only operate on the last dimension, i.e. on your features. So it will only combine information across your features, but not across the time series itself.
In order to include time-series information, you have to flatten first.
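You can see this directly from the output shapes (a quick check, assuming TensorFlow's Keras):

```python
from tensorflow import keras
from tensorflow.keras import layers

x = keras.Input(shape=(100, 3))                     # (batch, timesteps, features)
print(layers.Dense(8)(x).shape)                     # (None, 100, 8): only the feature axis is transformed
print(layers.Flatten()(x).shape)                    # (None, 300): time and features are merged
print(layers.Dense(8)(layers.Flatten()(x)).shape)   # (None, 8): now each unit sees all timesteps
```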

But there isn't a single right way to achieve what you want. You could first apply a Dense layer with 1 node, which will result in a shape of (?, 100, 1). Afterwards, you flatten and get a shape of (?, 100). Finally, you use another Dense layer, or several of them, to get your desired output shape.
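A minimal sketch of that idea (the hidden width of 32 is just an illustrative guess):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(1, input_shape=(100, 3)),  # (?, 100, 3) -> (?, 100, 1)
    layers.Flatten(),                       # (?, 100, 1) -> (?, 100)
    layers.Dense(32, activation="relu"),    # mix information across timesteps
    layers.Dense(1),                        # single-step output
])
model.compile(optimizer="adam", loss="mse")
```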

But since you are dealing with input data of fixed duration, i.e. always the same number of time steps, you should work with a Convolutional Neural Network (CNN). It will preserve the 2D structure of your input data and learn to recognize certain patterns in it. You can combine it with pooling layers to make your network faster and to gain some translation invariance.
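One way to read "preserving the 2D structure" is to treat the (timesteps, features) grid as a single-channel image; the kernel and pooling sizes below are illustrative assumptions, not recommendations:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Reshape((100, 3, 1), input_shape=(100, 3)),         # add a channel axis
    layers.Conv2D(16, kernel_size=(5, 3), activation="relu"),  # kernel spans time and all 3 features
    layers.MaxPooling2D(pool_size=(2, 1)),                     # pool along the time axis only
    layers.Flatten(),
    layers.Dense(1),                                           # single-step output
])
model.compile(optimizer="adam", loss="mse")
```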

Otherwise, if you also want to handle sequential input data, you really should have a look at Recurrent Neural Networks (RNNs).

Solvalou