
I have signals recorded from machines (m1, m2, and so on) for 28 days. (Note: each machine's signal for each day is 360 values long.)

machine_num, day1, day2, ..., day28
m1, [12, 10, 5, 6, ...], [78, 85, 32, 12, ...], ..., [12, 12, 12, 12, ...]
m2, [2, 0, 5, 6, ...], [8, 5, 32, 12, ...], ..., [1, 1, 12, 12, ...]
...
m2000, [1, 1, 5, 6, ...], [79, 86, 3, 1, ...], ..., [1, 1, 12, 12, ...]

I want to predict the signal sequence of each machine for the next 3 days, i.e. for day 29, day 30, and day 31. However, I don't have values for days 29, 30, and 31, so my plan was as follows, using an LSTM model.

The first step is to feed the network the signals for day 1 and ask it to predict the signals for day 2. In the next step, I feed it the signals for days 1 and 2 and ask it to predict the signals for day 3, and so on. By the time I reach day 28, the network has all the signals up to day 28 and is asked to predict the signals for day 29, etc.

I tried a univariate LSTM model as follows.

# univariate lstm example
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
# define dataset
X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60]])
y = array([40, 50, 60, 70])
# reshape from [samples, timesteps] into [samples, timesteps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(X, y, epochs=1000, verbose=0)
# demonstrate prediction
x_input = array([50, 60, 70])
x_input = x_input.reshape((1, 3, 1))
yhat = model.predict(x_input, verbose=0)
print(yhat)

However, this example is very simple since it does not have long sequences like mine. For example, my data for m1 would look as follows.

m1 = [[12, 10, 5, 6, ...], [78, 85, 32, 12, ...], ..., [12, 12, 12, 12, ...]]
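For concreteness, the whole dataset stacks into a single 2000 × 28 × 360 array (random numbers here, just to illustrate the shape):

```python
import numpy as np

# Random stand-in for the real recordings: 2000 machines,
# 28 days, 360 per-minute readings per day.
signals = np.random.default_rng(0).normal(size=(2000, 28, 360))

m1 = signals[0]  # all 28 daily signals for machine m1
print(signals.shape, m1.shape)  # (2000, 28, 360) (28, 360)
```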

Moreover, I need predictions for days 29, 30, and 31. In that case, I am unsure how to change this example to cater to my needs. I specifically want to know whether the direction I have chosen is correct and, if so, how to do it.

I am happy to provide more details if needed.

EDIT:

I have added the model.summary() output as a screenshot.

EmJ
    Maybe this answer has it? https://stackoverflow.com/questions/46901371/how-to-deal-with-multi-step-time-series-forecasting-in-multivariate-lstm-in-kera/46934799#46934799 --- Later I may write a specific answer to your question, with time. – Daniel Möller Jan 30 '20 at 12:51
    Thank you for the reminder. Just to make sure I understand the dimensions correctly, we have 2000 machines that record an array for 360 values for each day. So all arrays are the same dimension and we could say the matrix is 2000*28*360 right? – Celius Stingher Feb 12 '20 at 11:53
    @CeliusStingher Thank you very much for the comment. Yes, you are correct. I have data of 2000 machines on 28 days. For each day I have an array of 360. Therefore the matrix is 2000*28*360. Please let me know if you need any further details. Looking forward for your suggestions. Thank you :) – EmJ Feb 12 '20 at 11:58
    Final question before testing, what is the relation between the 360 observations and each day. Can we say each observation is independent from the other (for each day) and what about observation 1 for day 1 and observation 1 for day 2? Could we find relationship between these values? Kind of like a MANOVA... TL;DR: Are there any relations that should be taken into account between the values? – Celius Stingher Feb 12 '20 at 12:02
  • @CeliusStingher Thank you very much for the comment. Sorry for the delayed response as I saw your comment just now. The 360 observations for each day are calculated as follows. From every machine I collected its signal for every minute in 6 hours. i.e. the length of the signal of a machine is 6*60 = 360. Each machine has 360 length signals for 28 days. I have the data of about 2000 machines in my dataset. Unfortunately, I have not done MANOVA for the dataset. Please kindly let me know if my description is not clear. Thank you. Looking forward to hearing from you :) – EmJ Feb 12 '20 at 12:31
  • I probably won't be able to get into it until Saturday. I believe I've got everything I need for now :) – Celius Stingher Feb 14 '20 at 04:18
  • So.... are they "sequences within sequences"? The 360 are sequentially displayed, minute by minute, right? The 360 measures are a sequence.... --- Then you have the day sequence that you want to forecast.... – Daniel Möller Feb 21 '20 at 13:01
  • Last question.... are these machines "of the same nature"? Can we treat each machine as a similar individual, supposing that if machine 1 measured X for 28 days, and machine 2 also measured X for 28 days, the forecast for machine 1 and machine 2 should be the same? – Daniel Möller Feb 21 '20 at 13:02
  • @DanielMöller Thank you very much for the comments. Yes you are correct. From every machine I collected its signal for every minute in 6 hours. i.e. the length of the signal of a machine is 6*60 = 360. Yes, you are correct with respect to your second comment. If machine 1 measured X for 28 days, and machine 2 also measured X for 28 days, the forecast for machine 1 and machine 2 should be the same. Please kindly let me know if you need further details. Looking forward to hearing from you :) – EmJ Feb 21 '20 at 13:18

2 Answers


Model and shapes

Since these are sequences in sequences, you need to use your data in a different format.

Although you could simply use (machines, days, 360) and treat the 360 values as features (that could work up to a point), for a robust model (though speed may then become a problem) you'd need to treat both dimensions as sequences.

Then I'd go with data shaped (machines, days, 360, 1) and two levels of recurrence.

Our model's input_shape would then be (None, 360, 1).

Model case 1 - Only day recurrence

Data shape: (machines, days, 360)
Apply some normalization to the data.
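For example, one simple option (an assumption; the scaling choice is up to you) is to standardize per machine:

```python
import numpy as np

# Dummy data in the (machines, days, 360) layout
data = np.random.default_rng(0).normal(loc=50.0, scale=10.0, size=(100, 28, 360))

# Standardize each machine's signals to zero mean and unit variance
mean = data.mean(axis=(1, 2), keepdims=True)
std = data.std(axis=(1, 2), keepdims=True)
normalized = (data - mean) / (std + 1e-8)
```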

Here is an example, but the models can be flexible; you can add more layers, try convolutions, etc.:

inputs = Input((None, 360)) #(m, d, 360)
outs = LSTM(some_units, return_sequences=False, 
            stateful=depends_on_training_approach)(inputs)  #(m, some_units)
outs = Dense(360, activation=depends_on_your_normalization)(outs) #(m, 360)
outs = Reshape((1,360))(outs) #(m, 1, 360) 
    #this reshape is not necessary if using the "shifted" approach - see time windows below
    #it would then be (m, d, 360)

model = Model(inputs, outs)
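As a sanity check, this sketch can be instantiated concretely (the 64 units and linear activation are arbitrary placeholders, and this uses tensorflow.keras rather than standalone keras):

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, Reshape
from tensorflow.keras.models import Model

inputs = Input((None, 360))                   # (m, d, 360), any number of days
outs = LSTM(64)(inputs)                       # (m, 64); return_sequences=False by default
outs = Dense(360, activation='linear')(outs)  # (m, 360)
outs = Reshape((1, 360))(outs)                # (m, 1, 360)

model = Model(inputs, outs)
model.compile(optimizer='adam', loss='mse')

# A dummy batch: 4 machines, 5 days of history each
pred = model.predict(np.zeros((4, 5, 360), dtype='float32'), verbose=0)
print(pred.shape)  # (4, 1, 360)
```

The (m, 1, 360) output lines up with a one-day-ahead target.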

Depending on the complexity of the intra-daily sequences, they might be predicted well with this model, but if they evolve in a complex way, the next model will be a little better.

Always remember that you can create more layers and explore things to increase this model's capability; this is only a tiny example.

Model case 2 - Two-level recurrence

Data shape: (machines, days, 360, 1)
Apply some normalization to the data.

There are many, many ways to experiment with this, but here is a simple one.

inputs = Input((None, 360, 1)) #(m, d, 360, 1)

#branch 1
inner_average = TimeDistributed(
                    Bidirectional(
                        LSTM(units1, return_sequences=True, stateful=False),
                        merge_mode='ave'
                    )
                )(inputs) #(m, d, 360, units1)
inner_average = Lambda(lambda x: K.mean(x, axis=1))(inner_average) #(m, 360, units1)


#branch 2
inner_seq = TimeDistributed(
                LSTM(some_units, return_sequences=False, stateful=False)
            )(inputs) #may be Bidirectional too
            #shape (m, d, some_units)

outer_seq = LSTM(other_units, return_sequences = False, 
                 stateful=depends_on_training_approach)(inner_seq) #(m, other_units)

outer_seq = Dense(few_units * 360, activation = 'tanh')(outer_seq) #(m, few_units * 360)
    #activation = same as inner_average 


outer_seq = Reshape((360,few_units))(outer_seq) #(m, 360, few_units)


#join branches

outputs = Concatenate()([inner_average, outer_seq]) #(m, 360, units1+few_units)
outputs = LSTM(units, return_sequences=True, stateful= False)(outputs) #(m, 360,units)
outputs = Dense(1, activation=depends_on_your_normalization)(outputs) #(m, 360, 1)
outputs = Reshape((1,360))(outputs) #(m, 1, 360) for training purposes

model = Model(inputs, outputs)

This is one attempt; I took an average over the days, but instead of inner_average I could have done something like:

#branch 1
daily_minutes = Permute((2,1,3))(inputs) #(m, 360, d, 1)
daily_minutes = TimeDistributed(
                    LSTM(units1, return_sequences=False, 
                         stateful=depends_on_training_approach)
                )(daily_minutes) #(m, 360, units1)

Many other ways of exploring the data are possible; this is a highly creative field. You could, for instance, use the daily_minutes approach right after inner_average, excluding the K.mean Lambda layer... you get the idea.

Time windows approach

Your approach sounds good. Give one step to predict the next, give two steps to predict the third, give three steps to predict the fourth.

The models above are suited to this approach.

Keep in mind that very short inputs may be useless and may make your model worse. (Try to imagine how many steps would be reasonably enough for you to start predicting the next ones)

Preprocess your data and divide it in groups:

  • group with length = 4 (for instance)
  • group with length = 5
  • ...
  • group with length = 28

You will need a manual training loop where in each epoch you feed each of these groups (you can't feed different lengths all together).
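Building the groups themselves is plain slicing. A minimal sketch with dummy data (the minimum length of 4 and the shapes are placeholders; inside a real loop you would call model.train_on_batch on each pair):

```python
import numpy as np

# Dummy stand-in for the real data: (machines, days, 360)
data = np.random.default_rng(0).normal(size=(32, 28, 360)).astype('float32')

min_len = 4  # an assumed minimum useful history length
pairs = []
for length in range(min_len, data.shape[1]):
    x = data[:, :length]            # days 1..length as input
    y = data[:, length:length + 1]  # the next day as target, shape (m, 1, 360)
    pairs.append((x, y))
    # in a real training loop, per epoch: model.train_on_batch(x, y)

print(pairs[-1][0].shape, pairs[-1][1].shape)  # (32, 27, 360) (32, 1, 360)
```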


Another approach is, give all steps, make the model predict a shifted sequence like:

  • inputs = original_inputs[:, :-1] #exclude last training day
  • outputs = original_inputs[:, 1:] #exclude first training day
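The shifted pair can be built with plain slicing (dummy random data here, assuming the (machines, days, 360) layout); a model with return_sequences=True can then be trained with a single fit(x, y):

```python
import numpy as np

# Dummy stand-in for the real data: (machines, days, 360)
original_inputs = np.random.default_rng(0).normal(size=(32, 28, 360))

x = original_inputs[:, :-1]  # days 1..27 as input
y = original_inputs[:, 1:]   # days 2..28: each target is shifted one day ahead

print(x.shape, y.shape)  # (32, 27, 360) (32, 27, 360)
```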

For making the models above suited to this approach, you need return_sequences=True in every LSTM that uses the day dimension as steps (not the inner_seq). (The inner_average method will fail; you will have to resort to the daily_minutes approach with return_sequences=True and another Permute((2,1,3)) right after.)

Shapes would be:

  • branch1 : (m, d, 360, units1)
  • branch2 : (m, d, 360, few_units) - needs to adjust the Reshape for this
    • The reshapes using 1 timestep will be unnecessary, the days dimension will replace the 1.
    • You may need to use Lambda layers to reshape considering the batch size and variable number of days (if details are needed, please tell me)

Training and predicting

(Sorry for not having the time to detail this now.)

You can then follow the approaches mentioned here and here, which are more complete, with a few links. (Take care with the output shapes, though; in your question we always keep the time step dimension, even though it may be 1.)

The important points are:

  • If you opt for stateful=False:
    • this means easy training with fit (as long as you didn't use the "different lengths" approach);
    • it also means you will need to build a new model with stateful=True and copy the weights of the trained model into it;
    • then you do the manual step-by-step prediction.
  • If you opt for stateful=True from the beginning:
    • this necessarily means a manual training loop (using train_on_batch, for instance);
    • it also necessarily means you will need model.reset_states() whenever you are going to present a batch whose sequences are not continuations of the last batch (every batch, if your batches contain whole sequences);
    • you don't need to build a new model to predict manually, but the manual step-by-step prediction remains the same.
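The stateful=False route can be sketched as follows (the 32 units, batch size, and dummy zero history are all placeholders; the "trained" model here is untrained, just to show the weight copy and the feed-back loop for days 29, 30, 31):

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, Reshape
from tensorflow.keras.models import Model

def build_model(stateful, batch_size=None):
    # With stateful=True, Keras needs a fixed batch size.
    if stateful:
        inputs = Input(batch_shape=(batch_size, None, 360))
    else:
        inputs = Input(shape=(None, 360))
    x = LSTM(32, stateful=stateful)(inputs)  # 32 units: arbitrary placeholder
    x = Dense(360)(x)
    x = Reshape((1, 360))(x)                 # (batch, 1, 360)
    return Model(inputs, x)

trained = build_model(stateful=False)         # imagine this one was trained with fit()
predictor = build_model(stateful=True, batch_size=2)
predictor.set_weights(trained.get_weights())  # copy the trained weights

history = np.zeros((2, 28, 360), dtype='float32')  # dummy 28-day history, 2 machines
predictor.reset_states()
step = predictor.predict(history, verbose=0)  # consumes days 1..28, predicts day 29
forecast = [step]
for _ in range(2):                            # feed predictions back for days 30, 31
    step = predictor.predict(step, verbose=0)
    forecast.append(step)
forecast = np.concatenate(forecast, axis=1)   # (2, 3, 360)
print(forecast.shape)
```

Because the LSTM is stateful, the states from the 28-day pass survive between predict calls, so each fed-back day continues the same sequence.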
Daniel Möller
  • Thank you very much once again for the detailed answer. I think, I understand most of the parts of your answer now. However, now I have the difficulty in implementing them. It would be really great if you could show me how to do the training with `model case 1`. I can follow the same code to do it using `model case 2`. More specifically, I am not sure how to do this in keras `You will need a manual training loop where in each epoch you feed each of these groups (you can't feed different lenghts all together).`. I look forward to hearing from you. Thank you very much. – EmJ Mar 01 '20 at 23:50
    What type of time windows did you choose? The second link in my answer shows how to train. – Daniel Möller Mar 02 '20 at 00:03
  • Thank you very much for the comment. I went through the links and wrote the code today. However, I have few questions on my code. Due to the limited space in comments, I included my code in the following google document: https://docs.google.com/document/d/1v6kA2Y8fzTPMVl-JvZgAe-UZHKze2jcLE3HNK8bEzhk/edit?usp=sharing I hae mentioned the questions I have in the same document. Please kindly let me know if there are any issues in the code. The document is editable. So, please correct if there any issues in the code. I look forward to hearing from you. Thank you very much. – EmJ Mar 02 '20 at 06:19
  • I was able to run half of the code. However, I got error in the middle of the code as `"RuntimeError: You must compile a model before training/testing. Use model.compile(optimizer, loss)."`. I have mentioned the place I got the error in the document: https://docs.google.com/document/d/1v6kA2Y8fzTPMVl-JvZgAe-UZHKze2jcLE3HNK8bEzhk/edit#heading=h.p1v7iulqho0l I was thinking whether I need to mention something like this `model.fit(X, y, epochs=50, verbose=0)`. However, since we are using batches I am not sure if it is correct. Please kindly let me know your thoughts on this :) – EmJ Mar 13 '20 at 00:55
  • Oh, I get the point now. Do you mean something like this `model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])`? What is the `loss`, `optimizer` and `metrics` that you would recomment to my problem? I look forward to hearing from you. Thank you very much. :) – EmJ Mar 13 '20 at 22:05
    Loss is probably "mse" if you used a "linear" activation at the end, I like the "adam" optimizer, it's usually good. I'm not sure we have good metrics except for "mae". You might have used sigmoid (from 0 to 1 and try "binary_crossentropy"). – Daniel Möller Mar 14 '20 at 18:21
  • Thanks a lot. :) I am so sorry, but I get one more error in the line `model.train_on_batch(batch_x, batch_y)`. The error is `ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (15, 25, 300)`. The dimensions of my real dataset changed a little: I now have data for 2865 machines over 26 days, each 300 values long (i.e. `input_data.shape = (2865, 26, 300)`). Therefore, I set the batch size to `15`. Please kindly let me know your thoughts on how to resolve this error. Looking forward to hearing from you. Thank you very much. – EmJ Mar 15 '20 at 00:39
    The output shape of your model must match the shape of the "target" (batch_y). Your model is currently outputting 2D data `(None, something)`, you made a model that loses the time dimension somewhere. See the model summary – Daniel Möller Mar 16 '20 at 12:56
  • Thanks a lot for the comment. Yes, you are correct; it seems the model outputs (batch_size, day_27_prediction), i.e. `(15, 300)`. I have added the `model.summary()` results in the edit section of my question. However, since we are dealing with batches, isn't it correct? I am not sure what the problem is, due to my limited knowledge of deep learning. I am so sorry. However, if you have any idea how to resolve this issue, please kindly let me know. Thank you very much :) – EmJ Mar 19 '20 at 04:03
  • Don't you want to ask a new question? That's a basic "target shape" different from "model output shape" issue. Show your model, show the error, etc. All you need is that your model's output shape matches your `batch_y` shape. – Daniel Möller Mar 19 '20 at 12:26

I think that you are going in a good direction. To increase the time steps in each day, you will need to pad your data; this example can help you: https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py#L46.

However, I would also try other approaches, like fixing the number of time steps, for example, 3 days, 4, 5... Then, by evaluating your training, you can choose how many time steps are best for your model.

Maybe your initial approach of increasing the number of days will be better, but in this type of problem, finding the best number of time steps for an LSTM is very important.
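A fixed number of time steps also lets you turn the data into many more training samples via sliding windows (a sketch with dummy data; the window length of 5 is just one candidate to evaluate):

```python
import numpy as np

# Dummy stand-in: (machines, days, 360)
data = np.random.default_rng(0).normal(size=(100, 28, 360))
window = 5  # one candidate number of input days

xs, ys = [], []
for start in range(data.shape[1] - window):
    xs.append(data[:, start:start + window])  # `window` consecutive days as input
    ys.append(data[:, start + window])        # the day right after the window as target
x = np.concatenate(xs, axis=0)  # (100 * 23, 5, 360)
y = np.concatenate(ys, axis=0)  # (100 * 23, 360)

print(x.shape, y.shape)  # (2300, 5, 360) (2300, 360)
```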

Filipe Lauar