
I am trying to build an LSTM autoencoder to predict time-series data. Since I am new to Python, I have mistakes in the decoding part. I tried to build it up like the examples here and in Keras, but I could not understand the difference between them at all. The code that I have right now looks like this:

Question 1: How do I choose the batch_size and the input dimension when each sample has 2000 values?

Question 2: How do I get this LSTM autoencoder working (the model and the prediction)? This is just the model, but how do I predict with it? That is, how do I make it predict, say, from sample 10 until the end of the data?

My data has 1500 samples in total, I would go with 10 time steps (or more if that is better), and each sample has 2000 values. If you need more information, I will include it later as well.

import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector

trainX = np.reshape(data, (1500, 10, 2000))

Parameters:

timesteps = 10
input_dim = 2000
units = 100  # unit count chosen arbitrarily
batch_size = 2000
epochs = 20

Model:

inpE = Input((timesteps, input_dim))
outE = LSTM(units=units, return_sequences=False)(inpE)
encoder = Model(inpE, outE)
inpD = RepeatVector(timesteps)(outE)
outD = LSTM(input_dim, return_sequences=True)(inpD)
decoder = Model(inpD, outD)
autoencoder = Model(inpE, outD)
autoencoder.compile(loss='mean_squared_error',
                    optimizer='rmsprop',
                    metrics=['accuracy'])
autoencoder.fit(trainX, trainX,
                batch_size=batch_size,
                epochs=epochs)
encoderPredictions = encoder.predict(trainX)
  • you will find `batch_size` by trial and error, but it should be less than 1500. – Juan Apr 20 '18 at 16:29
  • @Juan thank you for your advice, and what about the implementation of the model, can you help me there as well, please? – annstudent93 Apr 20 '18 at 16:32
  • look at https://stackoverflow.com/questions/44647258/lstm-autoencoder?noredirect=1&lq=1 – Juan Apr 20 '18 at 16:32
  • @Juan I referenced that question as well, but I do not understand the difference. I have a RepeatVector and he is doing it differently, but I still could not get either version to work – annstudent93 Apr 20 '18 at 16:35
  • I'd change the optimizer to `rmsprop` and the loss to `mse`, because it is an autoencoder – Juan Apr 20 '18 at 16:36
  • also, you will need to change the first `return_sequences=True` to `return_sequences=False` – Juan Apr 20 '18 at 16:39
  • Please read [Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers?](//meta.stackoverflow.com/q/326569) - the summary is that this is not an ideal way to address volunteers, and is probably counterproductive to obtaining answers. Please refrain from adding this to your questions. – halfer Apr 20 '18 at 20:07

1 Answer


The LSTM model that I use is this one:

def get_model(n_dimensions):
    inputs = Input(shape=(timesteps, input_dim))
    # Encoder: compress each input sequence into a single vector of length n_dimensions.
    encoded = LSTM(n_dimensions, return_sequences=False, name="encoder")(inputs)
    # Decoder: repeat the encoding and unroll it back into a full sequence.
    decoded = RepeatVector(timesteps)(encoded)
    decoded = LSTM(input_dim, return_sequences=True, name='decoder')(decoded)

    autoencoder = Model(inputs, decoded)
    encoder = Model(inputs, encoded)
    return autoencoder, encoder

autoencoder, encoder = get_model(n_dimensions)
autoencoder.compile(optimizer='rmsprop', loss='mse', 
                    metrics=['acc', 'cosine_proximity'])

history = autoencoder.fit(x, x, batch_size=100, epochs=100)
encoded = encoder.predict(x)

It works with the data that I have; x is of size (3000, 180, 40), that is, 3000 samples with timesteps=180 and input_dim=40.
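
To address the prediction part of the question: after training, `autoencoder.predict` returns the reconstructed sequences and `encoder.predict` returns the compressed representations. A minimal usage sketch (the slice from sample 10 onward is just an illustration of the asker's scenario, not something from the original answer):

import numpy as np

# Reconstructions have the same shape as the input, here (3000, 180, 40).
reconstructed = autoencoder.predict(x)

# Compressed representations: one vector of length n_dimensions per sample.
encoded = encoder.predict(x)

# Predicting only for a subset, e.g. from sample 10 until the end:
reconstructed_tail = autoencoder.predict(x[10:])

# Per-sample reconstruction error as a simple sanity check.
mse_per_sample = np.mean((x - reconstructed) ** 2, axis=(1, 2))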

  • above I also applied your suggestions, and I will try your model as well. Thank you for your help. I will return here again if I have any new questions or successes to share – annstudent93 Apr 20 '18 at 16:55
  • the model is working, I will fine-tune the parameters, thank you. But since I am not familiar with the topic at all, why do you define encoder_layer as 2 and not define anything for the decoding layer? – annstudent93 Apr 20 '18 at 17:05
  • I forgot to delete that variable, it was not used. – Juan Apr 20 '18 at 17:09
  • Okay, good that I mentioned it. And is it necessary to also define the decoder Model in the function and return it? I mean, it should be there as well, right? – annstudent93 Apr 20 '18 at 17:12
  • Can you show how the data is prepared for the model? – Sreeram TP Sep 01 '18 at 19:45
  • I am trying to build an autoencoder-decoder model for time-series forecasting. I am stuck on the data preparation part of the model. @Juan, if you can guide me to some resources for this, it will be very helpful – Sreeram TP Sep 01 '18 at 19:56
  • check https://github.com/primatelang/mcr/blob/d50c2044001bbedf1cfd8255f948fb3a3b013f6a/mcr/reduce_features.py#L117 – Juan Sep 03 '18 at 06:50
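
Regarding the data-preparation question in the comments: one common approach is to slide a window of `timesteps` consecutive rows over the series, so each sample has shape (timesteps, n_features). A minimal sketch under that assumption; the shapes mirror the question's 1500 x 2000 data, and none of the names here come from the original thread:

import numpy as np

# A long multivariate series: n_steps rows, n_features columns.
series = np.random.rand(1500, 2000)
timesteps = 10

# Stack overlapping windows of `timesteps` consecutive rows.
windows = np.stack([series[i:i + timesteps]
                    for i in range(len(series) - timesteps + 1)])

print(windows.shape)  # (1491, 10, 2000)

Each window can then be fed to the autoencoder as one sample, with the model trained to reconstruct its own input.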