
I'm trying to build an LSTM autoencoder with the goal of getting a fixed-size vector from a sequence that represents the sequence as well as possible. This autoencoder consists of two parts:

  • LSTM Encoder: Takes a sequence and returns an output vector (return_sequences = False)
  • LSTM Decoder: Takes an output vector and returns a sequence (return_sequences = True)

So, in the end, the encoder is a many to one LSTM and the decoder is a one to many LSTM.

[Diagram of many-to-one and one-to-many RNN architectures. Image source: Andrej Karpathy]

At a high level, the code looks like this (similar to what is described here):

encoder = Model(...)
decoder = Model(...)

autoencoder = Model(encoder.inputs, decoder(encoder(encoder.inputs)))

autoencoder.compile(loss='binary_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])

autoencoder.fit(data, data,
                batch_size=100,
                epochs=1500)

The shape (number of training examples, sequence length, input dimension) of the data array is (1200, 10, 5) and looks like this:

array([[[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        ..., 
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]],
        ... ]

Problem: I am not sure how to proceed, especially how to integrate the LSTM layers into a Model and how to get the decoder to generate a sequence from a vector.

I am using Keras with the TensorFlow backend.

EDIT: If someone wants to try it out, here is my procedure for generating random sequences with moving ones (including padding):

import random

def getNotSoRandomList(x):
    # length-8 vector with a single 1 at position x (all zeros if x > 7)
    rlen = 8
    rlist = [0 for _ in range(rlen)]
    if x <= 7:
        rlist[x] = 1
    return rlist


# 5000 sequences of random length (0 to 10) with a "moving one"
sequence = [[getNotSoRandomList(x) for x in range(round(random.uniform(0, 10)))] for y in range(5000)]

### Padding afterwards

from keras.preprocessing import sequence as seq

data = seq.pad_sequences(
    sequences = sequence,
    padding='post',
    maxlen=None,
    truncating='post',
    value=0.
)
ScientiaEtVeritas
  • why are you generating random sequences? – lelloman Jun 20 '17 at 10:47
  • @lelloman: Just for test purposes. I hope for testing this is enough. I think it should work anyway since this task is about reconstructing and not really about finding patterns. – ScientiaEtVeritas Jun 20 '17 at 10:49
  • i'm asking just out of curiosity, i'm really no expert. but shouldn't autoencoders need patterns in order to work? – lelloman Jun 20 '17 at 10:51
  • @lelloman: I think you're right in the end. But I would also guess, if the problem is not too big and your output vector has a high enough dimension, that you can encode random sequences without a high loss of information. But it could be that I'm wrong. Hopefully we will see what's the case. And my real data is of course not random. – ScientiaEtVeritas Jun 20 '17 at 10:58
  • did you take a look at [this](https://github.com/fchollet/keras/issues/1401)? – lelloman Jun 20 '17 at 11:10
  • @lelloman: Yeah, see at the end of that issue. They tried with ``return_sequences = True`` for the encoder which results in no meaningful output vector since the data is reconstructed using the internal sequence. – ScientiaEtVeritas Jun 20 '17 at 11:14
  • You can use a `one-to-many` approach from here: https://stackoverflow.com/questions/43034960/many-to-one-and-many-to-many-lstm-examples-in-keras/43047615#43047615 – Marcin Możejko Jun 20 '17 at 11:50

3 Answers


Models can be built any way you want. If I understood right, you just want to know how to create models with LSTMs?

Using LSTMs

Well, first, you have to define what your encoded vector looks like. Suppose you want it to be an array of 20 elements, a 1-dimensional vector. So, shape (None, 20). Its size is up to you, and there is no clear rule for knowing the ideal one.

And your input must be three-dimensional, such as your (1200, 10, 5). In Keras summaries and error messages, it will be shown as (None, 10, 5), where "None" represents the batch size, which can vary each time you train/predict.

There are many ways to do this, but, suppose you want only one LSTM layer:

from keras.layers import *
from keras.models import Model

inpE = Input((10,5)) #here, you don't define the batch size   
outE = LSTM(units = 20, return_sequences=False, ...optional parameters...)(inpE)

This is enough for a very very simple encoder resulting in an array with 20 elements (but you can stack more layers if you want). Let's create the model:

encoder = Model(inpE,outE)   
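
If you do want to stack more layers, a deeper encoder could look like this (a sketch, not part of the original answer; the intermediate width of 50 units is an arbitrary choice):

inpE = Input((10,5))
outE = LSTM(50, return_sequences=True)(inpE)   # returns the full sequence so another LSTM can follow
outE = LSTM(20, return_sequences=False)(outE)  # final fixed-size vector with 20 elements
encoder = Model(inpE, outE)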

Now, for the decoder, it gets obscure. You don't have an actual sequence anymore, but a static, meaningful vector. You may still want to use LSTMs; they will treat the vector as a sequence.

But here, since the input has shape (None,20), you must first reshape it to some 3-dimensional array in order to attach an LSTM layer next.

The way you will reshape it is entirely up to you. 20 steps of 1 element? 1 step of 20 elements? 10 steps of 2 elements? Who knows?

inpD = Input((20,))   
outD = Reshape((10,2))(inpD) #supposing 10 steps of 2 elements    

It's important to notice that if you don't have 10 steps anymore, you won't be able to just enable "return_sequences" and have the output you want. You'll have to work a little. Actually, it's not necessary to use "return_sequences" or even to use LSTMs at all, but you may do that.

Since in my reshape I have 10 timesteps (intentionally), it will be OK to use "return_sequences", because the result will have 10 timesteps (as the initial input).

outD1 = LSTM(5,return_sequences=True,...optional parameters...)(outD)    
#5 cells because we want a (None,10,5) vector.   

You could work in many other ways, such as simply creating a 50 cell LSTM without returning sequences and then reshaping the result:

alternativeOut = LSTM(50,return_sequences=False,...)(outD)    
alternativeOut = Reshape((10,5))(alternativeOut)

And our model goes:

decoder = Model(inpD,outD1)  
alternativeDecoder = Model(inpD,alternativeOut)   

After that, you unite the models with your code and train the autoencoder. All three models will have the same weights, so you can get results from the encoder just by using its predict method:

encoderPredictions = encoder.predict(data)
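
And, for completeness, uniting encoder and decoder in the way the question sketches could look roughly like this (a sketch; `data` is the (1200, 10, 5) array from the question):

autoInput = Input((10,5))
encoded = encoder(autoInput)     # (None, 20)
decoded = decoder(encoded)       # (None, 10, 5)

autoencoder = Model(autoInput, decoded)
autoencoder.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
autoencoder.fit(data, data, batch_size=100, epochs=1500)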

What I often see about LSTMs for generating sequences is something like predicting the next element.

You take just a few elements of the sequence and try to find the next element. And you take another segment one step forward and so on. This may be helpful in generating sequences.
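
As an illustration of that idea (not part of the original answer), building (input window, next step) training pairs from the question's `data` array might look roughly like this; the window length of 3 is an arbitrary choice:

import numpy as np

window = 3
X, y = [], []
for seq in data:                      # data has shape (1200, 10, 5)
    for t in range(len(seq) - window):
        X.append(seq[t:t + window])   # input: `window` consecutive steps, shape (window, 5)
        y.append(seq[t + window])     # target: the next step, shape (5,)
X, y = np.array(X), np.array(y)       # (n_windows, window, 5) and (n_windows, 5)

A many-to-one LSTM trained on X and y can then generate a sequence step by step by feeding its own predictions back in.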

Daniel Möller
  • Thanks :) It's very helpful. You defined the variables ``inp`` and ``out`` twice, which is confusing (and will lead to an error if I copy and paste your code). But I figured it out. Also, it seems like ``Reshape((10,2))`` expects ``out`` as a parameter. Anyway, I tested your idea on a less random sequence (a moving 1, like 1 0 0 -> 0 1 0 -> 0 0 1). The generated sequences look like this ``[ 7.61515856e-01, 0.00000000e+00, 0.00000000e+00, -7.51162320e-02, -8.43070745e-02, ...]`` and have a loss of about 0.08 to 0.13. If you have any ideas how to further improve this, I would love to know :) – ScientiaEtVeritas Jun 23 '17 at 10:20
  • Sorry for the little mistake. The Reshape((10,2)) expects `inpD`, as I corrected. -- If you're training for many epochs, and your results don't reach what you expect, maybe you need more layers. It's possible that this model is not "intelligent" enough for the task. Keep in mind that the standard "activation function" for LSTMs is `tanh`, which means the results will be between -1 and 1. If you need outputs between 0 and 1, you should change the activation to `sigmoid`, for instance. – Daniel Möller Jun 23 '17 at 13:50

You can find a simple sequence-to-sequence autoencoder example here: https://blog.keras.io/building-autoencoders-in-keras.html
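
The pattern shown there is, roughly, to encode to a fixed-length vector, repeat that vector once per timestep with ``RepeatVector``, and decode with a second LSTM. A sketch of that idea (not a verbatim copy of the blog code; the sizes are filled in from the question):

from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model

timesteps = 10   # sequence length, as in the question
input_dim = 5    # features per timestep
latent_dim = 20  # size of the fixed-length code

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)                         # many-to-one: (None, latent_dim)

decoded = RepeatVector(timesteps)(encoded)                 # (None, timesteps, latent_dim)
decoded = LSTM(input_dim, return_sequences=True)(decoded)  # (None, timesteps, input_dim)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)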

user6903745
  • What I realized is that real seq2seq autoencoders don't use ``RepeatVector`` or ``Reshape``, but redirect the output from unit to unit. Take a look at this image: http://suriyadeepan.github.io/img/seq2seq/seq2seq1.png – ScientiaEtVeritas Jul 20 '17 at 08:41

Here is an example

Let's create synthetic data consisting of a few sequences. The idea is to look at these sequences through the lens of an autoencoder, in other words to lower their dimension and summarize each sequence into a fixed-length vector.

import numpy as np
import pandas as pd
from sklearn import preprocessing
from keras.preprocessing.sequence import pad_sequences

# define input sequences (object dtype because the lists have different lengths)
sequence = np.array([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 
                     [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
                     [0.2, 0.4, 0.6, 0.8],
                     [0.3, 0.6, 0.9, 1.2]], dtype=object)

# prepare to normalize (pad sequence_normalized instead of the raw
# sequences below if you want to train on normalized values)
x = pd.DataFrame(sequence.tolist()).T.values
scaler = preprocessing.StandardScaler()
x_scaled = scaler.fit_transform(x)
sequence_normalized = [col[~np.isnan(col)] for col in x_scaled.T]

# make sure to use dtype='float32' in padding, otherwise the float values are truncated
sequence = pad_sequences(sequence, padding='post', dtype='float32')

# reshape input into [samples, timesteps, features]
n_obs = len(sequence)
n_in = 9
sequence = sequence.reshape((n_obs, n_in, 1))

Let's devise a simple LSTM autoencoder:

from keras.models import Model
from keras.layers import Input, LSTM, Dense, RepeatVector, TimeDistributed

# define encoder
visible = Input(shape=(n_in, 1))
encoder = LSTM(2, activation='relu')(visible)

# define reconstruction decoder
decoder1 = RepeatVector(n_in)(encoder)
decoder1 = LSTM(100, activation='relu', return_sequences=True)(decoder1)
decoder1 = TimeDistributed(Dense(1))(decoder1)

# tie it together
myModel = Model(inputs=visible, outputs=decoder1)

# summarize layers
print(myModel.summary())


from keras.utils import plot_model

myModel.compile(optimizer='adam', loss='mse')

history = myModel.fit(sequence, sequence, 
                      epochs=400, 
                      verbose=0, 
                      validation_split=0.1, 
                      shuffle=True)

plot_model(myModel, show_shapes=True, to_file='reconstruct_lstm_autoencoder.png')

# demonstrate reconstruction
yhat = myModel.predict(sequence, verbose=0)

import matplotlib.pyplot as plt

#plot our loss 
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model train vs validation loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

[Plot: model train vs. validation loss]

Let's build a standalone encoder and look at the learned representations

# use the trained encoder layer to encode the training input
encoder_layer = myModel.layers[1]          # the LSTM(2) encoder

encoded_input = Input(shape=(9, 1))
standalone_encoder = Model(encoded_input, encoder_layer(encoded_input))

# we are interested in what the encoded sequences of length 2
# (the dimension of the encoder) look like
out = standalone_encoder.predict(sequence)

f = plt.figure()
myx = out[:, 0]
myy = out[:, 1]
s = plt.scatter(myx, myy)

for i, txt in enumerate(out[:, 0]):
    plt.annotate(i + 1, (myx[i], myy[i]))
plt.show()

And here is the representation of the sequences

[Scatter plot: the four sequences in the 2-D encoded space]

Areza
  • Your solution does not try to mask the padding you've done, correct? Won't the padding interfere with training the AE? I'm asking this because I have time-series data where some sequences have length 20 and others have length 1, so I pad them so that all have length 20 and pass them through a masking layer to ignore the padding. In the case of AEs I don't know how to proceed with masking the padding. – lsfischer Jan 15 '20 at 12:30
  • @LFisher - you are right! Could this be helpful: https://stackoverflow.com/questions/49670832/keras-lstm-with-masking-layer-for-variable-length-inputs I need to look and update my answer perhaps – Areza Jan 15 '20 at 18:57
  • Your link seems like it has a good fix for the issue! I got around it in a more hacky way by manually cropping the output in the padded areas, but apparently there's an easier way, I'll definitely take a look, thanks! – lsfischer Jan 16 '20 at 11:36
  • @Isfischer - I also found the answer https://stackoverflow.com/questions/44131718/padding-time-series-subsequences-for-lstm-rnn-training insightful. Seems the author claims no padding is needed. Please feel free to update my answer and your experience with your own data here. – Areza Jan 20 '20 at 14:25
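
Following up on the masking discussion in the comments above (not part of the original answer): a minimal sketch of ignoring zero-padded timesteps on the encoder side could look like this, assuming sequences padded with 0.0 as in the question. Note that the mask only affects the encoder here; handling the reconstruction loss over padded steps is a separate issue, as the linked questions discuss.

from keras.layers import Input, Masking, LSTM, RepeatVector
from keras.models import Model

inputs = Input(shape=(10, 5))
masked = Masking(mask_value=0.0)(inputs)        # timesteps that are all zeros are skipped
encoded = LSTM(20)(masked)                      # fixed-length code of the unpadded part
decoded = RepeatVector(10)(encoded)
decoded = LSTM(5, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)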