
I am trying to implement an LSTM VAE (following this example I found), but also have it accept variable-length sequences using Masking layers. I tried to combine the above code with the ideas from this SO question, which seems to handle it the "best way" by cropping the gradients so the loss is as accurate as possible. However, my implementation cannot reproduce sequences even on a small set of data, so I am fairly confident that something is amiss with my implementation, but I cannot pinpoint what exactly is wrong. The relevant part is here:

from keras.layers import Input, Dense, Lambda, LSTM, RepeatVector, Masking
from keras.models import Model
from keras import backend as K
from keras import objectives

# input_dim, intermediate_dim, latent_dim, batch_size, max_timesteps,
# epsilon_std and num_epochs are hyperparameters defined elsewhere

x = Input(shape=(None, input_dim))
x_masked = Masking(mask_value=0.0, input_shape=(None, input_dim))(x)

h = LSTM(intermediate_dim)(x_masked)

z_mean = Dense(latent_dim)(h)
z_log_sigma = Dense(latent_dim)(h)

def sampling(args):
    z_mean, z_log_sigma = args
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0., stddev=epsilon_std)
    return z_mean + z_log_sigma * epsilon

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_sigma])

decoder_h = LSTM(intermediate_dim, return_sequences=True)
decoder_mean = LSTM(input_dim, return_sequences=True)  # reconstructs input_dim features per time step

h_decoded = RepeatVector(max_timesteps)(z)
h_decoded = decoder_h(h_decoded)

x_decoded_mean = decoder_mean(h_decoded)

def crop_outputs(x):
    # Zero out the decoder outputs at the padded (all-zero) positions of the
    # original input, so they contribute nothing to the loss or gradients.
    padding = K.cast(K.not_equal(x[1], 0), dtype=K.floatx())
    return x[0] * padding

x_decoded_mean = Lambda(crop_outputs, output_shape=(max_timesteps, input_dim))([x_decoded_mean, x])

vae = Model(x, x_decoded_mean)

def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.mse(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_sigma - K.square(z_mean) - K.exp(z_log_sigma))
    loss = xent_loss + kl_loss
    return loss

vae.compile(optimizer='adam', loss=vae_loss)

# Here, X is variable length time series data of shape
# (num_examples, max_timesteps, input_dim) and is zero padded
# on the right for all the examples of length less than max_timesteps
# X has been appropriately scaled using the StandardScaler.

vae.fit(X, X, epochs = num_epochs, batch_size=batch_size)
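
For completeness, a rough sketch of how X is prepared; the pad_sequences call and the sequences variable are just stand-ins for my actual padding code:

# Sketch of the data prep (stand-in, not the exact code):
# `sequences` is a list of arrays, each of shape (length_i, input_dim)
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(np.concatenate(sequences, axis=0))
scaled = [scaler.transform(s) for s in sequences]

# Right-pad with zeros to max_timesteps -> (num_examples, max_timesteps, input_dim)
X = pad_sequences(scaled, maxlen=max_timesteps, dtype='float32',
                  padding='post', value=0.0)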

As always, any help is much appreciated. Thank you!

Alerra

1 Answer


I came across your question while trying to do exactly the same thing. I gave up on VAEs, but I did find a way to apply masking to layers that don't support it. What I did was predefine a binary mask (you can do this with numpy, see Code 1) and then multiply my output by that mask. During backpropagation the algorithm takes the derivative of the multiplication, so the gradient either propagates through a time step or is zeroed out. It is not as clever as the Masking layer in Keras, but it did the job for me.

# Code 1
# Making a numpy binary mask
# Expecting a sequence with shape (Time_Steps, Features);
# say the sequence has Features = 10 and a max_Len of 15
import numpy as np

max_Len = 15
seq = np.linspace(0, 1, 100).reshape((10, 10))  # 10 time steps, 10 features

# You must pad/truncate the sequence itself to max_Len here

# 1 for real time steps, 0 for padded ones -> shape (max_Len,)
mask = np.concatenate([np.ones(seq.shape[0]), np.zeros(max_Len - seq.shape[0])], axis=-1)

# This mask can then be fed to the model as an additional input
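
To make the "multiply the output by the mask" step concrete, here is a minimal sketch (my own, not my actual model) of feeding the precomputed mask in as a second input and applying it to the output of layers that don't support masking:

# Sketch only -- layer sizes and names are placeholders, not the original model.
import tensorflow as tf

features, max_len = 10, 15

seq_in  = tf.keras.layers.Input(shape=(max_len, features))
mask_in = tf.keras.layers.Input(shape=(max_len,))   # the binary mask from Code 1, one row per example

h = tf.keras.layers.LSTM(32, return_sequences=True)(seq_in)
y = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(features))(h)

# Zero the outputs at padded time steps: (batch, max_len, features) * (batch, max_len, 1)
masked_y = tf.keras.layers.Lambda(lambda t: t[0] * tf.expand_dims(t[1], -1))([y, mask_in])

model = tf.keras.Model([seq_in, mask_in], masked_y)
model.compile(optimizer='adam', loss='mse')
# model.fit([X, M], X, ...)   # M stacks the per-example masks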

A few considerations:

1. It resulted in a weak regression model. I don't know the impact on VAEs, since I never tested one, but I think it will generate a lot of noise.
2. The computational resource demand went up, so it is worth estimating the cost of propagating and backpropagating this workaround (or "gambiarra", as we say here) if you are on a budget like me.
3. It won't solve the problem completely; you could delve deeper into this and implement a more stable solution in pure TensorFlow.
4. A more "accurate" solution would be to implement a custom masking layer (Code 2).

Regarding point 4, it is straightforward: define the layer as a regular Layer subclass, have its call function receive the mask, and output the multiplication of the mask and the input. Like this:

# Code 2
import tensorflow as tf

class MyCoolMaskingLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # init stuff here

    def compute_mask(self, inputs, mask=None):
        # Pass the incoming mask through unchanged so later layers still see it
        return mask

    def call(self, inputs, mask=None):
        # Broadcast the (batch, time_steps) mask to (batch, time_steps, 1);
        # fall back to 1.0 (a no-op) when no mask is passed
        bc_mask = tf.expand_dims(tf.cast(mask, "float32"), -1) if mask is not None else 1.0
        return inputs * bc_mask

This layer might not work for you as-is; it is really problem-specific and written by a noob (me), but it worked for my case. I just cannot share the entire code because my Master's tutor doesn't allow it. (A bit of context: I wrap it with a TimeDistributed so that each time step of an LSTM output is individually processed by this masking layer, because inside call I perform some transformations on the data.)
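
For illustration, here is a minimal sketch of how a layer like this could be wired into a model; the surrounding architecture (layer sizes, the Masking/LSTM/TimeDistributed(Dense) stack) is my own assumption, not the actual code:

# Usage sketch (my own example):
inputs = tf.keras.layers.Input(shape=(15, 10))            # (max_len, features)
x = tf.keras.layers.Masking(mask_value=0.0)(inputs)       # builds the mask from the zero padding
x = tf.keras.layers.LSTM(32, return_sequences=True)(x)    # propagates the mask
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(10))(x)
outputs = MyCoolMaskingLayer()(x)                         # zeroes the padded time steps

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')

The Masking layer creates the mask, the LSTM and TimeDistributed layers pass it along, and MyCoolMaskingLayer consumes it to zero out the padded outputs.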