
I'm trying to fit an RNN in Keras using sequences that have varying time lengths. My data is in a NumPy array with shape (samples, timesteps, features) = (20631, max_time, 24), where max_time is determined at run-time as the number of timesteps in the sample with the most timesteps. I've padded the beginning of each time series with zeros, except for the longest one, obviously.
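For reference, the padding step looks roughly like this (a minimal sketch; `sequences` stands in for my list of per-sample arrays, each of shape `(t_i, 24)`):

import numpy as np

# sequences: list of per-sample arrays, each of shape (t_i, 24)
max_time = max(s.shape[0] for s in sequences)

train_x = np.zeros((len(sequences), max_time, 24), dtype='float32')
for i, s in enumerate(sequences):
    train_x[i, max_time - s.shape[0]:, :] = s  # pad at the beginning with zeros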

I initially defined my model like so:

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Activation
from keras.optimizers import RMSprop

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(max_time, 24)))  # skip all-zero timesteps
model.add(LSTM(100, input_dim=24))  # input_dim is redundant; the Masking layer already fixes the input shape
model.add(Dense(2))
model.add(Activation(activate))
model.compile(loss=weibull_loglik_discrete, optimizer=RMSprop(lr=.01))
model.fit(train_x, train_y, nb_epoch=100, batch_size=1000, verbose=2, validation_data=(test_x, test_y))

For completeness, here's the code for the loss function:

from keras import backend as k

def weibull_loglik_discrete(y_true, ab_pred, name=None):
    # Negative log-likelihood of a discrete Weibull distribution
    y_ = y_true[:, 0]   # observed time
    u_ = y_true[:, 1]   # censoring indicator (1 = event observed)
    a_ = ab_pred[:, 0]  # predicted Weibull alpha (scale)
    b_ = ab_pred[:, 1]  # predicted Weibull beta (shape)

    hazard0 = k.pow((y_ + 1e-35) / a_, b_)  # 1e-35 guards against 0 ** b
    hazard1 = k.pow((y_ + 1) / a_, b_)

    return -1 * k.mean(u_ * k.log(k.exp(hazard1 - hazard0) - 1.0) - hazard1)

And here's the code for the custom activation function:

def activate(ab):
    # exp keeps alpha positive; softplus keeps beta positive
    a = k.exp(ab[:, 0])
    b = k.softplus(ab[:, 1])

    # Reshape both to column vectors and rejoin into an (n, 2) output
    a = k.reshape(a, (k.shape(a)[0], 1))
    b = k.reshape(b, (k.shape(b)[0], 1))

    return k.concatenate((a, b), axis=1)

When I fit the model and make some test predictions, every sample in the test set gets exactly the same prediction, which seems fishy.

Things get better if I remove the masking layer, which makes me think something is wrong with how I'm using it, but as far as I can tell, I've followed the documentation exactly.

Is there something misspecified with the masking layer? Am I missing something else?

John Chrysostom

  • I have a few comments: 1. why have you set a `1e-35` constant when `float32` accuracy is actually `1e-7`? – Marcin Możejko Apr 06 '17 at 15:51
  • In terms of my bounty, I really just want an example of using the masking layer properly for sequences of different lengths. Don't worry about network specifics. – Seanny123 Apr 06 '17 at 23:21
  • The `1e-35` comes from here: https://ragulpr.github.io/assets/draft_master_thesis_martinsson_egil_wtte_rnn_2016.pdf, p. 53. It's just to avoid "numerical instability" (as zeroes are undefined here). Think it should be higher? – DHW Oct 29 '19 at 15:18

2 Answers


The way you implemented masking should be correct. If your data has shape (samples, timesteps, features) and you want to mask timesteps that lack data with zeros of the same size as the features argument, then you add Masking(mask_value=0., input_shape=(timesteps, features)). See here: keras.io/layers/core/#masking
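As a quick sanity check, you can inspect the mask the layer computes on a toy batch (a minimal sketch; shapes and values are arbitrary):

import numpy as np
from keras import backend as k
from keras.layers import Masking

# Toy batch: 1 sample, 3 timesteps, 2 features; the first timestep is all zeros
x = k.variable(np.array([[[0., 0.], [1., 2.], [3., 4.]]]))

# A timestep is masked only when ALL of its features equal mask_value
mask = Masking(mask_value=0.).compute_mask(x)
print(k.eval(mask))  # roughly: [[False  True  True]]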

Your model may be too simple, and/or the number of epochs may be insufficient for the model to differentiate between your samples. Try this model:

model = Sequential()
model.add(Masking(mask_value=0., input_shape=(max_time, 24)))
model.add(LSTM(256, input_dim=24))  # larger recurrent layer
model.add(Dense(1024))              # extra hidden layer for more capacity
model.add(Dense(2))
model.add(Activation(activate))
model.compile(loss=weibull_loglik_discrete, optimizer=RMSprop(lr=.01))
model.fit(train_x, train_y, nb_epoch=100, batch_size=1000, verbose=2, validation_data=(test_x, test_y))

If that does not work, try doubling the epochs a few times (e.g. 200, 400) and see if that improves the results.

Robert Valencia

I could not validate this without actual data, but I had a similar experience with an RNN. In my case, normalization solved the issue; add a normalization layer to your model.
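For example, one simple way is to standardize the inputs before padding (a minimal sketch; `sequences` is a hypothetical list of unpadded per-sample arrays, as in the question):

import numpy as np

# Compute per-feature statistics over real (unpadded) timesteps only
all_steps = np.concatenate(sequences, axis=0)  # shape (total_steps, 24)
mean, std = all_steps.mean(axis=0), all_steps.std(axis=0)

normalized = [(s - mean) / (std + 1e-8) for s in sequences]
# Re-pad as before. Masking skips a timestep only when ALL features equal
# mask_value, so real (normalized) rows are very unlikely to be masked.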

vagoston
  • I apologize for being AWOL on this and I appreciate everybody's input. This was indeed the issue - I had skipped normalization while trying to hack together a minimum viable example, but that was a fatal error. For those interested, the resulting model (different data, though) is here: https://github.com/daynebatten/keras-wtte-rnn – John Chrysostom Apr 07 '17 at 18:12