I have:

        self.model.add(Bidirectional(LSTM(lstm1_size, input_shape=(
            seq_length, feature_dim), return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        self.model.add(Bidirectional(
            LSTM(lstm2_size, return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        # BOTTLENECK HERE

        self.model.add(Bidirectional(
            LSTM(lstm3_size, return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        self.model.add(Bidirectional(
            LSTM(lstm4_size, return_sequences=True)))
        self.model.add(BatchNormalization())
        self.model.add(Dropout(0.2))

        self.model.add(Dense(feature_dim, activation='linear'))

However, I want an autoencoder-like setup without having to maintain two separate models. Where I have the comment BOTTLENECK HERE, I want a vector of some dimension, say bottleneck_dim.

After that, there should be some LSTM layers that reconstruct a sequence with the same dimensions as the initial input. However, I believe that adding a Dense layer will not return one vector, but instead a vector for each timestep of the sequence?

Shamoon

1 Answer

  • Dense has been updated to automatically act as if wrapped with TimeDistributed: applied to the 3D output of your second Bidirectional LSTM, shaped (batch_size, seq_length, 2 * lstm2_size), it returns one vector per timestep rather than a single vector per sample.
  • A workaround is to place a Flatten() before it, so Dense receives a 2D input of shape (batch_size, seq_length * 2 * lstm2_size) and outputs a single vector per sample (see the shape check below). I wouldn't recommend it, however, as it's likely to corrupt temporal information (you're mixing channels and timesteps). Further, it hard-wires the network to seq_length, so you can no longer train or run inference on any other sequence length.
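
For reference, here's a minimal shape check of both behaviors (the sizes and the tf.keras import paths are assumptions for illustration, not from the question):

    from tensorflow.keras.layers import Input, Dense, Flatten

    seq_length, lstm2_size = 200, 64
    x = Input(shape=(seq_length, 2 * lstm2_size))  # Bidirectional doubles the feature axis

    print(Dense(32)(x).shape)             # (None, 200, 32): one vector per timestep
    print(Dense(32)(Flatten()(x)).shape)  # (None, 32): one vector, but seq_length is now baked in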

A preferred alternative is Bidirectional(LSTM(..., return_sequences=False)), which returns only the last timestep's output, shaped (batch_size, 2 * lstm_bottleneck_size) (Bidirectional doubles the LSTM's units). To feed that single vector to the next LSTM, add RepeatVector(seq_length) after the return_sequences=False layer.

Do mind the extent of the "bottleneck", though; e.g. if (seq_length, feature_dim) = (200, 64) and the bottleneck vector has 400 units, that's a (200 * 64) / (1 * 400) = x32 reduction, which is quite large and may overwhelm the network. I'd suggest aiming for roughly x8.
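
Putting this together, here's a minimal sketch of the full setup (the layer sizes like 1024 are hypothetical, and tf.keras is assumed), sized for roughly an x8 reduction at (seq_length, feature_dim) = (200, 64):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (LSTM, Bidirectional, Dense, Dropout,
                                         BatchNormalization, RepeatVector)

    seq_length, feature_dim = 200, 64
    lstm_bottleneck_size = 800  # Bidirectional doubles this to 1600;
                                # (200 * 64) / 1600 = x8 reduction

    model = Sequential()
    # -- encoder --
    model.add(Bidirectional(LSTM(1024, return_sequences=True),
                            input_shape=(seq_length, feature_dim)))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    # bottleneck: return_sequences=False keeps only the last timestep's output
    model.add(Bidirectional(LSTM(lstm_bottleneck_size, return_sequences=False)))
    # -- decoder --
    model.add(RepeatVector(seq_length))  # tile the bottleneck vector across timesteps
    model.add(Bidirectional(LSTM(1024, return_sequences=True)))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(feature_dim, activation='linear'))  # per-timestep reconstruction
    model.summary()

The decoder's LSTM receives seq_length copies of the bottleneck vector and learns to unroll them back into a full sequence.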

OverLordGoldDragon
  • So then where I have `BOTTLENECK HERE`, you're saying that my `lstm2` should instead simply do `return_sequences=False`? Will that feed easily into the next LSTM layer? – Shamoon Feb 25 '20 at 14:53
  • Secondly, is there any benefit to compressing to a bottleneck then reconstructing it? Or am I better off just doing straight seq2seq? – Shamoon Feb 25 '20 at 15:13
  • seq2seq should be easier to train for longer timesteps, as the loss contributes gradient to each timestep directly, though it also depends on your [regularization](https://stackoverflow.com/questions/48714407/rnn-regularization-which-component-to-regularize/58868383#58868383). You can try both, though; just keep it a fair comparison with a fixed compression ratio. The idea with autoencoders is to impose some form of constraint so the encoder learns robust/useful representations. Dimensionality is one such constraint; others include denoising (`GaussianNoise`, input dropout) and sparsity (`SpatialDropout`). – OverLordGoldDragon Feb 25 '20 at 15:31
  • Can you clarify the role of `RepeatVector`? Instead of feeding 1 vector, I'll now be feeding `seq_length` copies of the same vector? – Shamoon Feb 25 '20 at 16:11
  • How do I specify the `lstm_bottleneck_size`? – Shamoon Feb 25 '20 at 18:23
  • @Shamoon Correct regarding `RepeatVector`; and `lstm_bottleneck_size` determines the "compression factor", which you tune as a hyperparameter. See the example in my answer. – OverLordGoldDragon Feb 26 '20 at 11:10
  • So where exactly do I specify my bottleneck size? Is it a parameter in the last layer before the RepeatVector? – Shamoon Feb 26 '20 at 11:15
  • @Shamoon Yes, and the preceding layers' `size`s should increase toward the input; I suggest reading further on autoencoders, e.g. [here](https://www.kaggle.com/shivamb/how-autoencoders-work-intro-and-usecases) – OverLordGoldDragon Feb 26 '20 at 11:43