
I have built an LSTM architecture using Keras. My goal is to map length-29 time-series input sequences of floats to length-29 output sequences of floats. I am trying to implement a "many-to-many" approach. I followed this post for implementing such a model.

I start by reshaping each data point into an np.array of shape `(1, 29, 1)`. I have multiple data points and train the model on each one separately. The following code is how I build my model:

def build_model():
    # define model
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(29, return_sequences=True, input_shape=(29, 1)))
    model.add(tf.keras.layers.LeakyReLU(alpha=0.3))

    model.compile(optimizer='sgd', loss='mse', metrics=['mae'])

    # cast each training point into an (X, Y) dataset
    for point in train_dict:
        train_data = train_dict[point]

        train_dataset = tf.data.Dataset.from_tensor_slices((
            tf.cast(train_data[0], features_type),
            tf.cast(train_data[1], target_type))
        ).repeat()

        # fit model
        model.fit(train_dataset, epochs=100, steps_per_epoch=1, verbose=0)

        print(model.summary())
    return model

I am confused because when I call `model.predict(test_point, steps=1, verbose=1)` the model returns 29 sequences of length 29! Based on my understanding of the linked post, I don't understand why this is happening. When I try `return_state=True` instead of `return_sequences=True`, my code raises this error: `ValueError: All layers in a Sequential model should have a single output tensor. For multi-output layers, use the functional API.`

How do I solve the problem?

taurus
  • What is the shape of the `test_point`? From what I can see, your model outputs a `[batch_size, 29, 29]` sized output. So there will be two 29s in the output shape (I'm assuming that's what you're getting). – thushv89 Jul 25 '19 at 00:07
  • test_point is also [1,29,1]. That is the ideal shape I would like to output too – taurus Jul 25 '19 at 00:13
  • Then you will need a dense layer on top of your LSTM. I'm assuming this is a regression problem. Therefore I am posting an answer with a linear Dense layer on top of the LSTM. Also I'm not sure why you're applying a LeakyReLU on top of the LSTM. That seems odd to me. – thushv89 Jul 25 '19 at 00:16

1 Answer


Your model has a few flaws.

  1. The last layer of your model is an LSTM. Assuming you're doing either classification or regression, this should be followed by a Dense layer (softmax/sigmoid for classification, linear for regression). And since this is a time-series problem, the Dense layer should be wrapped in a TimeDistributed wrapper.

  2. It's odd to apply a LeakyReLU on top of the LSTM.

I've fixed the code to address the above issues. See if that helps.

from tensorflow.keras.layers import LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Sequential

def build_model():
    # define model: the LSTM returns the full sequence, and the TimeDistributed
    # Dense maps each time step's 29 hidden features down to a single float
    model = Sequential()
    model.add(LSTM(29, return_sequences=True, input_shape=(29, 1)))
    model.add(TimeDistributed(Dense(1)))
    model.compile(optimizer='sgd', loss='mse', metrics=['mae'])

    print(model.summary())
    return model

model = build_model()
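As a quick sanity check (a self-contained sketch of the same model, run on randomly generated data purely for illustration), the fixed architecture maps a `(1, 29, 1)` input to a `(1, 29, 1)` output:

```python
import numpy as np
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Sequential

# same architecture as the answer's build_model()
model = Sequential()
model.add(LSTM(29, return_sequences=True, input_shape=(29, 1)))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='sgd', loss='mse', metrics=['mae'])

# random stand-in for a test point: batch of 1, 29 time steps, 1 feature
test_point = np.random.rand(1, 29, 1).astype(np.float32)
pred = model.predict(test_point)
print(pred.shape)  # (1, 29, 1): one sequence of 29 single-float steps
```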
thushv89
  • That worked, thanks! I am using Leaky ReLu because my LSTM was often just predicting 0 across all time steps. I read that a leaky ReLu could fix this if the issue arose from dying ReLu and found good results. Why do you find it odd? – taurus Jul 25 '19 at 13:46
  • https://datascience.stackexchange.com/questions/5706/what-is-the-dying-relu-problem-in-neural-networks – taurus Jul 25 '19 at 13:46
  • Also to follow up, my `model.summary()` reports that before the `TimeDistributed` layer, the LSTM outputs a `(None, 29, 29)` tensor. I still don't understand why this isn't `(None, 29, 1)`. – taurus Jul 25 '19 at 15:33
  • Hi @taurus, The output of the LSTM is (None, 29, 29) because your LSTM's latent size (i.e. hidden size) is 29. If you really want the LSTM output to be (None, 29, 1), the LSTM layer should be `LSTM(1, …)`. – thushv89 Jul 26 '19 at 00:17
  • About your question on ReLUs, your argument is partially correct. Yes LeakyReLU does help as a non-linearity function to alleviate "neurons dying". But LSTMs already have a non-linearity in the cell (e.g. tanh / sigmoid) and you're applying LeakyReLU on top of those non-linearities which is not a standard thing to do. – thushv89 Jul 26 '19 at 00:19
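To make the shape discussion in the comments concrete, here is a small sketch (on random data, for illustration only) contrasting a 29-unit LSTM with a 1-unit LSTM: the last dimension of the output is the number of hidden units, not the number of input features:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

x = np.random.rand(1, 29, 1).astype(np.float32)

inp = Input(shape=(29, 1))
wide = Model(inp, LSTM(29, return_sequences=True)(inp))   # 29 hidden units
narrow = Model(inp, LSTM(1, return_sequences=True)(inp))  # 1 hidden unit

wide_shape = wide.predict(x).shape
narrow_shape = narrow.predict(x).shape
print(wide_shape)    # (1, 29, 29): 29 features per time step
print(narrow_shape)  # (1, 29, 1): 1 feature per time step
```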