
I have a dataset of size (5358, 293, 30) and I want to train an LSTM network to predict a value between 0 and 1.

My neural network is defined as follows:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(10, input_shape=(293, 30)))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="mean_squared_error", optimizer="adam")
model.fit(Xtrain, Ytrain, epochs=20, batch_size=38, shuffle=False)

The loss value for all the epochs during training is ~0.04. When I test the neural network on the test data, I always get the same output, ~0.80. I tried a bigger network too, but the output didn't change.

I used default parameters and I scaled the data in range [0,1].
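
For reference, per-feature min-max scaling is one common way to bring a 3D dataset like this into [0, 1]. Below is a minimal sketch under that assumption; the array X and the helper scale_to_unit_range are illustrative names, not from the question:

import numpy as np

def scale_to_unit_range(X):
    # Min/max per feature, computed across all samples and timesteps
    X_min = X.min(axis=(0, 1), keepdims=True)
    X_max = X.max(axis=(0, 1), keepdims=True)
    return (X - X_min) / (X_max - X_min + 1e-8)  # epsilon guards against constant features

X = np.random.rand(5358, 293, 30)  # stand-in for the real video-feature dataset
X_scaled = scale_to_unit_range(X)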

What are the possible causes for this problem? And how can I fix it?

UPDATE: The output of model.summary() for the simplified version:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 10)                1640      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 11        
=================================================================
Total params: 1,651
Trainable params: 1,651
Non-trainable params: 0
_________________________________________________________________

And for the full version:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_2 (LSTM)                (None, 293, 64)           24320     
_________________________________________________________________
lstm_3 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dense_2 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 65        
=================================================================
Total params: 61,569
Trainable params: 61,569
Non-trainable params: 0
_________________________________________________________________
maccN
  • Why do you think you need an LSTM in the first place? – prosti Jan 10 '19 at 16:06
  • Because every sequence contains data extracted from videos. So I have 5358 videos; from each video I extracted 293 frames, and from each frame I extracted 30 features – maccN Jan 10 '19 at 16:51
  • OK. And why did you choose to have just 10 units inside the LSTM? What about increasing that number? – prosti Jan 10 '19 at 17:26
  • I tried a bigger network: a single LSTM layer with 64 neurons, and another version with 2 stacked LSTM layers of 64 neurons each, but I always got the same result. – maccN Jan 10 '19 at 17:29
  • You may provide the `model.summary()` inside your question. – prosti Jan 10 '19 at 18:00
  • OK, I updated the question. You can find the summaries of the first model I tried to train and of the simplified version – maccN Jan 10 '19 at 18:12
  • Your model has too few parameters in the dense layer to learn the function. What do you have as the output from the LSTM? I guess you are interested in increasing the accuracy to more than 95%, right? – prosti Jan 11 '19 at 05:47
  • In the original dataset the labels are categorical; there are 4 labels: 0, 1, 2 and 3 (they represent low to high intensity). I scaled the labels into the range [0,1] because I have to compare my network with another one that uses this range. The output of the LSTM is always a value of ~0.80, and the accuracy (measured by assigning the value of the closest label; see the sketch after these comments) is ~50%. Do you refer to the dense layer of the first model or the second model? – maccN Jan 11 '19 at 08:05
  • When you wrote "I always get the same output, ~0.80" I understood that your accuracy was 80%. Can you confirm that your accuracy is actually 50%? – prosti Jan 11 '19 at 08:17
  • "Do you refer to the dense layer of the first model or the second model?" - I was considering the simpler model. – prosti Jan 11 '19 at 08:18
  • Yes, accuracy is 50%. The 0.80 is the output value of the last layer. I tried to train the model with more neurons in the LSTM layer but nothing changed. The behavior of the two models (simplified and full) is the same. I'm starting to think there's something wrong with the dataset – maccN Jan 11 '19 at 08:52
  • I would suggest checking [this](https://stackoverflow.com/questions/38714959/understanding-keras-lstms?rq=1) to confirm your LSTM type first, and then checking whether you encoded your data before fitting the LSTM, but that is perhaps another question. – prosti Jan 11 '19 at 08:59
  • Thank you for the link provided. I will read it as soon as possible. – maccN Jan 11 '19 at 09:06
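
To make the "closest label" accuracy described in the comments concrete, here is a minimal sketch, assuming the four labels 0–3 were scaled to 0, 1/3, 2/3 and 1; the names y_pred, y_true and closest_label_accuracy are illustrative:

import numpy as np

labels = np.array([0.0, 1/3, 2/3, 1.0])  # labels 0, 1, 2, 3 scaled into [0, 1]

def closest_label_accuracy(y_pred, y_true):
    # Snap predictions and targets to the nearest scaled label, then compare
    pred_idx = np.abs(y_pred.reshape(-1, 1) - labels).argmin(axis=1)
    true_idx = np.abs(y_true.reshape(-1, 1) - labels).argmin(axis=1)
    return (pred_idx == true_idx).mean()

Under this scheme, a constant output of ~0.80 snaps to the label 2/3 for every sample, so the reported ~50% accuracy would then simply be the fraction of samples whose true label is 2.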

2 Answers


If we assume your model is OK, then the first thing you may try is to increase the number of epochs beyond the current setting:

epochs=20

Also play with the optimizer. For instance, you chose the Adam optimizer; make sure you test different parameters:

from keras.optimizers import Adam
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
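
A minimal sketch of wiring this optimizer into the model and training longer; the epoch count of 100 is an illustrative assumption, not a value from the answer:

model.compile(loss="mean_squared_error", optimizer=opt)
model.fit(Xtrain, Ytrain, epochs=100, batch_size=38)  # more epochs than the original 20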

You may add the model.summary() output to give a better picture of your model. I think providing the model summary is the very first step to understanding the system.

Since you mentioned features, it is very important to note how you represent them. Depending on the feature representation, you may need to modify the LSTM model.

prosti
  1. You can track down weird errors like this by using cross-validation.
  2. Shuffling the data helps generalization; try it.
  3. Data preparation can be another cause: have you considered shortening the sequence length? You are using quite a long sequence (see the sketch after this list).
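
A minimal sketch of points 2 and 3, assuming Xtrain and Ytrain are the arrays from the question; the cut to the last 59 timesteps mirrors the follow-up comment below and is otherwise arbitrary:

# Keep only the last 59 of the 293 timesteps and shuffle during training.
# The model must be rebuilt with input_shape=(59, 30) before fitting.
Xtrain_short = Xtrain[:, -59:, :]
model.fit(Xtrain_short, Ytrain, epochs=20, batch_size=38, shuffle=True)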
Ankish Bansal
  • I tried shuffling and I reduced the timesteps to 59, but nothing changed. I really can't understand what is wrong – maccN Jan 10 '19 at 14:34