
I have no idea why I'm getting such low accuracy with my configuration (always 0.1508). Data shape: (1476, 1000, 1)

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_X = scaler.fit_transform(train_Data)

....

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, BatchNormalization

myModel = Sequential()

myModel.add(LSTM(128, input_shape=myData.shape[1:], activation='relu', return_sequences=True))
myModel.add(Dropout(0.2))
myModel.add(BatchNormalization())

myModel.add(LSTM(128, activation='relu', return_sequences=True))
myModel.add(Dropout(0.2))
myModel.add(BatchNormalization())

myModel.add(LSTM(64, activation='relu', return_sequences=True))
myModel.add(Dropout(0.2))
myModel.add(BatchNormalization())

myModel.add(LSTM(32, activation='relu'))
myModel.add(Dropout(0.2))
myModel.add(BatchNormalization())

myModel.add(Dense(16, activation='relu'))
myModel.add(Dropout(0.2))

myModel.add(Dense(8, activation='softmax'))
#myModel.add(Dropout(0.2))

opt = tf.keras.optimizers.SGD(lr=0.001, decay=1e-6)
ls  = tf.keras.losses.categorical_crossentropy

I also sometimes get the following warning:

W1014 21:02:57.125363  6600 ag_logging.py:146] Entity <function Function._initialize_uninitialized_variables.<locals>.initialize_variables at 0x00000188C58C3E18> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: 

1 Answer


The two main culprits are the Dropout layers and the data preprocessing. In detail, plus other suggestions:

  • Dropout on stacked LSTMs is known to yield poor performance, as it introduces too much noise for stable time-dependency feature extraction. Fix: use recurrent_dropout instead
  • If you are working with signal data, or otherwise data with (1) outliers; (2) phase information; (3) frequency information - MinMaxScaler will corrupt the latter two, plus amplitude information per (1). Fix: use StandardScaler or QuantileTransformer
  • Consider using the Nadam optimizer over SGD; it proved vastly dominant in my LSTM applications, and is generally more hyperparameter-robust than SGD
  • Consider using CuDNNLSTM; it can run 10x faster
  • Ensure your data is shaped properly for the LSTM: (batch_size, timesteps, features) - or equivalently, (samples, timesteps, channels)
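As a sketch of the scaler fix: sklearn scalers expect 2D input, so for (samples, timesteps, channels) data you can flatten, fit, then restore the original shape. Shapes below assume the question's (1476, 1000, 1) data; `train_Data` here is a random placeholder standing in for the real array:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder standing in for the real training data
train_Data = np.random.randn(1476, 1000, 1)

n_samples, n_timesteps, n_channels = train_Data.shape
# StandardScaler works on 2D arrays: collapse (samples, timesteps) into rows,
# one column per channel, then reshape back to 3D for the LSTM
flat = train_Data.reshape(-1, n_channels)
scaled_X = StandardScaler().fit_transform(flat).reshape(n_samples, n_timesteps, n_channels)
```

Unlike MinMaxScaler, this centers each channel at zero mean and unit variance, so outliers shift the scale less and relative amplitude structure is preserved.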

Note of warning: if you do use recurrent_dropout, use activation='tanh', as 'relu' is unstable.
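Putting the above together, a minimal sketch of a revised model - recurrent_dropout with tanh, a shallower stack, Nadam. Layer sizes are illustrative, not tuned; it assumes 8 classes and the question's (1000, 1) per-sample shape:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, BatchNormalization

model = Sequential([
    # recurrent_dropout replaces the between-layer Dropout; tanh avoids
    # the instability of relu with recurrent_dropout
    LSTM(128, input_shape=(1000, 1), activation='tanh',
         recurrent_dropout=0.2, return_sequences=True),
    BatchNormalization(),
    LSTM(32, activation='tanh', recurrent_dropout=0.2),
    BatchNormalization(),
    Dense(8, activation='softmax'),  # output units = number of classes
])
model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
```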


UPDATE: True culprit: insufficient data. Details here

OverLordGoldDragon
  • Thanks for that note of warning at the end... Wish I would have seen that in the docs. – Andy Oct 16 '19 at 17:14
  • @Andy It's quite a problem, yet no response on Github - maybe I'll open an issue on TF instead. I intend on debugging it myself eventually - you can sub to the thread to be notified. – OverLordGoldDragon Oct 16 '19 at 17:17
  • I've been seeing you all over the TF Github, thanks so much for your valuable diagnostic work on these issues – Andy Oct 16 '19 at 17:19
  • @Andy Surprised and glad to hear that you have - you're welcome. – OverLordGoldDragon Oct 16 '19 at 17:20
  • @OverLordGoldDragon Thanks for the reply. I tried your methods; they were useful, but there weren't any obvious changes in the results. Then I decided to reshape my data and try convolutional layers. Results were the same as with LSTM, but training is much faster. Please check: https://stackoverflow.com/questions/58433022/keras-over-fitting-conv2d – Codie Oct 17 '19 at 15:37
  • @Codie What is the application? What's the input and output data? (stocks, weather, signals, etc) – OverLordGoldDragon Oct 17 '19 at 15:40
  • Which of above suggestions have you tried (all)? `MinMaxScaler` _will_ be a major problem for signals. Are you able to share a batch or a sample of your data (via e.g. [dropbox](https://www.dropbox.com))? – OverLordGoldDragon Oct 17 '19 at 16:34
  • @Codie This is your _entire dataset_? Then that's the problem: it's not nearly enough for deep learning, especially for data beasts like stacked LSTMs. You're better off with a Support Vector Machine, or at most LSTM + Dense + Dropout – OverLordGoldDragon Oct 18 '19 at 03:37
  • @Codie I can quickly test a minimal DL model if you'd like, just share your validation data – OverLordGoldDragon Oct 18 '19 at 03:39
  • @OverLordGoldDragon This is the whole dataset that I have, for training, validation, and test. I use 0.1 of the data for validation by defining the validation_split parameter. – Codie Oct 18 '19 at 07:20
  • @OverLordGoldDragon you can find my results here, they are the same as LSTM: [Results](https://stackoverflow.com/questions/58433022/keras-over-fitting-conv2d) – Codie Oct 18 '19 at 08:42
  • @Codie How do you use `Dense(16)` softmax if your label shape is `(8,)`? Output dense units should match # of classes – OverLordGoldDragon Oct 18 '19 at 13:57
  • @OverLordGoldDragon I use dense(8). – Codie Oct 18 '19 at 14:22
  • @Andy Finally [figured it out](https://stackoverflow.com/questions/57516678/lstm-recurrent-dropout-with-relu-yields-nans/59656753#59656753). Frankly I'm surprised no one else did, even on the Keras Github; I at least had the excuse of being relatively new to RNNs at the time. – OverLordGoldDragon Jan 09 '20 at 03:22
  • Thank you for this answer, you were spot on about both `dropout` and `StandardScaler` vs `MinMax` :-D – Ælex May 07 '21 at 14:50