Tensorflow Keras LSTM not training - affected by number_of_epochs, optimizer adam

Question

I have two code snippets. One of them trains the model; the other doesn't. I don't want to open an issue on GitHub without getting to the bottom of this, and it already cost me a day waiting for the broken version to train.

This is the model, which I know is correct. I'm running TensorFlow 1.10.1.

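    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.optimizers import Adam
    import string

    # Strings are truncated at 20 characters; alphabet_listset is a sorted list of
    # the set of [A-Za-z0-9-_], so len(alphabet_listset) == 64
    alphabet_listset = sorted(set(string.ascii_letters + string.digits + "-_"))
    adam = Adam(lr=1e-3)  # Adam at learning rate 1e-3
    model = Sequential()
    model.add(LSTM(512, return_sequences=True, input_shape=(20, len(alphabet_listset)), dropout=0.2, stateful=False))
    model.add(LSTM(512, return_sequences=False, dropout=0.2, stateful=False))
    model.add(Dense(2, activation="softmax"))
    model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
    model.summary()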
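To put the size of this architecture in perspective, the sketch below works out the trainable parameter counts that model.summary() should report for it, using the standard formula for a plain Keras LSTM layer (four gates, each with input weights, recurrent weights and a bias). The helper names lstm_params and dense_params and the layer labels are hypothetical, chosen for illustration.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a plain LSTM layer: 4 gates, each with
    input weights (input_dim x units), recurrent weights (units x units) and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units

layers = {
    "first_lstm": lstm_params(64, 512),    # input: 64-symbol one-hot vectors
    "second_lstm": lstm_params(512, 512),  # input: 512-d outputs of the first LSTM
    "dense": dense_params(512, 2),         # 2-way softmax output
}
for name, n in layers.items():
    print(name, n)
print("total", sum(layers.values()))
# first_lstm 1181696
# second_lstm 2099200
# dense 1026
# total 3281922
```

With roughly 3.3 million parameters and 99,000 training sequences per epoch, the long epoch times reported below are unsurprising.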

To create X_train and Y_train I use train_test_split. This is how I convert each string to a one-hot matrix (I know there is now a built-in one-hot function for LSTMs; if you include that in an answer it would really help):

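    import numpy as np

    def string_vectorizer(strng, alphabet, max_str_len=20):
        # One row per character (truncated to max_str_len), one column per alphabet symbol
        vector = [[0 if char != letter else 1 for char in alphabet] for letter in strng[0:max_str_len]]
        # Pad short strings with all-zero rows up to max_str_len
        while len(vector) != max_str_len:
            vector = [*vector, [0 for char in alphabet]]
        return np.array(vector)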
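As a quick sanity check, the sketch below runs string_vectorizer on a short sample string and compares it with a dictionary-lookup variant that avoids scanning the whole alphabet for every character. The name string_vectorizer_np and the sample input are hypothetical. In both versions, characters outside the alphabet become all-zero rows, long strings are truncated and short strings are padded with zero rows.

```python
import string
import numpy as np

# Sorted 64-character alphabet [A-Za-z0-9-_], as described in the question
alphabet = sorted(set(string.ascii_letters + string.digits + "-_"))

def string_vectorizer(strng, alphabet, max_str_len=20):
    # Original approach: compare every character against every alphabet symbol
    vector = [[0 if char != letter else 1 for char in alphabet] for letter in strng[0:max_str_len]]
    while len(vector) != max_str_len:
        vector = [*vector, [0 for char in alphabet]]
    return np.array(vector)

def string_vectorizer_np(strng, alphabet, max_str_len=20):
    # Hypothetical variant: look up each character's column index directly
    index = {ch: i for i, ch in enumerate(alphabet)}
    out = np.zeros((max_str_len, len(alphabet)), dtype=int)
    for pos, ch in enumerate(strng[:max_str_len]):
        if ch in index:  # unknown characters stay as all-zero rows
            out[pos, index[ch]] = 1
    return out

x = string_vectorizer("abc_42", alphabet)
print(x.shape)  # (20, 64)
print(np.array_equal(x, string_vectorizer_np("abc_42", alphabet)))  # True
```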

The parts I describe as correct really are correct: this is not the first time I have trained this model, and I have validated it before. I retrain my models every month, and I came across this anomaly while testing my architecture across several runs.

Here is the incorrect code:

    model.fit(X_train, to_categorical(Y_train, 2), epochs=1000,
              validation_data=(X_test, to_categorical(Y_test, 2)),
              verbose=2, shuffle=True)
    loss, accuracy = model.evaluate(X_test, to_categorical(Y_test, 2))

The log for this incorrect snippet looks just like the correct snippet's log, except that the accuracy stays at 0.5454 for 12 epochs and the loss does not decrease. My data is split 50k correct to 60k incorrect labels, so a model that simply predicts 1 (incorrect) for every sample gets 60k / (60k + 50k) ≈ 0.545 accuracy, which is exactly the plateau I see.
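The arithmetic behind that plateau is the majority-class baseline: the accuracy of a model that ignores its input and always predicts the most frequent label. A minimal sketch using the class counts stated above (the helper name majority_baseline is hypothetical):

```python
# Class counts from the question: 50k "correct" vs 60k "incorrect" labels
n_correct = 50_000
n_incorrect = 60_000

def majority_baseline(counts):
    """Accuracy of a model that always predicts the most frequent class."""
    return max(counts) / sum(counts)

baseline = majority_baseline([n_correct, n_incorrect])
print(baseline)  # 60/110 = 6/11, about 0.545
```

An accuracy pinned at this value, with a flat loss, is a strong sign the network has collapsed to a constant prediction rather than learning anything from the characters.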

Here is the correct code. The only difference is the value of epochs: it calls fit with epochs=1 inside a loop and stops once the accuracy and loss targets are met.

    expected_acc_eth, expected_loss_eth = 0.83, 0.40

    while True:
        model.fit(X_train, to_categorical(Y_train, 2), epochs=1,
                  validation_data=(X_test, to_categorical(Y_test, 2)),
                  verbose=2, shuffle=True)
        loss, accuracy = model.evaluate(X_test, to_categorical(Y_test, 2))

        # Stop once both the accuracy and the loss targets are reached
        if accuracy > expected_acc_eth and loss < expected_loss_eth:
            break

Output of the correct code:

    Train on 99000 samples, validate on 11000 samples
    Epoch 1/1
     - 1414s - loss: 0.6847 - acc: 0.5578 - val_loss: 0.6698 - val_acc: 0.5961
    11000/11000 [==============================] - 36s 3ms/step
    Train on 99000 samples, validate on 11000 samples
    Epoch 1/1
     - 1450s - loss: 0.6777 - acc: 0.5764 - val_loss: 0.6707 - val_acc: 0.5886
    11000/11000 [==============================] - 36s 3ms/step
    Train on 99000 samples, validate on 11000 samples
    Epoch 1/1
     - 1425s - loss: 0.6729 - acc: 0.5862 - val_loss: 0.6643 - val_acc: 0.6030
    11000/11000 [==============================] - 37s 3ms/step
    Train on 99000 samples, validate on 11000 samples
    Epoch 1/1
     - 1403s - loss: 0.6681 - acc: 0.5948 - val_loss: 0.6633 - val_acc: 0.6092
    11000/11000 [==============================] - 35s 3ms/step
    Train on 99000 samples, validate on 11000 samples
    Epoch 1/1

I have seen this Stack Overflow post, which says that early stopping affects the way models learn, but the discussion drifts off topic into a steps-per-epoch theory. I tried setting batch_size, but that didn't help, or perhaps I didn't set it correctly: batch size interacts with Adam's learning rate, and my scaling may have been off. I understand deep nets and machine learning to some extent, but the difference between the two outputs is far too large for me to explain.
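For reference, the number of weight updates in one epoch is the number of training samples divided by the batch size, rounded up, and Keras's fit uses batch_size=32 when none is given. The sketch below, using the 99,000 training samples from the log and a few illustrative batch sizes, shows how quickly the update count drops as batch_size grows, which is one reason batch size and learning rate are usually tuned together. The helper name steps_per_epoch is hypothetical.

```python
import math

N_TRAIN = 99_000  # "Train on 99000 samples" in the log above

def steps_per_epoch(n_samples, batch_size):
    """Gradient updates in one pass over the data (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

for bs in (32, 128, 512):
    print(bs, steps_per_epoch(N_TRAIN, bs))
# 32 3094
# 128 774
# 512 194
```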

I hope this saves others who hit similar bugs from wasting as much time as I did!

Can someone please explain this? Any help is much appreciated!

Out of curiosity, do you see the same issue if you use a different optimizer? I wonder if this is specific to the Adam optimizer, or if it is something in the way that LSTMs are handled by `model.fit` when `epochs > 1`... — Engineero, Sep 26 '18 at 16:53
I have only tried Adam and AdamW, not RMSprop. Each epoch takes 30 min to 1 hr, and my compute capacity and time are limited. If anyone has tried it with RMSprop, please share. The trained model and weights are here: https://github.com/devssh/GenderEthnicityDetector. At its core, Adam is RMSprop plus momentum. — devssh, Sep 26 '18 at 16:54
Are you able to try it on a drastically reduced dataset so that it runs fast? I'm familiar with the guts of the Adam optimizer; testing with another optimizer would narrow down where in the Keras/TF pipeline your training may be going wrong. — Engineero, Sep 26 '18 at 17:16
Yes, this problem does not arise with optimizer="rmsprop". However, RMSprop is between 2x and 10x slower than Adam. So it seems the momentum is lost when I call fit one epoch at a time with Adam, and that is what allows the model to train. Hmm... I wonder whether this can be fixed by passing batch_size to fit and batch_input_shape to the LSTM, since I'm not using either, and Adam had an explicit learning rate of 1e-3 that RMSprop does not need me to specify. — devssh, Sep 27 '18 at 07:17
If nobody else replies, I think I can accept this from @Engineero as a valid answer, since we now know what is going wrong. Thanks! — devssh, Sep 27 '18 at 07:24
I'll make it an actual answer just in case it's the best we can come up with :) — Engineero, Sep 27 '18 at 18:18
@devssh Just curious about this. Did you use the optimizer defined in Keras or did you use the tf.train.Optimizer for training your network? — kvish, Sep 27 '18 at 22:17
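The point in the comments that Adam is essentially RMSprop plus momentum can be made concrete with a minimal NumPy sketch of the textbook update rules for both optimizers. Adam keeps the same running average of squared gradients as RMSprop, adds a running average of the gradients themselves (the momentum term), and bias-corrects both. This follows the commonly published formulas, not the exact Keras/TensorFlow source, which may differ in details such as how epsilon is applied; the function names and the toy quadratic objective are hypothetical.

```python
import numpy as np

def rmsprop_step(w, g, v, lr=1e-3, rho=0.9, eps=1e-7):
    """One RMSprop update: divide the gradient by a running RMS of past gradients."""
    v = rho * v + (1 - rho) * g**2
    w = w - lr * g / (np.sqrt(v) + eps)
    return w, v

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update: RMSprop-style scaling plus a momentum (first-moment) term."""
    t = t + 1
    m = beta1 * m + (1 - beta1) * g     # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * g**2  # running mean of squared gradients (as in RMSprop)
    m_hat = m / (1 - beta1**t)          # bias correction for the zero initialisation
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v, t

# Minimise f(w) = sum(w**2), whose gradient is 2*w, from the same start with both rules
w_r = w_a = np.array([1.0, -2.0])
v_r = np.zeros(2)
m_a, v_a, t = np.zeros(2), np.zeros(2), 0
for _ in range(500):
    w_r, v_r = rmsprop_step(w_r, 2 * w_r, v_r)
    w_a, m_a, v_a, t = adam_step(w_a, 2 * w_a, m_a, v_a, t)
print("rmsprop:", w_r, "adam:", w_a)
```

Note that Adam carries extra state (m and the step counter t) between updates, which is the part of the optimizer the discussion above suspects.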

score 1 · Accepted Answer · answered Sep 27 '18 at 18:20

From our discussion in the comments, it sounds like the issue lies in the Adam optimizer implementation, which appears to fail to update anything when model.fit() is called with epochs > 1.

I would be interested in finding out why this happens, but a (slower) working solution for now is to use optimizer='rmsprop' instead of optimizer=adam in your call to model.compile().
