I have two code snippets. One of them trains a model and the other doesn't. I don't want to raise an issue on GitHub without getting to the bottom of this, and it has already cost me a day of waiting for the incorrect model to train.
This is the model, which is correct. I am running TensorFlow 1.10.1.
model = Sequential()
# Strings are truncated at 20 characters; alphabet_listset is the sorted list of the set [A-Za-z0-9-_], so len(alphabet_listset) == 64
model.add(LSTM(512, return_sequences=True, input_shape=(20, len(alphabet_listset)), dropout=0.2, stateful=False))
model.add(LSTM(512, return_sequences=False, dropout=0.2, stateful=False))
model.add(Dense(2, activation="softmax"))
model.compile(optimizer=adam, loss='categorical_crossentropy',
              metrics=['accuracy'])  # adam here is at learning rate 1e-3
model.summary()
To create X_train and Y_train I use train_test_split.
This is how I convert a string to a one-hot vector (I know there is now a function for one-hot encoding LSTM input; if your answer shows how to use it, that would really help):
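def string_vectorizer(strng, alphabet, max_str_len=20):
    # One row per character (truncated to max_str_len), one column per alphabet symbol
    vector = [[0 if char != letter else 1 for char in alphabet] for letter in strng[0:max_str_len]]
    # Pad short strings with all-zero rows up to max_str_len
    while len(vector) != max_str_len:
        vector = [*vector, [0 for char in alphabet]]
    return np.array(vector)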
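As a quick sanity check of the encoding above, here is a minimal sketch that rebuilds the alphabet as I described it and inspects the result for a short string. It is a dependency-free variant, not the exact function above: `string_vectorizer_lists` is a hypothetical name, it returns nested lists instead of a NumPy array, and the way `alphabet_listset` is constructed here is an assumption based on the description [A-Za-z0-9-_].

```python
import string

# Assumed reconstruction of the alphabet described in the post: [A-Za-z0-9-_]
alphabet_listset = sorted(set(string.ascii_letters + string.digits + "-_"))


def string_vectorizer_lists(strng, alphabet, max_str_len=20):
    """One-hot encode strng as max_str_len rows of len(alphabet) ints."""
    rows = [[1 if char == letter else 0 for char in alphabet]
            for letter in strng[:max_str_len]]
    # Pad short strings with all-zero rows, mirroring the while loop above.
    rows += [[0] * len(alphabet) for _ in range(max_str_len - len(rows))]
    return rows


encoded = string_vectorizer_lists("ab-1", alphabet_listset)
shape = (len(encoded), len(encoded[0]))               # (timesteps, features)
hot_indices = [row.index(1) for row in encoded[:4]]   # position of the 1 in each real row
padding_is_zero = all(sum(row) == 0 for row in encoded[4:])
print(shape, hot_indices, padding_is_zero)
```

The shape matches the `input_shape=(20, len(alphabet_listset))` passed to the first LSTM layer, and the padded timesteps are all-zero rows.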
The parts I describe as correct really are correct: this is not the first time I have trained this model, and I have validated it. I need to update my models every month, and I came across this anomaly while testing my architecture by running multiple models.
Here is the incorrect code
model.fit(X_train, to_categorical(Y_train, 2), epochs=1000,
          validation_data=(X_test, to_categorical(Y_test, 2)),
          verbose=2, shuffle=True)
loss, accuracy = model.evaluate(X_test, to_categorical(Y_test, 2))
The output of this incorrect snippet looks the same as the log of the correct snippet, except that the accuracy stays at 0.5454 for 12 epochs and the loss does not decrease. My sample data is split 50k correct to 60k incorrect labels, so if the model simply predicts the majority (incorrect) label for every sample, the accuracy would be 60k / (60k + 50k) ≈ 0.545.
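To make the arithmetic explicit, this small sketch computes the accuracy of a classifier that always outputs the majority label. The counts are the rounded 50k and 60k figures from above, so the result is only approximately the value seen in the log.

```python
# Rounded class counts from the post: 50k "correct" vs 60k "incorrect" labels
n_minority = 50_000
n_majority = 60_000

# Accuracy of a model that always predicts the majority class
baseline_acc = n_majority / (n_majority + n_minority)
print(f"majority-class baseline accuracy: {baseline_acc:.3f}")
```

An accuracy stuck at this value suggests the network has collapsed to predicting a single class rather than learning anything from the input.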
Here is the correct code; the only difference is the value of epochs.
expected_acc_eth, expected_loss_eth = 0.83, 0.40
while True:
    model.fit(X_train, to_categorical(Y_train, 2), epochs=1,
              validation_data=(X_test, to_categorical(Y_test, 2)),
              verbose=2, shuffle=True)
    loss, accuracy = model.evaluate(X_test, to_categorical(Y_test, 2))
    # Stop once both the accuracy and the loss targets are met
    if accuracy > expected_acc_eth and loss < expected_loss_eth:
        break
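The loop above never terminates if the targets are not reached. A bounded version of the same control flow might look like the sketch below. Everything in it is illustrative: `train_until`, `fit_one_epoch` and `evaluate` are hypothetical names standing in for the `model.fit(..., epochs=1)` and `model.evaluate(...)` calls, and the metric values are made-up numbers used only to exercise the loop without a real model.

```python
def train_until(fit_one_epoch, evaluate, target_acc, target_loss, max_rounds=50):
    """Run fit/evaluate rounds until both targets are met or max_rounds is hit.

    fit_one_epoch and evaluate are hypothetical callables; evaluate must
    return a (loss, accuracy) pair, like model.evaluate in the post.
    """
    history = []
    for round_idx in range(1, max_rounds + 1):
        fit_one_epoch()
        loss, accuracy = evaluate()
        history.append((round_idx, loss, accuracy))
        if accuracy > target_acc and loss < target_loss:
            return True, history
    return False, history


# Stand-in for a model: made-up (loss, accuracy) pairs, not real results.
fake_metrics = iter([(0.67, 0.60), (0.55, 0.75), (0.39, 0.84), (0.30, 0.90)])
reached, history = train_until(lambda: None, lambda: next(fake_metrics),
                               target_acc=0.83, target_loss=0.40)
print(reached, len(history))
```

With a real model, the two lambdas would be replaced by functions wrapping the fit and evaluate calls shown above.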
Output of this correct code
Train on 99000 samples, validate on 11000 samples
Epoch 1/1
- 1414s - loss: 0.6847 - acc: 0.5578 - val_loss: 0.6698 - val_acc: 0.5961
11000/11000 [==============================] - 36s 3ms/step
Train on 99000 samples, validate on 11000 samples
Epoch 1/1
- 1450s - loss: 0.6777 - acc: 0.5764 - val_loss: 0.6707 - val_acc: 0.5886
11000/11000 [==============================] - 36s 3ms/step
Train on 99000 samples, validate on 11000 samples
Epoch 1/1
- 1425s - loss: 0.6729 - acc: 0.5862 - val_loss: 0.6643 - val_acc: 0.6030
11000/11000 [==============================] - 37s 3ms/step
Train on 99000 samples, validate on 11000 samples
Epoch 1/1
- 1403s - loss: 0.6681 - acc: 0.5948 - val_loss: 0.6633 - val_acc: 0.6092
11000/11000 [==============================] - 35s 3ms/step
Train on 99000 samples, validate on 11000 samples
Epoch 1/1
I have seen this Stack Overflow post, which states that early stopping affects the way models learn, but it goes off topic with the steps-per-epoch theory. I tried setting batch_size, but that didn't help, or I couldn't do it correctly, as it depends inversely on the learning rate of Adam and my scale must have been off. I understand deep nets and machine learning to some extent, but this is too large a difference between the outputs.
I hope this saves others who face similar bugs from wasting as much time as I did!
Can someone please elaborate on this? Any help is much appreciated!