I have 2 Keras models - GRU and LSTM - which I run in a Jupyter notebook. Both have the same implementation apart from the layer type, of course - LSTM vs GRU. Here is my code:
from keras.models import Sequential
from keras.layers import Dense, CuDNNGRU, CuDNNLSTM
from keras.callbacks import EarlyStopping

# 1st Model - GRU
if run_gru:
    model_gru = Sequential()
    model_gru.add(CuDNNGRU(units=75, return_sequences=True, input_shape=(i1, i2)))
    model_gru.add(CuDNNGRU(units=30, return_sequences=True))
    model_gru.add(CuDNNGRU(units=30))
    model_gru.add(Dense(units=1, activation="sigmoid"))
    model_gru.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    history_gru = model_gru.fit(x, y, epochs=200, batch_size=64,
                                validation_data=(x2, y2), shuffle=False,
                                callbacks=[EarlyStopping(patience=100, restore_best_weights=True)])
# 2nd Model - LSTM
if run_lstm:
    model_lstm = Sequential()
    model_lstm.add(CuDNNLSTM(units=75, return_sequences=True, input_shape=(i1, i2)))
    model_lstm.add(CuDNNLSTM(units=30, return_sequences=True))
    model_lstm.add(CuDNNLSTM(units=30))
    model_lstm.add(Dense(units=1, activation="sigmoid"))
    model_lstm.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    history_lstm = model_lstm.fit(x, y, epochs=200, batch_size=64,
                                  validation_data=(x2, y2), shuffle=False,
                                  callbacks=[EarlyStopping(patience=100, restore_best_weights=True)])
Here are my results when I run each model separately (i.e. restarting the kernel after each run):

run_gru = True; run_lstm = False
-> GRU's val_acc = 58.13953%

run_gru = False; run_lstm = True
-> LSTM's val_acc = 51.16279%
However, if I run LSTM immediately after GRU during the same kernel run (i.e. run both without restarting), my results are now as follows:
run_gru = True; run_lstm = True
-> GRU's val_acc = 58.13953% (same as before), but LSTM's val_acc = 79.06977% (much better)
I am wondering if anyone has a guess as to why the 2nd model (LSTM) now has much better accuracy, even though the two models are separate.
I suspected that the 2nd model was stealing results from the 1st, so I checked the loss of both models at epoch 1. I found that the epoch-1 loss is the same for each model, which implies that the LSTM isn't stealing the weights/results from the 1st model (GRU). Also, the summary of the 2nd model shows that it is a brand-new model (i.e. it starts from layer 1, not layer 4). I tried setting restore_best_weights to False, but the huge difference for model 2 remains.
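For reference, this is roughly how I compared the epoch-1 losses. In the real code the histories come from the fit() calls above; the dicts below are stand-ins with assumed placeholder values, just to show the shape of the check:

```python
# Stand-ins for history_gru.history / history_lstm.history returned by fit();
# the loss values here are assumed placeholders, not my real numbers.
history_gru_losses = {'loss': [0.6931, 0.6512]}
history_lstm_losses = {'loss': [0.6931, 0.6247]}

# Compare the training loss at epoch 1 (index 0) for the two models.
same_first_epoch_loss = history_gru_losses['loss'][0] == history_lstm_losses['loss'][0]
print(same_first_epoch_loss)  # True
```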
I understand that I can run each model separately, but I would like to run them together so I can perform further analysis after both are trained. I could also leave things as they are, run the LSTM immediately after the GRU, and just use the LSTM's predictions, but it feels like I'm missing something obvious that explains the different results. My thanks in advance!