Has anyone ever trained a PyTorch LSTM model, saved it, reloaded it somewhere else, and then continued training? I've been trying to do this for the past two weeks with no good results (tracking progress via the training loss). Every time I reload the model and resume training, it behaves erratically: the training loss looks very different from what I get when I train the same model continuously without interruption.
I have tried torch.save(), saving and loading the state_dict, the native pickle mechanism, and joblib, but all of them show the same issue. I even saved and restored the optimizer state, without much luck.
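For reference, this is roughly the state_dict approach I'm describing (a minimal sketch; the toy model, optimizer, and checkpoint path are just placeholders for my actual setup):

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer just to make the sketch runnable;
# my real model is an LSTM autoencoder.
model = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Saving: bundle the model and optimizer state_dicts into one checkpoint
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# Loading (elsewhere): rebuild the same objects, then restore
# both states before resuming training
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
model.train()  # back to training mode, in case it was left in eval()
```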
Could it somehow be related to the hidden and cell states of the LSTM layers? Should I save and reload those as well every time I want to resume training? Or could it be something else entirely?
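My understanding is that nn.LSTM zero-initializes the hidden and cell states on each forward call unless you pass them in explicitly, so they are not part of the state_dict. If persisting them is what's needed, I imagine it would look something like this (a hypothetical sketch, not something I currently do):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=32, batch_first=True)
x = torch.randn(4, 20, 10)  # (batch, seq_len, features)

# h/c default to zeros when not supplied; the returned (h_n, c_n)
# are the states after the last time step
out, (h_n, c_n) = lstm(x)

# Hypothetically, persist them alongside the weights...
torch.save({"h": h_n, "c": c_n}, "states.pt")

# ...and feed them back in on the next run
states = torch.load("states.pt")
out, (h_n, c_n) = lstm(x, (states["h"], states["c"]))
```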
I have searched extensively for this issue but to no avail. Any help would be much appreciated.
Some more info: I'm trying to detect anomalies in data using an autoencoder, so each time I reload the model, it is trained on only a single batch of data and then saved again to be reused for the next batch.
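So the overall flow per batch is roughly this (simplified; the stand-in model, loss, and path are placeholders for my actual autoencoder and training step):

```python
import torch
import torch.nn as nn

def train_on_batch(batch, checkpoint_path="checkpoint.pt"):
    # Rebuild the model/optimizer and restore the previous state.
    # hidden_size matches input_size here only so the toy
    # reconstruction loss below type-checks.
    model = nn.LSTM(input_size=10, hidden_size=10, batch_first=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    try:
        ckpt = torch.load(checkpoint_path)
        model.load_state_dict(ckpt["model_state_dict"])
        optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    except FileNotFoundError:
        pass  # first batch: start from scratch

    model.train()
    out, _ = model(batch)
    loss = nn.functional.mse_loss(out, batch)  # reconstruction-style loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Save again so the next batch resumes from here
    torch.save({
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, checkpoint_path)
    return loss.item()

# Called once per incoming batch, e.g.:
loss = train_on_batch(torch.randn(4, 20, 10))
```

It's the losses returned across successive calls like this that jump around, compared to training the same model in one uninterrupted loop.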