
I'm facing an issue that I can't solve so far. I'm training an LSTM for image captioning with a data generator, and training always seems to crash at the same epoch and step, despite working fine for the first epoch. I guess it has to do with some memory issue, but I don't know what to do... [screenshot of the training crash] Any help/advice is more than welcome. Thanks a lot!

elka
  • If you are on a Linux system, did you run htop alongside? And if you use a GPU, ```nvidia-smi -l 1```? On a Windows system, just use the Task Manager for system usage. – MaKaNu Nov 20 '20 at 10:19
  • Thanks for your answer. I just checked: indeed, the swap was already saturated, and the memory too, so restarting the computer enables me to go further in training. I could reach and complete the 3rd epoch; however, this time it crashes within the 4th, since memory use keeps increasing with the number of epochs until it is saturated, then the swap fills up until it saturates too, and training crashes... Any idea on how to deal with this, other than training for 3 epochs, saving the model, restarting the device, reloading the model, and training for another 3 epochs? – elka Nov 20 '20 at 13:33
  • Is the model already using swap in epoch 1? In general, your complete model should fit in RAM (CPU) or VRAM (GPU). – MaKaNu Nov 20 '20 at 15:03
  • If your swap is not filling up in the first epoch, it might be a problem you can address with K.clear_session(). https://stackoverflow.com/questions/50895110/what-do-i-need-k-clear-session-and-del-model-for-keras-with-tensorflow-gpu – MaKaNu Nov 20 '20 at 15:17
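Building on the K.clear_session() suggestion and the asker's save/restart workaround, here is a minimal sketch of chunked training: train for a few epochs, save the model, clear the Keras session, then reload and continue. Note that K.clear_session() must be called between fit() calls, not during one. The names build_model, train_gen, steps_per_epoch, and the checkpoint path are hypothetical placeholders for the asker's own code, not part of the original question.

```python
# A minimal sketch of the chunked-training workaround discussed above,
# assuming a hypothetical build_model() that builds and compiles the LSTM
# and a hypothetical data generator train_gen yielding batches.
import gc

import tensorflow as tf
from tensorflow.keras import backend as K

EPOCHS_PER_CHUNK = 3   # how many epochs fit in memory before the leak bites
NUM_CHUNKS = 10        # 10 * 3 = 30 epochs in total
CHECKPOINT = "caption_model.h5"  # hypothetical checkpoint path

model = build_model()  # hypothetical: build and compile the captioning LSTM
steps_per_epoch = 1000  # hypothetical: batches per epoch for the generator

for chunk in range(NUM_CHUNKS):
    # Train a short run that is known to fit in memory.
    model.fit(
        train_gen,
        steps_per_epoch=steps_per_epoch,
        epochs=EPOCHS_PER_CHUNK,
    )
    model.save(CHECKPOINT)

    # Drop all references to the model, clear the Keras backend state,
    # and force garbage collection before reloading from disk.
    del model
    K.clear_session()
    gc.collect()
    model = tf.keras.models.load_model(CHECKPOINT)
```

This mirrors the asker's manual restart-and-reload routine without rebooting the machine; whether it fully releases memory depends on where the leak actually sits (generator, backend, or user code).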

0 Answers