Keras stopped training despite using GPU memory

Question

Similar to the topic below, Keras stopped working for me: tf.keras - Training on first epoch not progressing despite using GPU memory

I have a Python 3.7 Anaconda installation on Windows, with CUDA 10.2 and cuDNN installed, a 3080 GPU, Keras 2.3.1 and TF 1.4.

A few days ago everything was running perfectly. Then, after I installed PyTorch, Keras stopped working. The same script I was training with before now gets stuck on the first epoch. No errors are displayed when running model.fit (verbose 2). The GPU memory simply fills up completely (even with a very small dataset) and training does not advance. Additionally, PyTorch displayed an error saying it could not use CUDA.
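Note that, by default, TensorFlow maps nearly all of the memory of every visible GPU when it initializes, so a completely full GPU memory readout is expected even with a tiny dataset and does not by itself show whether training is stuck. The following is a minimal sketch, assuming TensorFlow 2.1 or later (the `tf.config` calls below do not exist in TF 1.4), that asks TensorFlow to allocate GPU memory on demand instead. The function name `enable_memory_growth` is hypothetical, and this changes only how memory is reserved; it is not a fix for a hang.

```python
def enable_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Assumes TensorFlow >= 2.1. Must be called before any model or tensor is
    created; otherwise TensorFlow raises a RuntimeError because the GPUs are
    already initialized. Returns the names of the GPUs that were configured.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow not installed in this environment: nothing to configure.
        return []

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Allocate memory incrementally as the model actually needs it.
        tf.config.experimental.set_memory_growth(gpu, True)
    return [gpu.name for gpu in gpus]


if __name__ == "__main__":
    print("GPUs with memory growth enabled:", enable_memory_growth())
```

If the list printed is empty on a machine with a GPU, TensorFlow is not seeing the card at all, which is a different problem from memory usage.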

I've tried formatting the whole PC (factory reset), and the issue still happens. I'm out of ideas. Any suggestion would be more than welcome.

Thanks!

score 0 · Answer 1 · answered Apr 27 '21 at 13:55


I really don't think a factory reset of the whole PC was necessary. I would suggest creating two conda virtual environments, one with TensorFlow and the other with PyTorch. Conda virtual environments are really useful because they keep installations separated, which might be exactly what your application needs. The official Anaconda documentation explains how to manage environments.
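A quick way to check whether TensorFlow and PyTorch are sharing one environment is to ask the running interpreter. The sketch below is illustrative only: `describe_environment` is a hypothetical helper, and it assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` variable (it is `None` outside conda). It uses `importlib.util.find_spec`, which locates a package without importing it, so it is safe to run even when a framework is broken.

```python
import importlib.util
import os
import sys


def describe_environment(frameworks=("tensorflow", "torch")):
    """Report the active conda environment and which frameworks it can import.

    Hypothetical helper: CONDA_DEFAULT_ENV is assumed to be set by
    `conda activate`; it is None when no conda environment is active.
    """
    available = {
        # find_spec returns None when the top-level package cannot be found.
        name: importlib.util.find_spec(name) is not None
        for name in frameworks
    }
    return {
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "prefix": sys.prefix,  # root directory of the running interpreter
        "available": available,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("conda env:", info["conda_env"])
    print("prefix:   ", info["prefix"])
    for name, ok in info["available"].items():
        print(f"{name:<10} {'installed' if ok else 'not installed'}")
    if all(info["available"].values()):
        print("Both frameworks share this environment; consider splitting them.")
```

Running it once inside each environment should show TensorFlow in one and PyTorch in the other if they are properly separated.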


Francesco Alongi


Thanks Francesco. I was using an environment. However, the point is that even after a factory reset, which basically deleted all the configurations, the issue still exists. As additional information, MATLAB is installed on the same machine and deep learning works there without any issue, while TF/Keras under Anaconda stopped working. – user15774576 Apr 28 '21 at 14:51
Can you show us the error displayed by PyTorch? – Francesco Alongi Apr 28 '21 at 16:02
Thanks for your help, Francesco. I solved my problem by flashing a newer release of the video BIOS. – user15774576 Apr 30 '21 at 16:38
