I have an RTX 3060 for deep learning. There is no problem when I run a model with 40 million parameters (batch size 64) on the CPU, but when I run the same model on the GPU, I get a ResourceExhaustedError. I bought this GPU specifically to train deep learning models faster, yet instead I can't train them at all. What can I do about this issue?
- The RTX 3060 is good for deep learning because of its 12 GB of VRAM (plus CUDA and tensor cores). [You most likely aren't dynamically allocating GPU memory.](https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory) I've used an RTX 3060 for a bit; a `batch_size` of 64 is probably too large. By default, TensorFlow tries to allocate essentially all available GPU memory as soon as it touches the GPU, so if another process then tries to use more, you'll get that error; the same happens if you load too much at once yourself. Limiting GPU memory must happen before any other TF calls (see the sketch below the comments). – Djinn Jul 21 '22 at 02:47
- The link you shared really worked for me; there's no problem now and I can run my models. I'm really grateful to you. By the way, why should I lower the `batch_size` value in such cases? – Emir Kutsal Jul 21 '22 at 03:13
- If you're getting an OOM error during training, it means too much data was loaded onto the GPU at once. The main thing controlling how much data is loaded per step (relative to the total dataset size) is `batch_size` (see the second example below). – Djinn Jul 21 '22 at 03:28
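
For reference, a minimal sketch of the dynamic-allocation fix the first comment links to, assuming TensorFlow 2.x; it must run before anything else initializes the GPU:

```python
import tensorflow as tf

# Enable memory growth so TF allocates GPU memory on demand
# instead of grabbing nearly all of it up front.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Raised if the GPU was already initialized by an earlier TF call.
        print(e)
```

An alternative covered in the linked answer is to cap TensorFlow at a fixed amount of VRAM with `tf.config.set_logical_device_configuration` instead of growing on demand.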
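
And a hedged illustration of the batch-size point from the last comment; the data and model here are hypothetical stand-ins, not from the question:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data and model, just to show the knob.
x_train = np.random.rand(1024, 32).astype("float32")
y_train = np.random.randint(0, 10, size=(1024,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Halving batch_size roughly halves the activation memory needed per step,
# at the cost of more optimizer steps per epoch.
model.fit(x_train, y_train, batch_size=16, epochs=1)  # e.g. 16 instead of 64
```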