
My GPU is an NVIDIA RTX 2080 Ti.

Keras 2.2.4

Tensorflow-gpu 1.12.0

CUDA 10.0

As soon as I build a model (before compilation), I find that GPU memory is fully allocated:

[0] GeForce RTX 2080 Ti | 50'C, 15 % | 10759 / 10989 MB | issd/8067(10749M)

What could be the reason, and how can I debug it?

I don't have any spare memory left to load the data, even when loading it via generators.

I have tried to monitor the GPU's memory usage and found that it is full just after building the layers (before compiling the model).

  • This is normal Tensorflow behavior. By default it does greedy allocation. A bit of research will indicate this to you. For instance, read [this](https://medium.com/@auro_227/tensorflows-its-complicated-relationship-with-gpu-memory-5672745df84). – Robert Crovella Jul 16 '19 at 14:09
  • tensorflow eats your GPU! It's a design choice, likely to make sure no other process allocates any memory on the GPU if tensorflow needs it at some point halfway through its runtime – Ander Biguri Jul 16 '19 at 14:45
  • Possible duplicate of [How to prevent tensorflow from allocating the totality of a GPU memory?](https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory) – Chan Kha Vu Jul 16 '19 at 15:56
    @RobertCrovella But with that it is hard to run convolutions, since they will use GPU memory. I tried to monitor the behavior and all the memory was filled. I was feeding my network with small batches to ensure minimal usage. – Mohamad Alhaddad Jul 16 '19 at 18:28
  • Just do what you want in TF. Just because it allocated all the memory doesn't mean you are out of memory. TF is managing the memory for you. Unless you hit an out-of-memory error, don't worry about it. Anyway, there are plenty of questions and articles discussing this. You need to read some of those and understand what is going on. I'm not going to be able to explain it any better than all the documentation and questions and articles that already exist. – Robert Crovella Jul 16 '19 at 18:40
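
For reference, a minimal sketch of how to rein in this greedy allocation, along the lines of the duplicate linked above (assuming Keras 2.2.4 on the TensorFlow 1.x backend, as in the question):

```python
# Minimal sketch, assuming Keras 2.2.4 with the TensorFlow 1.x backend:
# make TensorFlow allocate GPU memory on demand instead of grabbing
# (almost) the whole card when the first session is created.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow the allocation lazily
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # or hard-cap at ~50%

K.set_session(tf.Session(config=config))

# ... build and compile the Keras model only after the session is set ...
```

Note that this only changes when memory is grabbed; TensorFlow still manages its own pool and does not hand memory back, so as the comments say, an actual out-of-memory error is the real signal to worry about.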

1 Answer


I met a similar problem when loading a pre-trained ResNet50: GPU memory usage surged to 11 GB, even though ResNet50 usually consumes less than 150 MB.

The problem in my case was that I also imported PyTorch without actually using it in my code. After commenting it out, everything worked fine. However, I have another PC where the same code works just fine, so I uninstalled and reinstalled TensorFlow and PyTorch with the correct versions. After that, everything works fine even if I import PyTorch.
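
For what it's worth, a hypothetical sketch of the repro described above; the `allow_growth` setting and the exact script layout are my assumptions, not part of the original answer:

```python
# Hypothetical repro sketch (Keras 2.2.4, TensorFlow 1.x backend).
# With allow_growth enabled, loading ResNet50 on its own should stay far
# below the ~11 GB surge described above; in the broken environment it was
# the unused `import torch` in the same script that triggered the surge.
import tensorflow as tf
from keras import backend as K
from keras.applications.resnet50 import ResNet50

# import torch  # <- the unused import that caused the surge in my case

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))

model = ResNet50(weights='imagenet')  # the weights file is roughly 100 MB
model.summary()
```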