I am frequently rerunning the same mxnet script while I try to iron out some bugs in a new script (and I am new to mxnet). Pretty often when I try to run my script I get an error that the GPU is out of memory, and when I use nvidia-smi to check, this is what I see:

Wed Dec  5 15:41:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24.02              Driver Version: 396.24.02                 |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:65:00.0  On |                  N/A |
|  0%   54C    P2    68W / 300W |  10891MiB / 11144MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1446      G   /usr/lib/xorg/Xorg                            40MiB |
|    0      1481      G   /usr/bin/gnome-shell                         114MiB |
|    0     10216      G   ...-token=8422C9FC67F51AEC1893FEEBE9DB68C6    31MiB |
|    0     18221      G   /usr/lib/xorg/Xorg                           458MiB |
|    0     18347      G   /usr/bin/gnome-shell                         282MiB |
+-----------------------------------------------------------------------------+

So it seems like most of the memory is in use (10891/11144 MiB), BUT I don't see any process in the list taking up a large portion of the GPU, so there doesn't seem to be anything to kill. My mxnet script has already exited, so I assume it shouldn't be that. I would understand a lag of a few seconds, or even tens of seconds, if the GPU doesn't know right away that the script no longer needs the memory, but many minutes have passed and I still see the same display. What gives, and is there some memory cleanup I should do? If so, how? Thank you for any tips to a newbie.

TFdoe

1 Answer


GPU memory usage is bound to the lifetime of the process that allocated it. If you see GPU memory in use, there must be a process that is still alive and holding on to that memory. If you run `ps -a | grep python` you should see all python processes, which will tell you which one is still alive.
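As a minimal sketch (assuming a Linux machine with the standard `ps`, `kill`, `fuser`, and `nvidia-smi` utilities; `<PID>` is a placeholder for whatever process ID you find), tracking down and cleaning up a stale process might look like this:

# List python processes; -x also includes processes without a terminal,
# and the [p] trick keeps grep itself out of the results
ps -ax | grep [p]ython

# Ask nvidia-smi directly which compute processes are holding memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# If the nvidia-smi process table looks incomplete, list everything
# touching the GPU device files
sudo fuser -v /dev/nvidia*

# Terminate the stale process (fall back to SIGKILL only if it ignores SIGTERM)
kill <PID>
kill -9 <PID>

Once the last process holding a CUDA context exits, the driver should release its memory; if usage still doesn't drop after that, reloading the nvidia kernel module or rebooting is usually the remaining option.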

Sina Afrooze
  • How could I use this command in Windows? – ProgrX May 11 '22 at 12:25
  • 1
    Using `ps -a | grep python` I did find & kill some processes which were consuming GPU memory. However, after killing all the processes returned by `ps -a | grep python`, there is still some GPU memory being used according to `nvidia-smi`. I'm using `detectron2` and I'm wondering if this has to do with `multiprocessing`. – user3731622 Sep 20 '22 at 16:22