3

I am having an issue with my Graphics card retaining memory after the execution of a CUDA script (even with the use of cudaFree()).

On boot the total used memory is about 128 MB, but after the script runs the memory is not released, and the script then runs out of memory mid-execution.

nvidia-smi:

  +------------------------------------------------------+                       
| NVIDIA-SMI 340.29     Driver Version: 340.29         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 660 Ti  Off  | 0000:01:00.0     N/A |                  N/A |
| 10%   43C    P0    N/A /  N/A |   2031MiB /  2047MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

Is there any way to free this memory back up without rebooting, perhaps a terminal command?

Also, is this normal behaviour if I am not managing my memory correctly in a CUDA script, or should this memory free itself automatically when the script stops or is quit?

  • If your program actually exits, the CUDA context is destroyed and any resources it used are released. Are you certain you don't have a bunch of zombie or hung instances of your program still running in the background somewhere? – talonmies Apr 06 '15 at 13:34
  • That was the issue; I figured the processes would kill themselves on crash/completion. I checked the System Monitor and found that I had a few processes of the out file I was running. After killing those, the GPU memory freed itself. Is there a command I can add to C/CUDA to free all GPU memory on an unexpected stop (such as a ctrl+z quit, not just if cudaMalloc fails)? If you make that into an answer I will mark it correct. Thanks again – Jamie Stuart Robin Parsons Apr 06 '15 at 13:48
  • You do understand that ctrl-z doesn't send a SIGINT or SIGTERM signal to the foreground process, it sends SIGTSTP (unlike ctrl-c or kill). Unless you register a signal handler in your application to catch SIGTSTP and cause the application to exit, it will never know that ctrl-z was ever pushed (which is by design). This sounds like a user behaviour problem, not a programming one. – talonmies Apr 06 '15 at 14:00
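
A minimal sketch of the signal-handler approach described in the last comment, assuming a POSIX system and nvcc; the allocation size and the stop_requested/request_stop names are illustrative placeholders, not code from the thread. The handlers only set a flag, so the program can reach a normal return from main() and the CUDA runtime's teardown can destroy the context:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <cuda_runtime.h>

static volatile sig_atomic_t stop_requested = 0;

/* Only set a flag here; do the real cleanup back in main(). */
static void request_stop(int sig)
{
    (void)sig;
    stop_requested = 1;
}

int main(void)
{
    signal(SIGINT,  request_stop);   /* ctrl+c */
    signal(SIGTSTP, request_stop);   /* ctrl+z: handling it means the process no longer suspends */

    float *d_buf = NULL;
    if (cudaMalloc((void **)&d_buf, 256 << 20) != cudaSuccess) {   /* 256 MiB, illustrative */
        fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }

    while (!stop_requested) {
        /* ... launch kernels, copy data, etc. ... */
        sleep(1);
    }

    cudaFree(d_buf);
    return EXIT_SUCCESS;   /* normal exit: the runtime tears down the context and frees GPU memory */
}

Note that this does not help with processes that are already suspended: a stopped process still holds its context until it is resumed and exits, or is killed.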

1 Answer

6

The CUDA runtime API automatically registers a teardown function which will destroy the CUDA context and release any GPU resources the application was using. As long as the application implicitly or explicitly calls exit(), no further user action is required to free resources such as GPU memory.

If you do find that memory doesn't seem to be released when running CUDA code, then the usual suspects are suspended or background instances of that or other code which have never called exit() and so never destroyed their contexts. That was the cause in this case.

NVIDIA do provide an API function, cudaDeviceReset, which will initiate context destruction at the time of the call. It shouldn't usually be necessary to use this function in well designed CUDA code; rather, you should try to ensure that there is a clean exit() or return path from main() in your program. This will ensure that the context destruction handler registered by the runtime library is called and resources are freed.
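
As a rough illustration of that clean-exit pattern (the names and allocation size are placeholders, not code from the question):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    float *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, 1 << 24);   /* 16 MiB, illustrative */
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* ... kernel launches and memcpys ... */

    cudaFree(d_data);

    /* Optional: tear the context down now instead of waiting for exit(). */
    cudaDeviceReset();

    return 0;   /* a clean return from main() lets the runtime release everything it still holds */
}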

talonmies