
I work with a CUDA device on an integration branch, and I still hit the occasional segfault while I finish my work.

After each crash, my memory consumption increases by 500 MB (seen using free -m, htop, and one other tool whose name I don't remember). This memory is never released, so on this computer with 4 GB of RAM I have to reboot after a few crashes; otherwise the machine starts swapping and becomes really, really slow (as usual when swapping happens).

I know the right answer is "Fix your segfault!", but I would like to understand why this behaviour happens and how I can prevent it.

I read that CUDA memory should be released by the OS on a segfault, but it looks like it isn't.

While debugging my program, I noticed that if I fix the segfault, the memory is freed correctly; but if, with the segfault fixed, I also comment out the CUDA release line, cudaFreeHost(buf), I still get the memory leak.

My memory is allocated as pinned pages: cudaHostAlloc(&ret, n * sizeof(my_struct), cudaHostAllocPortable).

I wanted to guarantee the "free" code is called by wrapping the buffer in a unique_ptr, but that will not solve the problem when a segfault occurs (see the sketch below).
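To show what I mean, here is a minimal sketch of the unique_ptr idea; my_struct stands in for my real type, and alloc_pinned is just an illustrative helper:

#include <cuda_runtime.h>
#include <cstddef>
#include <memory>

struct my_struct { int data[4]; };  // placeholder for the real struct

// Deleter that releases pinned host memory allocated with cudaHostAlloc.
struct PinnedDeleter {
    void operator()(my_struct *p) const { cudaFreeHost(p); }
};

// Illustrative helper: allocate n pinned, portable elements.
std::unique_ptr<my_struct, PinnedDeleter> alloc_pinned(std::size_t n) {
    my_struct *raw = nullptr;
    cudaHostAlloc(&raw, n * sizeof(my_struct), cudaHostAllocPortable);
    return std::unique_ptr<my_struct, PinnedDeleter>(raw);
}

int main() {
    auto buf = alloc_pinned(1000);  // cudaFreeHost runs automatically at scope exit
    return 0;                       // ...but a segfault skips this cleanup entirely
}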

I looked at persistence mode for CUDA: http://docs.nvidia.com/deploy/driver-persistence/index.html but it is disabled on my computer (I checked with nvidia-smi).

I tried to reset the CUDA device with nvidia-smi -r, but it says this is not supported on my computer.

My questions are:

  • How can I ask the program (or the OS) to release these resources at the end of the program?
  • If that isn't possible, does a command exist to recover these resources after the crash?

Versions:

  • CUDA 6.0.1
  • gcc 4.9.2
  • Driver Version: 340.65
  • Card: GeForce 610M

Update:

Here is a code sample that reproduces the problem. With the cudaFreeHost line commented out, I leak 10 MB per run.

#include <cuda.h>
#include <cuda_runtime.h>

int main() {
    int *ret;
    // Allocate 10,000,000 ints of pinned, portable host memory.
    cudaHostAlloc(&ret, 10000000 * sizeof(*ret), cudaHostAllocPortable);
    //cudaFreeHost(ret);  // commented out: this is the leaking configuration
    return 0;
}
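Just in case it matters, a variant of the repro that checks the return code (a sketch using only standard runtime API calls) would rule out a failed allocation being mistaken for a leak:

#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main() {
    int *ret;
    cudaError_t err = cudaHostAlloc(&ret, 10000000 * sizeof(*ret), cudaHostAllocPortable);
    if (err != cudaSuccess) {
        // If pinning fails, there is nothing to leak in the first place.
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    //cudaFreeHost(ret);
    return 0;
}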

Update 2:

The full output of free, before and after 20 runs of the repro in a loop:

             total       used       free     shared    buffers     cached
Mem:       3830056    1487156    2342900      66336     142840     527088
-/+ buffers/cache:     817228    3012828
Swap:      7811068          0    7811068
1Segmentation fault
2Segmentation fault
3Segmentation fault
4Segmentation fault
5Segmentation fault
6Segmentation fault
7Segmentation fault
8Segmentation fault
9Segmentation fault
10Segmentation fault
11Segmentation fault
12Segmentation fault
13Segmentation fault
14Segmentation fault
15Segmentation fault
16Segmentation fault
17Segmentation fault
18Segmentation fault
19Segmentation fault
20Segmentation fault
             total       used       free     shared    buffers     cached
Mem:       3830056    1766580    2063476      64152     142860     531032
-/+ buffers/cache:    1092688    2737368
Swap:      7811068          0    7811068
P. Brunet

  • Could you post a short repro case which illustrates the apparent memory leak? I am not sure your diagnosis of what is happening here is correct, and it might be that you are misinterpreting what is occurring. Also, you are running a two-year-old CUDA toolkit and a year-old driver. Have you considered upgrading to see whether the behaviour is the same? – talonmies Dec 30 '15 at 09:11
  • I added sample code as a repro case. I agree the CUDA version should be up to date, but I haven't had time to do that migration. I may/will try it to see if it happens again. – P. Brunet Dec 30 '15 at 10:17
  • 1
    I have run your code 1000 times in a shell loop on a 64bit system with 16Gb of ram with the 352.39 driver and CUDA 6 runtime and observed no memory leakage at all. I don't know what is going on with your code or your system, but CUDA doesn't leak host memory on exit in the way you are suggesting. – talonmies Dec 30 '15 at 12:29
  • 2
    Definitely try latest CUDA and driver. Also, if for some reason the host process associated with the program that segfaulted does not actually terminate, then you may not see the memory returned to the system pool, since the OS thinks the process is still running (it may be zombie or in some other wierd state). Try the linux command `ps -ef |grep uname` (where `uname` is replaced by your actual linux username) to see all processes associated with your username. If you find any that look like they were associated with the segfaulted process, try killing them manually. – Robert Crovella Dec 30 '15 at 16:06
  • Thank you Robert for your suggestion. I already checked for zombie processes and there are none. Just in case, I ran your suggested command line and nothing interesting appeared. I really should test with a newer version of CUDA, but clients have a Wheezy distribution and CUDA 7 is not packaged on Debian. The migration would be tedious. – P. Brunet Dec 30 '15 at 18:11
  • 1
    Does the GeForce 610M have an X server running on it? – Robert Crovella Dec 31 '15 at 18:07

1 Answer


I have built a slightly modified version of your repro case:

#include <cuda.h>
#include <cuda_runtime.h>
#include <signal.h>

int main() {
    int *ret;
    const size_t sz = 1ULL << 31;  // 2^31 ints * sizeof(int) = 8Gb of pinned memory
    cudaHostAlloc(&ret, sz * sizeof(*ret), cudaHostAllocPortable);
    raise(SIGSEGV);  // force an abnormal exit with no cleanup
    return 0;
}

which on my system allocates 8Gb of pinned, portable memory and then raises a segfault, producing an abnormal exit and a core dump. I ran this on a 16Gb machine with the 352.39 driver and CUDA 6 runtime in a shell loop, which according to your analysis should cause leaking and cache thrashing within two or three runs:

$ free; for i in {1..20}; do echo -n $i; ./a.out; done; free
             total       used       free     shared    buffers     cached
Mem:      16308996    3509924   12799072          0     303588    2313332
-/+ buffers/cache:     893004   15415992
Swap:      8257532          0    8257532
1Segmentation fault (core dumped)
2Segmentation fault (core dumped)
3Segmentation fault (core dumped)
4Segmentation fault (core dumped)
5Segmentation fault (core dumped)
6Segmentation fault (core dumped)
7Segmentation fault (core dumped)
8Segmentation fault (core dumped)
9Segmentation fault (core dumped)
10Segmentation fault (core dumped)
11Segmentation fault (core dumped)
12Segmentation fault (core dumped)
13Segmentation fault (core dumped)
14Segmentation fault (core dumped)
15Segmentation fault (core dumped)
16Segmentation fault (core dumped)
17Segmentation fault (core dumped)
18Segmentation fault (core dumped)
19Segmentation fault (core dumped)
20Segmentation fault (core dumped)
             total       used       free     shared    buffers     cached
Mem:      16308996    3510740   12798256          0     303588    2313272
-/+ buffers/cache:     893880   15415116
Swap:      8257532          0    8257532

However, you can see that this yields only a 0.006% decrease in free memory (12799072 - 12798256 = 816kB out of roughly 12.8Gb free) after allocating 160Gb of pinned memory and never calling a memory release API or allowing the code to follow the normal code path to exit. There is no memory leak or net change in free resources.

The CUDA driver and runtime will release both host and GPU resources at exit, be it normal or abnormal, with or without explicit calls to the memory free APIs. I can't tell you what the problem is with your code or system, but a failure by the CUDA runtime or driver to release host resources on application exit is most likely not the root cause.

I would encourage you to modify my code to fit the size of the physical memory on your machine (use half the physical memory; see the sketch below) and run it as I have done, in a loop with memory reporting directly before and after. I very much doubt you will see anything different from what I have posted in this answer. If you do, I would strongly recommend updating to the most recent release driver.
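As a sketch of that sizing (assuming Linux, where sysconf can report the physical memory):

#include <cuda.h>
#include <cuda_runtime.h>
#include <signal.h>
#include <unistd.h>

int main() {
    // Query physical memory (Linux-specific) and target half of it.
    const size_t phys = (size_t)sysconf(_SC_PHYS_PAGES) * (size_t)sysconf(_SC_PAGE_SIZE);
    const size_t sz = phys / 2 / sizeof(int);  // number of ints covering half of RAM

    int *ret;
    cudaHostAlloc(&ret, sz * sizeof(*ret), cudaHostAllocPortable);
    raise(SIGSEGV);  // abnormal exit without freeing the pinned buffer
    return 0;
}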

talonmies
  • I copy/pasted your code to run exactly the same test. My results: first free: used = 1205504; second free: used = 1519068. I used the same values as you. I agree it does not all leak, but it is 8%, which can't be ignored. I know the CUDA runtime or driver or whatever should release the resources, but that is clearly not completely happening in my case. – P. Brunet Dec 30 '15 at 18:00
  • @P.Brunet: Can you post the complete output of free before and after somewhere (perhaps in an edit to your question)? This sounds like you are misinterpreting what the output of free/top actually means. An 8% change in total free memory when you are allocating half of your physical memory in a single contiguous buffer does not imply you have a memory leak. The Linux kernel dynamically reserves and frees file system cache and buffers, which could easily explain a discrepancy of that size. – talonmies Dec 30 '15 at 19:39
  • I understand you don't want to believe me, so I have posted the full result, but I think I can read it. If it were really the caching or buffering mechanism, the machine would not swap later, since the OS reclaims cache when memory is required. In this case (I reran it just now), it is again a 300 MB memory leak. – P. Brunet Dec 31 '15 at 07:44
  • @P.Brunet: It isn't that I don't want to believe you. It is that 99% of the time, questions like this are really a misdiagnosis rather than a real problem, like [this recent one](http://stackoverflow.com/q/34371828/681865) for example. What I am trying to do is get you to refine the problem to a root cause. I (or anyone else) can't help you solve a problem we can't reproduce without enough details to eliminate all the obvious stuff. You might well have uncovered a driver or toolkit bug, but you are using an old toolkit and driver, and NVIDIA will just tell you to upgrade if you report it. – talonmies Dec 31 '15 at 11:48
  • I totally understand that. I didn't believe it was the real cause at the beginning either, and it is normal to start from scratch with this kind of investigation. Thanks for your support. I will look into an upgrade ASAP. – P. Brunet Dec 31 '15 at 13:51