I am working with some GPU programs (CUDA 4.1 with C), and occasionally I have to kill a program midway using Ctrl+C to handle an exception. Earlier I tried calling the `cudaDeviceReset()` function on exit, but this reply by talonmies shook my trust in `cudaDeviceReset()`, so I started handling such situations the old-fashioned way: a computer restart. As the project grows, this method is becoming a headache. I would appreciate it if anyone has come up with a better solution.
As I wrote in my replies to you earlier, `cudaDeviceReset()` is perfectly fine for destroying the context and releasing the resources a process has allocated itself. You should call it in your code on exit. But it can't fix problems caused by other processes. If your host code or device code cannot be run and terminated without leaving the host driver or device in such a parlous state that it requires a reboot, you have a more serious design or code problem to fix first. The CUDA Linux driver had Ctrl+C problems in the past, but those were fixed several years ago AFAIK. – talonmies Apr 24 '12 at 09:18
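A minimal sketch of what talonmies describes, calling `cudaDeviceReset()` on the normal exit path (the buffer and its size are placeholders, not anything from the original code):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    float *d_buf = NULL;

    /* Hypothetical workload: allocate some device memory and use it. */
    if (cudaMalloc((void **)&d_buf, 1024 * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    /* ... kernel launches would go here ... */

    cudaFree(d_buf);

    /* Destroy this process's context and release everything it
       allocated on the device, as recommended above. */
    cudaDeviceReset();
    return 0;
}
```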
1 Answer
I think this question is more fundamental -- it is really an app design issue, not a CUDA issue. If you design your app to check for interrupts regularly, and to exit your main loop and clean up resources upon interrupt, then you shouldn't have this problem (and you can even call `cudaDeviceReset()` properly on exit).
The answers to this question may be helpful. And this one. And this one.
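For illustration, here is a minimal sketch of that design, assuming a SIGINT handler that only sets a flag (CUDA runtime calls are not async-signal-safe, so cleanup happens back in the main loop); `step_kernel`, the launch configuration, and the buffer size are placeholders:

```cuda
#include <signal.h>
#include <cuda_runtime.h>

static volatile sig_atomic_t g_interrupted = 0;

/* Only set a flag here; the actual cleanup runs in main(). */
static void handle_sigint(int sig)
{
    (void)sig;
    g_interrupted = 1;
}

/* Placeholder kernel standing in for the real work. */
__global__ void step_kernel(float *data) { /* ... */ }

int main(void)
{
    float *d_data = NULL;
    signal(SIGINT, handle_sigint);

    cudaMalloc((void **)&d_data, 1 << 20);

    /* Check the flag once per iteration and fall out of the loop
       instead of dying mid-kernel. */
    while (!g_interrupted /* && work remains */) {
        step_kernel<<<256, 256>>>(d_data);
        cudaDeviceSynchronize();
    }

    /* The same cleanup path runs whether the work finished
       or the run was interrupted. */
    cudaFree(d_data);
    cudaDeviceReset();
    return 0;
}
```

With this structure, Ctrl+C takes the normal exit path, so the driver sees an orderly context teardown rather than a process killed mid-launch.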
But what about the design phase? While developing, I may have to stop a run midway using Ctrl+C. In that case the GPU memory in use stays allocated despite proper failsafe techniques being in place. In such a scenario, how can I be sure those GPU memory allocations are released before re-running the program? – Abhinav Apr 24 '12 at 09:25
@Abhinav: I just don't believe that happens in practice. Can you edit your question to include evidence that abnormal termination of your code results in unreleased GPU resources / device "memory leaks"? – talonmies Apr 24 '12 at 09:31
Agreed. I use Ctrl+C too, and in simple applications I usually don't take my own advice from this answer about properly handling interrupts, yet I rarely have to reboot my computer (unless I write really bad CUDA code that hangs the machine). – harrism Apr 24 '12 at 10:43
@talonmies OK, I can give it a try: I'll fill up the entire 4 GB of GPU global memory and then interrupt the program. Let me try that and form a more concrete question. – Abhinav Apr 24 '12 at 11:15
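If it helps, a minimal sketch of that experiment (the chunk size and the `getchar()` pause are arbitrary choices): run it once, kill it with Ctrl+C at the pause, then run it again and compare the `cudaMemGetInfo()` numbers.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t free_bytes = 0, total_bytes = 0;
    const size_t chunk = 64 * 1024 * 1024;   /* 64 MB per allocation */
    void *p = NULL;

    /* A fresh run should report nearly all device memory free; a low
       number here would indicate leaks from the interrupted run. */
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("free: %zu MB of %zu MB\n", free_bytes >> 20, total_bytes >> 20);

    /* Grab chunks until the device is full ... */
    while (cudaMalloc(&p, chunk) == cudaSuccess)
        ;

    /* ... then wait here to be killed with Ctrl+C, so no cleanup runs. */
    getchar();
    return 0;
}
```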
I feel none of the answers really addresses @Abhinav's question. Ctrl+C is not the issue here, as you can include `cudaDeviceReset()` in a SIGINT handler. The problem (Abhinav's, and mine) is: for some reason the GPU after program exit is in a state which prevents it from running the code correctly again [correctly = as if after a 'computer restart']. The questions are therefore: (1) how can the GPU get into such a state even after surely executing `cudaDeviceReset()` at exit, and (2) is there a software (perhaps OS-level) way to get it back to a "fresh" state? – P Marecki Jul 23 '12 at 20:30
As we explained, we need a repro to understand what is happening. Leaving memory allocated on exit should not cause the GPU to "get in a bad state" requiring a computer reboot. If it does, that may be a bug, and we'd like a repro so we can fix it. We left this question with Abhinav in April and he said he would come back with a more concrete question. Still waiting. – harrism Jul 23 '12 at 22:55