Yes, there is a limit, and by default it is around 5 seconds. It is a driver watchdog limit: when the driver of the primary GPU becomes unresponsive because of the kernel calculation, the watchdog terminates the program, and sometimes it even hangs the driver and the whole of Windows.
More about this e.g. here: How can I override the CUDA kernel execution time limit on Windows with a secondary GPUs?
This happened to me as well in the past when I was experimenting with CUDA; my solution at the time was to split the calculation into multiple kernel calls (see the sketch below).
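For illustration, here is a rough sketch of what I mean by the splitting; the kernel, the chunk size and the grid sizes are all made up, so adapt them to your calculation:

    #include <cuda_runtime.h>

    // Made-up kernel that processes just one chunk of the data per launch.
    __global__ void processChunk(float *data, int offset, int chunkSize)
    {
        int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
        if (i < offset + chunkSize)
            data[i] *= 2.0f;                // placeholder for the real calculation
    }

    void runInChunks(float *d_data, int totalSize)
    {
        const int chunkSize = 1 << 20;      // sized so one launch finishes well under ~5 s
        const int threads = 256;

        for (int offset = 0; offset < totalSize; offset += chunkSize) {
            int thisChunk = (totalSize - offset < chunkSize) ? (totalSize - offset) : chunkSize;
            int blocks = (thisChunk + threads - 1) / threads;
            processChunk<<<blocks, threads>>>(d_data, offset, thisChunk);
        }
        cudaDeviceSynchronize();            // wait for the last chunk to finish
    }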
Alternatively, you might try increasing the timeout in the Windows registry: Modifying registry to increase GPU timeout, windows 7 (I don't have experience with that).
The other (but not so useful) alternative, also mentioned in the first link, is to use an additional GPU card which does not serve the primary display and is used only for the calculations; the watchdog timer should then not apply to it. A small sketch of selecting such a card follows.
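If you do go the second-card route, selecting the non-display GPU is just a cudaSetDevice() call before any other CUDA work. The device index 1 below is only an assumption; cudaDeviceProp.kernelExecTimeoutEnabled tells you whether the run-time limit applies to a given device:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);

        // List the devices so you can see which one still has the run-time limit.
        for (int d = 0; d < deviceCount; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("device %d: %s, kernel timeout enabled: %d\n",
                   d, prop.name, prop.kernelExecTimeoutEnabled);
        }

        cudaSetDevice(1);   // assuming device 1 is the secondary (non-display) card

        // ... allocate memory and run the long kernels on this device ...
        return 0;
    }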
In Linux there seems to be a limit as well; see e.g. here: https://askubuntu.com/questions/348756/disable-cuda-run-time-limit-on-kernels
And here: How to disable or change the timeout limit for the GPU under linux?
EDIT
It seems that, according to this forum thread: https://devtalk.nvidia.com/default/topic/414479/the-cuda-5-second-execution-time-limit-finding-a-the-way-to-work-around-the-gdi-timeout/
even separate kernel calls can get accumulated somehow (keeping the GPU driver busy) and trigger the watchdog.
What they recommend there is to put cudaThreadSynchronize() between the kernel calls (note that it is different from the cudaDeviceSynchronize() you have there - in theory they should work the same, but I've found reports of code working with cudaThreadSynchronize and not working with cudaDeviceSynchronize).
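In the chunked loop from the sketch above, that would look like this (again just a sketch; cudaThreadSynchronize() forces each launch to finish before the next one is queued):

    #include <cuda_runtime.h>

    // Same made-up chunked kernel as in the sketch above.
    __global__ void processChunk(float *data, int offset, int chunkSize);

    void runInChunksSynced(float *d_data, int totalSize)
    {
        const int chunkSize = 1 << 20;
        const int threads = 256;

        for (int offset = 0; offset < totalSize; offset += chunkSize) {
            int thisChunk = (totalSize - offset < chunkSize) ? (totalSize - offset) : chunkSize;
            int blocks = (thisChunk + threads - 1) / threads;
            processChunk<<<blocks, threads>>>(d_data, offset, thisChunk);
            cudaThreadSynchronize();   // block until this launch finishes before queuing the next one
        }
    }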
The watchdog should also not apply when the graphical X Window System is not running. To check whether that is the case, you can try rebooting to text mode (sudo init 3) and running the program to see if it works then.