
I'm doing some tests with the CUDA sample from this article: https://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/

If I execute the code as it appears in the article, it works fine. But if I change the number of iterations in the main function, it crashes. The change is the following:

int main(int argc, char ** argv)
{
    ...
    for(int i=0; i<500000; i++) // I have changed the number of iterations from 100 to 500000
    {
            float dt = (float)rand()/(float) RAND_MAX; // Random distance each step
            advanceParticles<<< 1 +  n/256, 256>>>(dt, devPArray, n);
            cudaDeviceSynchronize();
    }
    ...
}

The only change I have made is the number of iterations, from 100 to 500000. The impact of this change is that the device crashes and I need to reset the workstation.
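For reference, this is how I would add basic status checks to that loop to see where it fails (the checks are mine, not part of the article's sample; advanceParticles, devPArray and n are as in the article, and <cstdio> is assumed for fprintf):

for(int i=0; i<500000; i++)
{
        float dt = (float)rand()/(float) RAND_MAX; // Random distance each step
        advanceParticles<<< 1 +  n/256, 256>>>(dt, devPArray, n);

        cudaError_t launchErr = cudaPeekAtLastError();   // error from the launch itself
        cudaError_t syncErr   = cudaDeviceSynchronize(); // error from kernel execution
        if (launchErr != cudaSuccess || syncErr != cudaSuccess)
        {
                fprintf(stderr, "iteration %d: launch=%s, sync=%s\n",
                        i, cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));
                return 1;
        }
}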

So I have a question: is there a kernel launch limit?

If there is no limit, why does the program crash?

Thank you.

  • Since you use cudaDeviceSynchronize(), you are always waiting for the kernel to finish before processing the next iteration. This way you never launch more than one kernel at a time, so it is not a kernel launch limitation. – Taro Dec 16 '16 at 14:37

1 Answer


Yes, there is a limit, and by default it is around 5 s. It is a driver watchdog limit: when the primary GPU becomes unresponsive because of the kernel calculation, the driver terminates the program, and sometimes the driver and even the whole of Windows hang.

More about this e.g. here: How can I override the CUDA kernel execution time limit on Windows with a secondary GPUs?

This happened to me as well in the past when I was experimenting with CUDA; my solution at the time was to split the calculation into multiple kernel calls.
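For illustration, a rough sketch of what I mean by splitting (the advanceParticlesChunk kernel taking an offset and a count is hypothetical; the article's advanceParticles kernel would need to be adapted to that signature):

// Process the particles in chunks so that each launch finishes quickly,
// staying well below the watchdog limit.
const int chunk = 65536;                       // particles per launch, tunable
for (int offset = 0; offset < n; offset += chunk)
{
    int count = (n - offset < chunk) ? (n - offset) : chunk;
    advanceParticlesChunk<<< 1 + count/256, 256 >>>(dt, devPArray, offset, count);
    cudaDeviceSynchronize();                   // wait for this chunk before the next one
}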

Alternatively, you might try increasing the timeout in the Windows registry: Modifying registry to increase GPU timeout, windows 7 (I don't have experience with that).

The other (but not so useful) alternative, also mentioned in the first link, is to use an additional GPU card that does not serve the primary display and is only used for the calculations (then the watchdog timer should not apply to it).
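If you have such a second card, a sketch of how you could select it programmatically, based on the kernelExecTimeoutEnabled device property (the helper function name is mine):

#include <cstdio>
#include <cuda_runtime.h>

// Pick the first device without a driver watchdog (typically a card that does
// not drive a display) and make it the current CUDA device.
int pickComputeOnlyDevice()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d (%s): kernelExecTimeoutEnabled=%d\n",
               dev, prop.name, prop.kernelExecTimeoutEnabled);
        if (prop.kernelExecTimeoutEnabled == 0)
        {
            cudaSetDevice(dev);  // run the kernels on this card
            return dev;
        }
    }
    return -1;  // no watchdog-free device found
}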


In Linux, there seems to be a limit as well, see e.g. here: https://askubuntu.com/questions/348756/disable-cuda-run-time-limit-on-kernels

And here: How to disable or change the timeout limit for the GPU under linux?


EDIT

According to this forum thread: https://devtalk.nvidia.com/default/topic/414479/the-cuda-5-second-execution-time-limit-finding-a-the-way-to-work-around-the-gdi-timeout/

it seems that even separate kernel calls can somehow accumulate (keeping the GPU driver busy) and trigger the watchdog.

What they recommend there is to put cudaThreadSynchronize() between the kernel calls (note it is different from the cudaDeviceSynchronize() you have there; they should actually behave the same, but I've found reports of code working with cudaThreadSynchronize and not working with cudaDeviceSynchronize).
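As a sketch, the loop from the question with that change and with the returned status checked, so a failure is reported instead of just hanging the machine:

for(int i=0; i<500000; i++)
{
        float dt = (float)rand()/(float) RAND_MAX;
        advanceParticles<<< 1 +  n/256, 256>>>(dt, devPArray, n);

        cudaError_t err = cudaThreadSynchronize(); // deprecated alias of cudaDeviceSynchronize
        if (err != cudaSuccess)
        {
                fprintf(stderr, "iteration %d failed: %s\n", i, cudaGetErrorString(err));
                break;
        }
}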

The watchdog should also not apply if the X graphical environment is not running; to check whether that is the case, you can try rebooting to text mode (sudo init 3) and running the program to see if it works then.

EmDroid
  • Yes, I read about the watchdog limit under Windows. In my case I'm working under Linux. Is there a watchdog limit in Linux? – Albert Herrero Dec 16 '16 at 14:44
  • Another piece of information: I'm working with a Tesla K20m card, and the property 'kernelExecTimeoutEnabled' that we can read from 'cudaDeviceProp' is 0, which indicates that there is no execution time limit on the kernels. – Albert Herrero Dec 16 '16 at 14:49
  • The limit you are talking about here is a maximum computation time for the kernels. If this is the problem, why would the sample crash with 500'000 iterations and not 100? If the kernel lasts for more than 5 seconds, it should cause a crash on the very first iteration, whatever the number of iterations in the loop. – Taro Dec 16 '16 at 14:49
  • @Taro: True, I missed that in the code. But it might be that the size of the kernel somehow depends on `i` because of the code we don't see (`i` might determine `devPArray` or `n`). – EmDroid Dec 16 '16 at 14:53
  • Hmm, it doesn't seem so (I saw the original code on devblogs.nvidia.com). – EmDroid Dec 16 '16 at 14:57
  • The complete code is in the link, and 'i' is only used as a counter in the for statement. – Albert Herrero Dec 16 '16 at 14:58
  • Currently I can't find any relation between i, 100 or 500'000 and the kernel itself. The number of particles remains the same (1'000'000), the grid dimensions for the kernel launch are not modified (1 + n/256 blocks of 256 threads). The only computed parameter is dt, but it is just a floating-point number that is not based on any value other than a randomly generated number. – Taro Dec 16 '16 at 14:58
  • See the edit, try to add `cudaThreadSynchronize()` and/or reboot to textmode (without running X-Windows). – EmDroid Dec 16 '16 at 15:16
  • Yes, I've seen that as well; but also found this: https://devtalk.nvidia.com/default/topic/497096/cudathreadsynchronize-vs-cudadevicesynchronize-what-is-the-difference-/ reporting that the code works with cudaThreadSynchronize and crashes with cudaDeviceSynchronize ... – EmDroid Dec 16 '16 at 15:35
  • Hi, that function is deprecated. If I'm not wrong, cudaThreadSynchronize and cudaDeviceSynchronize are almost the same. – Albert Herrero Dec 16 '16 at 15:37
  • Can you also try running after rebooting to text mode? If it also crashes there, it would appear that the crash is not GPU watchdog related. – EmDroid Dec 16 '16 at 15:53
  • Hi, I tried with text mode and the execution crashes. I get the following message: [ 820.536748] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus. – Albert Herrero Dec 16 '16 at 16:29
  • Maybe it could be a system problem? I'm using Ubuntu 16.04. – Albert Herrero Dec 16 '16 at 16:33
  • Hmm, this frequently happens when there is a hardware failure, but I have also found references saying it can sometimes be caused by the NVIDIA drivers. Here a different Linux distribution (Slackware) also helped: http://askubuntu.com/questions/281765/cuda-nvidia-card-has-fallen-off-the-bus – EmDroid Dec 16 '16 at 17:00
  • Seriously? I can't believe it... I hope the problem isn't the OS. I will run the program with a different card and a different OS and post the conclusions here. – Albert Herrero Dec 16 '16 at 18:20
  • Hi, I tried the program on another workstation and it works fine. It seems the problem is with my Tesla or my OS. Thank you! – Albert Herrero Dec 19 '16 at 09:06