Longish CUDA calls kill the driver

Question

My understanding is that CUDA should be used only for computation-intensive code, because each kernel call has significant setup overhead. However, in my case, whenever a kernel call runs longer than about 2 seconds, I get a message from the Windows taskbar that the driver crashed and was recovered. I found two ways around this: 1. Disabling the watchdog timer somewhere in the registry, which I am not willing to do. 2. Splitting long calls into shorter ones, which brings back the aforementioned overhead, to the point that my CPU code actually runs faster.

The code itself is very simple so I do not think that the crash happens in the code.

extern "C" __global__ void add( double *x, double *y, double *z, double *d, double *n ) {
    // One thread per point; n[0] holds the number of points.
    size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n[0])
    {
        double thisX = x[idx];
        double thisY = y[idx];
        double thisZ = z[idx];

        //int i = tid;
        // Each pass overwrites d[idx], so only the distance to the last point is kept.
        for(int i = 0; i < n[0]; i++)
        {
            double distance = sqrt((thisX-x[i])*(thisX-x[i]) + (thisY-y[i])*(thisY-y[i]) + (thisZ-z[i])*(thisZ-z[i]));
            d[idx] = distance;
        }
    }
}

I assume I am doing something very stupid, as this is a very basic setup and should work with no issues.

What's the question specifically, apart from what you answered yourself? — void_ptr, Dec 09 '15 at 19:47
^this, but may also be a duplicate: http://stackoverflow.com/questions/17186638/modifying-registry-to-increase-gpu-timeout-windows-7 — Marco13, Dec 10 '15 at 22:40
How is it a duplicate? I do not want to change the timeout or disable it. I am trying to figure out a way to distribute a commercial application. So far from what I see this limitation is a very serious showstopper to the point we stopped development in this direction. — user2555515, Dec 11 '15 at 23:09

score 1 · Answer 1 · answered Dec 10 '15 at 09:10

If your CUDA device is also your main display device, the CUDA runtime is bound by a Windows-specific driver rule (Timeout Detection and Recovery, TDR): if the display driver does not respond for 2 seconds, it is restarted and all your running kernels are killed. The only ways to solve your problem are to split your calls or to use a second graphics card for display, so that the CUDA device is unrestricted.
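To see whether this rule applies to a particular card, the runtime can be asked directly. The sketch below queries the `cudaDevAttrKernelExecTimeout` attribute for every device; the helper name `watchdogEnabled` is made up for illustration and is not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if kernels on `device` are subject to a
// run time limit, 0 if they are not, and -1 if the query itself failed.
int watchdogEnabled(int device)
{
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrKernelExecTimeout, device) != cudaSuccess)
        return -1;
    return value ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 0;
    }
    // Report the watchdog status of every device in the system.
    for (int dev = 0; dev < count; ++dev)
        printf("device %d: run time limit on kernels = %d\n", dev, watchdogEnabled(dev));
    return 0;
}
```

A device that reports 0 here, typically one that drives no display, would be the natural choice for long-running kernels.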

Of course the splitting would imply some overhead in kernel calls and synchronization.
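A minimal sketch of the splitting approach, under a few assumptions: the kernel `addChunk`, the host function `sumDistancesChunked` and the `chunk` parameter are invented for illustration, and the kernel accumulates a sum of distances (rather than overwriting `d[idx]` as the question's code does) so that the partial results of the chunks have something to combine into. Each launch covers only part of the inner loop, so a single launch can be kept well below the timeout.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant of the question's kernel: one thread per point idx,
// visiting only the partner points in [jBegin, jEnd) and adding their
// distances to d[idx]. The running sum is an assumption for illustration.
__global__ void addChunk(const double *x, const double *y, const double *z,
                         double *d, int n, int jBegin, int jEnd)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    double thisX = x[idx], thisY = y[idx], thisZ = z[idx];
    double sum = 0.0;
    for (int j = jBegin; j < jEnd; ++j) {
        double dx = thisX - x[j], dy = thisY - y[j], dz = thisZ - z[j];
        sum += sqrt(dx * dx + dy * dy + dz * dz);
    }
    d[idx] += sum;  // each thread owns its own d[idx], so there is no race
}

// Launches one short kernel per chunk of the inner loop. `chunk` is a
// tuning knob (hypothetical). Returns true if every CUDA call succeeded.
bool sumDistancesChunked(const std::vector<double> &hx,
                         const std::vector<double> &hy,
                         const std::vector<double> &hz,
                         std::vector<double> &hd, int chunk)
{
    if (chunk <= 0) return false;
    const int n = static_cast<int>(hx.size());
    const size_t bytes = n * sizeof(double);
    double *x = nullptr, *y = nullptr, *z = nullptr, *d = nullptr;
    bool ok = cudaMalloc(&x, bytes) == cudaSuccess &&
              cudaMalloc(&y, bytes) == cudaSuccess &&
              cudaMalloc(&z, bytes) == cudaSuccess &&
              cudaMalloc(&d, bytes) == cudaSuccess &&
              cudaMemcpy(x, hx.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(y, hy.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(z, hz.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(d, 0, bytes) == cudaSuccess;

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int jBegin = 0; ok && jBegin < n; jBegin += chunk) {
        const int jEnd = (jBegin + chunk < n) ? jBegin + chunk : n;
        addChunk<<<blocks, threads>>>(x, y, z, d, n, jBegin, jEnd);
        // Synchronize so each launch finishes before the next one is
        // submitted (assumption: otherwise the driver may batch launches).
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }

    hd.resize(n);
    ok = ok && cudaMemcpy(hd.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(x); cudaFree(y); cudaFree(z); cudaFree(d);
    return ok;
}

int main()
{
    // Three points forming a 3-4-5 right triangle in the z = 0 plane.
    std::vector<double> x{0.0, 3.0, 0.0}, y{0.0, 0.0, 4.0}, z{0.0, 0.0, 0.0}, d;
    if (!sumDistancesChunked(x, y, z, d, 2)) {
        printf("CUDA call failed (no usable device?)\n");
        return 1;
    }
    for (size_t i = 0; i < d.size(); ++i)
        printf("d[%zu] = %f\n", i, d[i]);
    return 0;
}
```

The overhead mentioned above shows up as one launch plus one synchronization per chunk, so the chunk size is a trade-off: large enough to amortize that cost, small enough that one launch finishes comfortably within the timeout.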

score 1 · Answer 2 · answered Dec 10 '15 at 22:12

The timeout value can be modified by adding a registry value under the following key.

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers

Name : TdrDelay
Type : REG_DWORD
Value : desired timeout in seconds

Keep in mind that if your kernel enters an infinite loop somehow, your screen will lock until the timeout is reached and then the driver will reset.
