My understanding is that CUDA should be used only for computation-intensive code, because each kernel call has significant setup overhead. However, in my case, whenever a kernel call runs longer than about 2 seconds, I get a message from the Windows taskbar that the driver crashed and was recovered. I found two ways of working around this:
1. Disabling the watchdog timer somewhere in the registry, which I am not willing to do.
2. Splitting long calls into shorter ones, which brings back the above-mentioned call overhead, to the point that my CPU code actually runs faster.
The kernel itself is very simple, so I do not think the crash is caused by a bug in the code:
extern "C" __global__ void add( double *x, double *y, double *z, double *d, double * n ) {
    size_t idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n[0])
    {
        double thisX = x[idx];
        double thisY = y[idx];
        double thisZ = z[idx];
        //int i = tid;
        for(int i = 0; i < n[0]; i++)
        {
            double distance = sqrt((thisX-x[i])*(thisX-x[i]) + (thisY-y[i])*(thisY-y[i]) + (thisZ-z[i])*(thisZ-z[i]));
            d[idx] = distance;
        }
    }
}
I assume I am doing something very stupid, as this is a very basic setup and should work with no issues.
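For reference, workaround 2 (splitting one long call into several shorter launches) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not my exact code: `addChunk`, `launchInChunks`, `chunk`, `threads`, `offset`, and `end` are illustrative names, and the point count is assumed to be passed as a plain `int n` rather than through a `double *`. Each launch is followed by a synchronization, which is where the extra per-call overhead comes from.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical variant of the kernel above: each launch handles only the
// points with indices in [offset, end) instead of all n points.
__global__ void addChunk(const double *x, const double *y, const double *z,
                         double *d, int n, int offset, int end)
{
    int idx = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= end) return;

    double thisX = x[idx];
    double thisY = y[idx];
    double thisZ = z[idx];
    double distance = 0.0;
    for (int i = 0; i < n; i++) {
        double dx = thisX - x[i];
        double dy = thisY - y[i];
        double dz = thisZ - z[i];
        distance = sqrt(dx * dx + dy * dy + dz * dz);
    }
    // Same final value as the original kernel: the distance to the last point.
    d[idx] = distance;
}

// Host-side loop (all pointers are device pointers): process 'chunk' points
// per launch so that no single launch runs long enough to trip the watchdog.
cudaError_t launchInChunks(const double *x, const double *y, const double *z,
                           double *d, int n, int chunk, int threads)
{
    for (int offset = 0; offset < n; offset += chunk) {
        int end = (offset + chunk < n) ? (offset + chunk) : n;
        int blocks = (end - offset + threads - 1) / threads;
        addChunk<<<blocks, threads>>>(x, y, z, d, n, offset, end);

        cudaError_t err = cudaGetLastError();            // launch errors
        if (err != cudaSuccess) return err;
        err = cudaDeviceSynchronize();                   // execution errors
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

A call such as `launchInChunks(dx, dy, dz, dd, n, 4096, 256)` would then replace the single launch; the chunk size of 4096 is only a placeholder and would need tuning so that each launch stays well under the 2-second limit.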