
I'm running a long-running kernel on an NVIDIA Quadro 6000 device. The kernel involves a loop with tens of thousands of iterations.

When I ran the kernel, the screen went black after 2 seconds, Windows restarted the GPU driver, and clFinish returned an error. So I got myself a second GPU card just for the display, and now the 2-second timeout does not apply.

The kernel ran for 50 seconds and then produced these errors (lines prefixed by "GPU ERROR" are printed by the clCreateContext error callback):

GPU ERROR:
CL_OUT_OF_RESOURCES error executing clFinish on Quadro 6000 (Device 0).

Computation finished, took 50 seconds (00:00:50)
GPU ERROR:
CL_OUT_OF_RESOURCES error waiting for idle on Quadro 6000 (Device 0).

clFinish() returned CL_OUT_OF_RESOURCES
GPU ERROR:
CL_OUT_OF_RESOURCES error waiting for idle on Quadro 6000 (Device 0).

What can I do about it?

For the sake of simplicity, this is a stripped-down version of the kernel. In reality it performs integration over a curved surface, which is why I need a loop, but this simple version crashes too for large enough n.

__kernel void integrate(
    __global float *input,
    __global float *output,
    unsigned int n,
    float c)
{
  size_t kernel_idx = get_global_id(1) * get_global_size(0) + get_global_id(0);
  // inputWidth, inputHeight and inputDepth are not declared in this stripped-down excerpt
  if (kernel_idx < inputWidth * inputHeight * inputDepth)
  {
    int j;
    ...
    float sum = 0.0f;

    for (j = 0; j < n; j++) // y
    {
      sum += input[j];
    }

    output[kernel_idx] = sum;
  }
}
GDR
  • Update: after updating the drivers, the callback errors no longer appeared and clFinish() returned CL_INVALID_COMMAND_QUEUE – GDR Aug 03 '12 at 13:30

1 Answer


Check your TDR (Timeout Detection and Recovery) registry keys and adjust them accordingly:

http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx

Tim Child
  • I tried that already, it didn't help - this is my regedit screenshot: http://i.imgur.com/uVthg.png – GDR Aug 06 '12 at 10:12
  • OK, disabling the check entirely worked, unlike tweaks to timeouts. Thanks. – GDR Aug 06 '12 at 10:49
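
For readers following the same path, here is a minimal C sketch of the settings discussed in this thread. It assumes the values documented by Microsoft under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers: TdrLevel (0 disables detection, 3 is the default) and TdrDelay (in seconds, default 2). The helper names read_tdr_dword and tdr_may_reset are hypothetical, and tdr_may_reset is only a simplified model of the watchdog. As reported in the comments above, raising the timeout did not help in this case and only disabling detection did.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* TdrLevel values as documented by Microsoft. */
enum {
    TDR_LEVEL_OFF = 0,         /* timeout detection disabled */
    TDR_LEVEL_BUGCHECK = 1,    /* bug check on a detected timeout */
    TDR_LEVEL_RECOVER_VGA = 2, /* recover to VGA (documented as not implemented) */
    TDR_LEVEL_RECOVER = 3      /* recover on timeout (default) */
};

#define TDR_DEFAULT_DELAY_SECONDS 2u

/* Hypothetical helper: read a DWORD value from the GraphicsDrivers key.
 * Returns `fallback` when the value is absent or when not built for Windows. */
unsigned read_tdr_dword(const char *name, unsigned fallback)
{
    assert(name != NULL);
#ifdef _WIN32
    DWORD value = 0;
    DWORD size = sizeof(value);
    LSTATUS status = RegGetValueA(
        HKEY_LOCAL_MACHINE,
        "SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
        name, RRF_RT_REG_DWORD, NULL, &value, &size);
    if (status == ERROR_SUCCESS)
        return (unsigned)value;
#endif
    return fallback;
}

/* Hypothetical helper, simplified model (an assumption, not driver behaviour):
 * a kernel may be reset when detection is enabled and the kernel runs
 * longer than the configured delay. */
int tdr_may_reset(unsigned level, unsigned delay_seconds, double kernel_seconds)
{
    if (level == TDR_LEVEL_OFF)
        return 0;
    return kernel_seconds > (double)delay_seconds;
}

/* Print the current settings and what the simple model predicts. */
void print_tdr_report(double kernel_seconds)
{
    unsigned level = read_tdr_dword("TdrLevel", TDR_LEVEL_RECOVER);
    unsigned delay = read_tdr_dword("TdrDelay", TDR_DEFAULT_DELAY_SECONDS);
    printf("TdrLevel=%u TdrDelay=%u s, reset possible for a %.0f s kernel: %s\n",
           level, delay, kernel_seconds,
           tdr_may_reset(level, delay, kernel_seconds) ? "yes" : "no");
}
```

Calling print_tdr_report(50.0) before launching the kernel shows which values are in effect. On a non-Windows build the sketch falls back to the documented defaults instead of reading the registry.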