When we launch a CUDA kernel on a stream, an error may occur as we submit it for launching (e.g. cudaErrorInitializationError, cudaErrorInsufficientDriver or cudaErrorNoDevice); and an error may occur as the kernel executes (e.g. illegal memory access).

If we launch a kernel on the default stream of a device, or more generally on a synchronous stream, is the return value guaranteed to "catch" only the launch errors proper, as with asynchronous launches? Or is it also guaranteed to catch any error that occurs during the kernel's run?

einpoklum

1 Answer

No, and it's even worse than that.

Empirically, launching on the default stream may succeed despite the kernel triggering an error. In fact, even something like:

myKernel<<<blocksPerGrid, threadsPerBlock>>>(whatever);
cudaError_t err = cudaGetLastError();

may fail to produce the error code corresponding to what the kernel has done. As @talonmies indicates here, being certain of encountering the error requires either a cudaDeviceSynchronize() or a blocking API call (such as cudaMemcpy()).
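To make the distinction concrete, here is a minimal sketch that checks for errors at both points. The kernel name faultyKernel and the null-pointer write are my own illustration, not something from the question; the assumption is that writing through a null device pointer triggers an illegal memory access while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (illustration only): writes through whatever pointer
// it is given. Passing a null pointer is assumed to cause an illegal
// memory access during execution.
__global__ void faultyKernel(int *p)
{
    *p = 42;
}

int main()
{
    int *bad = nullptr;

    // Launch on the default stream. Control returns to the host without
    // waiting for the kernel to finish.
    faultyKernel<<<1, 1>>>(bad);

    // Picks up launch (submission) errors, e.g. an invalid configuration.
    // An error raised while the kernel executes may not be visible yet.
    cudaError_t launchErr = cudaGetLastError();
    std::printf("after launch: %s\n", cudaGetErrorString(launchErr));

    // Blocks until all previously issued work has completed, so an
    // execution error is reported here at the latest.
    cudaError_t syncErr = cudaDeviceSynchronize();
    std::printf("after sync:   %s\n", cudaGetErrorString(syncErr));

    return 0;
}
```

The first check is still worth keeping, since it is the one that reports launch errors proper; the second is what gives certainty about execution errors. Synchronizing after every launch serializes host and device, so in practice it may be something to enable only in debug builds.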

einpoklum