Does launching a kernel on the default stream "catch" its launch-time errors?

Question

When we launch a CUDA kernel on a stream, an error may occur when we submit it for launch (e.g. cudaErrorInitializationError, cudaErrorInsufficientDriver or cudaErrorNoDevice), and an error may also occur while the kernel executes (e.g. an illegal memory access).

If we launch a kernel on the default stream of a device, or more generally on a synchronous stream, is the return value guaranteed to "catch" only the launch errors proper, as with asynchronous launches? Or is it guaranteed to also catch any error that occurs during the kernel's run?
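To make the two categories concrete, here is a minimal sketch. It assumes device 0 is the current device, and that deliberately storing through `nullptr` on the device produces an illegal-address fault, as it typically does. The hypothetical `launch_time_error()` requests one thread more than the per-block limit, which the runtime rejects at launch; the hypothetical `execution_time_error()` launches a well-formed kernel that faults while running, which only surfaces when the host waits for it.

```cuda
#include <cuda_runtime.h>

// Does nothing; used only to probe the launch-configuration checks.
__global__ void noop() {}

// Stores through its argument; given nullptr, it faults while running.
__global__ void store_one(int* p) { *p = 1; }

// Launch-time error: ask for one thread more than the device allows per block
// (assumes device 0 is the current device). The runtime rejects the launch
// before any device code runs.
cudaError_t launch_time_error() {
    int maxThreads = 0;
    cudaDeviceGetAttribute(&maxThreads, cudaDevAttrMaxThreadsPerBlock, 0);
    noop<<<1, maxThreads + 1>>>();
    return cudaGetLastError();  // typically cudaErrorInvalidConfiguration
}

// Execution-time error: the launch itself is well-formed, so the fault only
// shows up once the host waits for the kernel to finish.
cudaError_t execution_time_error() {
    store_one<<<1, 1>>>(nullptr);
    return cudaDeviceSynchronize();  // typically cudaErrorIllegalAddress
}
```

Call `launch_time_error()` before `execution_time_error()`: an illegal-address fault is sticky, so once it occurs every later CUDA call in the process reports it too.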

kernel launches are always asynchronous, whatever stream you use — talonmies, Feb 17 '21 at 20:49
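A minimal sketch of that point, assuming a GPU clock of roughly 1 to 2 GHz so that the hypothetical `spin` kernel below busy-waits for tens of milliseconds. The kernel is launched on the default stream, and `cudaStreamQuery(0)` issued right afterwards typically returns `cudaErrorNotReady`, meaning the host regained control before the kernel finished.

```cuda
#include <cuda_runtime.h>

// Busy-waits for roughly `cycles` GPU clock ticks, so the kernel is still
// running when the host checks on it.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) {
    }
}

// Launches a long-running kernel on the default stream and immediately asks
// whether that stream has drained.
cudaError_t default_stream_status_after_launch() {
    spin<<<1, 1>>>(100000000LL);              // ~50-100 ms at an assumed 1-2 GHz clock
    cudaError_t status = cudaStreamQuery(0);  // cudaErrorNotReady: the launch did not block
    cudaDeviceSynchronize();                  // now actually wait for the kernel
    return status;
}
```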

einpoklum · Accepted Answer · 2021-02-18T09:10:32.217

No, and it's even worse than that.

Empirically, launching on the default stream may succeed despite the kernel triggering an error. In fact, even something like:

myKernel<<<blocksPerGrid, threadsPerBlock>>>(whatever);
cudaError_t err = cudaGetLastError();

may fail to produce the error code corresponding to what the kernel has done. As @talonmies indicates here, to be certain of encountering the error you need either a cudaDeviceSynchronize() or a blocking API call (such as cudaMemcpy()).
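The pattern this implies can be sketched as follows. `fill` and `fill_then_check()` are hypothetical names; the helper checks `cudaGetLastError()` immediately after the launch, which covers launch errors, and then relies on a blocking `cudaMemcpy()` to surface anything that went wrong while the kernel ran. The probe buffer is allocated before the launch because an illegal-address fault is sticky: once it happens, later CUDA calls (including `cudaMalloc`) fail with the same error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Writes each thread's index into out[threadIdx.x]; faults if out is invalid.
__global__ void fill(int* out) { out[threadIdx.x] = threadIdx.x; }

// Hypothetical helper: one check right after the launch, one via a blocking call.
cudaError_t fill_then_check(int* kernel_out) {
    int* d_probe = nullptr;
    int h_probe = 0;
    // Allocate first: after a sticky fault, every later CUDA call fails too.
    cudaError_t err = cudaMalloc(&d_probe, sizeof(int));
    if (err != cudaSuccess) return err;

    fill<<<1, 32>>>(kernel_out);
    err = cudaGetLastError();  // launch errors; may still be cudaSuccess if the kernel faults later
    if (err == cudaSuccess) {
        // cudaMemcpy to host memory waits for the preceding kernel, so it
        // reports faults raised while the kernel ran.
        err = cudaMemcpy(&h_probe, d_probe, sizeof(int), cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_probe);  // after a sticky fault this also fails; ignored here
    return err;
}
```

With a valid device pointer the helper returns `cudaSuccess`; passing `nullptr` typically yields `cudaErrorIllegalAddress`, usually reported by the `cudaMemcpy()` call, since `cudaGetLastError()` may run before the fault has happened.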
