How to check which CUDA error arises in which asynchronous CUDA call?

Question

Suppose we have the following situation:

launch_kernel_a<<<n_blocks, n_threads>>>(...);
launch_kernel_b<<<n_blocks, n_threads>>>(...);
cudaDeviceSynchronize();
if(cudaGetLastError() != cudaSuccess)
{
    // Handle error
    ...
}

My understanding is that in the above, execution errors occurring during the asynchronous execution of either kernel may be returned by cudaGetLastError(). In that case, how do I figure out which kernel caused the error to occur during runtime?

In the most general case, you cannot. If you're willing to insert synchronization, then localization becomes easy. If you want to use a debugger methodology, you can enable the debugger to be started at the point of the failure, which will then make the answer obvious as to which kernel caused the issue. You can also use the method [here](https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218). Even that method will not properly localize it if you are launching two instances of the same kernel. — Robert Crovella, Jun 11 '20 at 21:01
So there is no way to localize the error at runtime? If say I wanted to relaunch the failed kernel, possibly with different parameters, there is no way to identify which kernel failed, and I have to assume that all kernels launched between synchronizations failed? — Diggs, Jun 11 '20 at 21:21
I don't wish to repeat myself. I've told you what I know. Someone else may have better suggestions. Also, you may wish to take note that a kernel that fails at run time in an asynchronous way (i.e. after launch, during execution) will corrupt the CUDA context. Such a context is unusable at runtime for further "relaunch" efforts (or, indeed, any CUDA activity at all), unless you terminate and restart the application or [take multiprocess steps](https://stackoverflow.com/questions/56329377/reset-cuda-context-after-exception/56330491#56330491). — Robert Crovella, Jun 11 '20 at 21:28
Thank you, knowing that kernel execution errors are essentially non recoverable does clarify a lot of things for me. I should have checked that before posting here. Sorry for making you repeat yourself. — Diggs, Jun 11 '20 at 21:35

talonmies · Accepted Answer · 2020-06-12T12:50:28.167

> My understanding is that in the above, execution errors occurring during the asynchronous execution of either kernel may be returned by cudaGetLastError().

That is correct. The runtime API will return the last error which was encountered. It isn't possible to know from which call in a sequence of asynchronous API calls an error was generated.

> In that case, how do I figure out which kernel caused the error to occur during runtime?

You can't. You would need some kind of additional API call between the two kernel launches to determine the error. The crudest option is a cudaDeviceSynchronize() call after the first launch, but that would serialize the kernels if they would otherwise overlap (I see no stream usage, so that is probably not happening here).
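As a rough illustration of the synchronize-between-launches approach, here is a minimal sketch. The kernels `kernel_a` and `kernel_b`, the helper `check_kernel`, and the wrapper `run_checked` are hypothetical names invented for this example; only the CUDA runtime calls (`cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`, `cudaMalloc`, `cudaFree`) are real API. It assumes a single device and the default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the question's two kernels.
__global__ void kernel_a(int *out) { out[threadIdx.x] = 1; }
__global__ void kernel_b(int *out) { out[threadIdx.x] = 2; }

// Checks for a launch error first, then blocks until the device is idle
// and checks for an execution error. Returns true if both checks pass.
static bool check_kernel(const char *name) {
    cudaError_t err = cudaGetLastError();   // e.g. invalid launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel ran
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Returns 0 on success, 1 if kernel_a failed, 2 if kernel_b failed,
// -1 if the allocation failed.
int run_checked() {
    const int n = 32;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    int result = 0;
    kernel_a<<<1, n>>>(d_out);
    if (!check_kernel("kernel_a")) result = 1;

    if (result == 0) {
        kernel_b<<<1, n>>>(d_out);
        if (!check_kernel("kernel_b")) result = 2;
    }

    cudaFree(d_out);
    return result;
}

int main() { return run_checked(); }
```

The cost of this design is the one already mentioned: each check stalls the host until the device is idle, so it is usually better suited to a debug build than to a hot path.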

As noted in the comments, most kernel runtime errors leave the CUDA context corrupted and unusable, so if you got an error from the first kernel, the second kernel will abort or refuse to run anyway, and that is probably fatal to your whole application.
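The following sketch tries to make that behaviour visible. It assumes that writing through a null device pointer triggers an execution-time fault, which is typically reported as an illegal memory access. `faulting_kernel` and `error_is_sticky` are hypothetical names for this example, and the exact error strings printed may vary by driver and device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Deliberately faulting kernel: writes through whatever pointer it is given.
__global__ void faulting_kernel(int *p) { *p = 42; }

// Returns true if an unrelated CUDA call still fails after the fault.
bool error_is_sticky() {
    int *bad = nullptr;                     // assumption: invalid device address
    faulting_kernel<<<1, 1>>>(bad);
    cudaError_t exec_err = cudaDeviceSynchronize();
    std::printf("after sync:       %s\n", cudaGetErrorString(exec_err));

    // cudaGetLastError() resets ordinary errors, but not this kind.
    cudaError_t last = cudaGetLastError();
    std::printf("cudaGetLastError: %s\n", cudaGetErrorString(last));

    // An unrelated call made afterwards in the same process.
    int *d = nullptr;
    cudaError_t alloc_err = cudaMalloc(&d, sizeof(int));
    std::printf("later cudaMalloc: %s\n", cudaGetErrorString(alloc_err));

    return exec_err != cudaSuccess && alloc_err != cudaSuccess;
}

int main() { return error_is_sticky() ? 0 : 1; }
```

If the later `cudaMalloc` also reports an error, the process cannot simply relaunch the failed kernel; recovery requires restarting the application or isolating the CUDA work in a separate process, as described in the comments.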
