When we launch a CUDA kernel on a stream, an error may occur as we submit it for launching (e.g. cudaErrorInitializationError, cudaErrorInsufficientDriver or cudaErrorNoDevice); and an error may occur as the kernel executes (e.g. illegal memory access).
If we launch a kernel on the default stream of a device, or more generally on a synchronous stream, is the return value guaranteed to "catch" only the launch errors proper, as with asynchronous launches? Or is it also guaranteed to catch any error that occurs during the kernel's run?
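To make the two kinds of error concrete, here is a minimal sketch of where each one would typically be checked. The kernel name `write_value` and the null pointer are made up for illustration, and the sketch assumes that dereferencing an invalid device pointer triggers an execution-time error. It deliberately makes no claim about which of the two calls reports that error on the default stream, since that is exactly the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes through the pointer it is given.
// Passing an invalid pointer is meant to provoke an error during execution.
__global__ void write_value(int* out) { *out = 42; }

int main() {
    int* bad_ptr = nullptr;  // deliberately invalid device address (assumption for the demo)

    // Launch on the default stream of the current device.
    write_value<<<1, 1>>>(bad_ptr);

    // Errors detected while submitting the launch are visible here.
    cudaError_t launch_status = cudaGetLastError();
    std::printf("after launch:      %s\n", cudaGetErrorString(launch_status));

    // Waits for the kernel to finish; an error raised while it ran
    // is reported by this call at the latest.
    cudaError_t sync_status = cudaDeviceSynchronize();
    std::printf("after synchronize: %s\n", cudaGetErrorString(sync_status));

    return 0;
}
```

Comparing the two printed statuses on a given setup shows at which point the execution error surfaces there, but an observation like that is not a documented guarantee.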