2

I use the checkCudaErrors helper function from the CUDA Toolkit Samples. See "helper_cuda.h". I am perplexed as to why the launch error from this example is not caught by checkCudaErrors. The error is too many threads are launched (2048).

From Debug (linux gdb), the console prints (stderr in red) "warning: Cuda API error detected: cudaLaunch returned (0x9)".

Whereas when I execute either the Release or Debug builds from a Bash shell, no error is printed by checkCudaErrors.

Why is this?

My expectation is the error would be caught and printed at the D2H memcpy call immediately proceeding the launch. Is this incorrect?

Minimal reproducible example:

#include <cuda.h>
#include "helper_cuda.h"

__global__ void BusyIncrementKernel( const size_t increments, float * result){
    float tmp = 0;
    for ( size_t i = 0; i < increments; ++i ){ tmp += 1; }
    const int j = threadIdx.x + blockIdx.x*blockDim.x;
    if ( j == 0 ){ *result = tmp; }
}

int main( int argc, char * argv[] ){
    unsigned int blockDim = 2048;
    dim3 block{ blockDim, 1, 1};
    dim3 grid{ 1, 1, 1};
    float * dResult;
    checkCudaErrors( cudaMalloc( &dResult, sizeof(float) ));
    BusyIncrementKernel<<< grid, block >>>( 10000000, dResult );
    float result;
    checkCudaErrors( cudaMemcpy( &result, dResult, sizeof(float), cudaMemcpyDeviceToHost ));
    checkCudaErrors( cudaFree( dResult ));
    checkCudaErrors( cudaDeviceSynchronize() );
    fprintf( stderr,"result: %f\n", result );
    return 0;
}
einpoklum
  • 118,144
  • 57
  • 340
  • 684
Tyson Hilmer
  • 741
  • 7
  • 25
  • Where exactly does it fail? I am almost sure you can't get an error with checkCudaError() for Kernel errors,but only client API calls. – Michael IV Mar 26 '18 at 10:05
  • @MichaelIV the program always returns after the fprintf( result: ....) in main(). I expect it to return from within checkCudaErrors( ... exit(EXIT_FAILURE) ), but it does not. I also expect Kernel errors are eventually returned by later CUDA API calls, e.g. the proceeding cudaMemcpy, cudaFree, or definitely the cudaDeviceSynchronize. – Tyson Hilmer Mar 26 '18 at 10:10
  • So you're saying checkCudaErrors doesn't return any errors at all? – Michael IV Mar 26 '18 at 10:12
  • @MichaelIV. Yes. Which makes no sense. It has worked well for me in the past. Clean rebuild and build call all look normal. – Tyson Hilmer Mar 26 '18 at 10:13
  • Maybe you're running in release build mode? Do regular asserts work? – Michael IV Mar 26 '18 at 10:14
  • Kernel launches require a particular sequence of API calls to catch launch errors. See https://stackoverflow.com/a/14038590/681865 – talonmies Mar 26 '18 at 10:24
  • @talonmies. I should be paying you for this advice. Per your post; checkCudaErrors( cudaPeekAtLastError() ) works, i.e. the error is caught and printed. This is disturbing and perplexing. I thought subsequent API calls were redundant with cudaPeekAtLastError, whereas this evidences they are not. A [closer read of the docs](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#error-checking) helped. – Tyson Hilmer Mar 26 '18 at 10:43
  • @MichaelIV. Apologies for the earlier incorrect comment, which I have deleted. Regular asserts do not work. (I had the logical inverted.) – Tyson Hilmer Mar 26 '18 at 12:00

2 Answers2

2

This answer by talonmies specifically states kernel launches require a slightly different pattern to handle. The CUDA API documentation 3.2.9. on Error Checking explains this.

This answer by Robert Crovella indicates there are two error types, that differ in how the API reports (returns) them*.

My results are; the only way to catch kernel launch errors is with cudaPeekAtLastError() or cudaGetLastError() after the launch call. These are the only API functions that returned the launch error code. Other subsequent API calls did not return the launch error code, nor did they clear it; it could be obtained later by cudaPeekAtLastError or cudaGetLastError.

Tyson Hilmer
  • 741
  • 7
  • 25
  • This particular case has nothing to do with asynchrony, or waiting long enough. The first paragraph of your answer is off-base. And this question is basically a duplicate of the one talonmies provided the link to. – Robert Crovella Mar 26 '18 at 13:58
  • @RobertCrovella I thought so, but don't understand why. Can you please elaborate? Or simply edit for correctness. – Tyson Hilmer Mar 26 '18 at 13:59
  • @RobertCrovella Talonmies post makes sense. But I cannot agree with the doc stating "...; if an asynchronous error occurs, it will be reported by some subsequent unrelated runtime function call." That implies ***any*** API call, whereas that is not what I am getting. – Tyson Hilmer Mar 26 '18 at 14:04
  • The error you have here is not an asynchronous error. An asynchronous error occurs during execution of kernel code, such as an out-of-bounds access, etc. The error you have here is an invalid launch configuration. This is caught at launch time, and does not occur asynchronously some time later as the kernel is actually executing device code. There are basically [*two* types of errors](https://stackoverflow.com/questions/31642520/states-of-memory-data-after-cuda-exceptions/31642573#31642573). – Robert Crovella Mar 26 '18 at 14:09
  • @RobertCrovella. Thank you kindly :) The emphasis that there are two types of errors is exactly what I was having trouble understanding. Hopefully I have edited the answer to be correct. – Tyson Hilmer Mar 26 '18 at 14:19
0

CUDA kernel launches do not return an error code for the launch. To catch the error, you need to perform some explicit error checking after the launch, and before any additional API calls:

checkCudaErrors( cudaPeekAtLastError() );
checkCudaErrors( cudaDeviceSynchronize() );

The first call should catch at least any launch errors, and by the second call errors during kernel execution will have been caught as well (see also this answer). Since you haven't done this, you don't see the error until the next API call at the earliest.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • 1
    I tried the cudaPeekAtLastError() call immediately after the kernel launch, and after the three subsequent API calls (memcpy,free,device sync). It returned an error code in both cases, whereas the other API calls did not. Which implies cudaPeekAtLastError() does not need to be called immediately, i.e. subsequent API calls do not overwrite the last error. I would sure appreciate confirmation or rejection of this. – Tyson Hilmer Mar 26 '18 at 14:36
  • 1
    actually, this type of error manifests **immediately**. Unlike many type of kernel errors, this one is synchronous. It is available to be inspected as soon as the underlying API call (`cudaLaunch`) is complete. And no, this error is not "lost". There is no "overwriting". – Robert Crovella Mar 26 '18 at 14:37
  • @RobertCrovella: 1. About "immediately" - I meant to say in the instruction used for the kernel launch; edited. 2. I thought OP said his program exits after the last `fprintf()` - which means that no API error was caught until program exit. – einpoklum Mar 26 '18 at 14:56
  • @TysonHilmer: Didn't you write in a comment on your question that your program executes until after the last `fprintf()`? If that's the case, doesn't that mean that no error code is issued? – einpoklum Mar 26 '18 at 14:57
  • @einpoklum Yes. $ Release/TestLaunchError; echo $? result: 0.000000 0 It returns 0 and no error code is issued. That is for the source as currently given in the question, i.e. lacking cudaPeekAtLastError. – Tyson Hilmer Mar 26 '18 at 15:04
  • In fact, kernel launches in the runtime API *do* return an error code, but the boilerplate code which the `<<< >>>` syntax is translated to drops the return code. – talonmies Mar 27 '18 at 00:21