Catching an exception thrown from a callback in cudaLaunchHostFunc

Question

I want to check for an error flag living in managed memory that might have been written by a kernel running on a certain stream. Depending on the error flag I need to throw an exception. I would simply sync this stream and check the flag from the host, but I need to do so from inside a CUDA graph. AFAIK I need to somehow encode this host-side error checking inside a cudaLaunchHostFunc callback.

I am trying to understand how cudaLaunchHostFunc deals with exceptions. The documentation does not mention anything about it. Is there any way to catch an exception thrown from inside the function passed to cudaLaunchHostFunc?

Consider the following MWE:

#include <cstdio>
#include <iostream>
#include <stdexcept>

__global__ void kern(){
  int id = blockIdx.x*blockDim.x + threadIdx.x;
  printf("Kernel\n");
  return;
}

void foo(void* data){
  std::cerr<<"Callback"<<std::endl;
  throw std::runtime_error("Error in callback");
}

void launch(){
  cudaStream_t st = 0;
  kern<<<1,1,0,st>>>();
  cudaHostFn_t fn = foo;
  cudaLaunchHostFunc(st, fn, nullptr);
  cudaDeviceSynchronize();
}

int main(){
  try{
    launch();
  }
  catch(...){
    std::cerr<<"Catched exception"<<std::endl;
  }
  return 0;
}

The output of this code is:

Kernel
Callback
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error in callback
Aborted (core dumped)

The exception is thrown but it appears that it is not propagated to the launch function. I would have expected the above launch() function to be equivalent (exception-wise) to the following:

void launch(){
  cudaStream_t st = 0;
  kern<<<1,1,0,st>>>();
  cudaStreamSynchronize(st);
  foo(nullptr);
  // cudaHostFn_t fn = foo;
  // cudaLaunchHostFunc(st, fn, nullptr);
  cudaDeviceSynchronize();
}

which does output the expected:

Kernel
Callback
Catched exception

Additionally, in the first case, all cuda calls return cudaSuccess.

How about storing the exception in `data` and checking it afterwards in `launch`? — Abator Abetor, Jan 17 '23 at 14:20
I suggest implementing the checks inside the `launch` routine following this: https://stackoverflow.com/questions/6419700/way-to-verify-kernel-was-executed-in-cuda , then throwing the exception explicitly! In addition, you could change the return type of `launch` from `void` to `bool` and, if an error occurs, report it back to main that way. This would let you avoid the `try-catch` statement as well! — Andreas Hadjigeorgiou, Jan 17 '23 at 14:51
@AbatorAbetor, communicating via data is a good idea. But AFAIK foo is launched asynchronously, so that approach would require a synchronization. Besides, I wanted to do the error checking inside foo because otherwise the host logic cannot be captured into a graph. Maybe I am trying to have my cake and eat it too -.- — Raul, Jan 18 '23 at 12:11
@AndreasHadjigeorgiou I want to make the logic launch part of a larger CUDA graph, which prevents me from doing the check in the launch function. I also wanted to avoid an explicit synchronization. — Raul, Jan 18 '23 at 12:23
I was kind of hoping that cudaLaunchHostFunc would hold the exception back until an explicit sync happened, e.g. cudaDeviceSynchronize. The docs do not mention this explicitly, but I believe cudaLaunchHostFunc runs the function in a new thread. Since one does not have access to this thread, control of the exception is lost. I guess the answer to my question is simply: No :( — Raul, Jan 18 '23 at 12:26
Yes, the host function that is run as a result of `cudaLaunchHostFunc()` is run in a CPU thread that is launched/maintained by the CUDA runtime. This should be fairly evident since it runs asynchronous to the user's thread that executed the `cudaLaunchHostFunc()` — Robert Crovella, Jan 18 '23 at 15:54

score 1 · Accepted Answer · answered Jan 19 '23 at 09:29

Thanks to the comments I understand now that my question is essentially the same as, for instance, this one: How can I propagate exceptions between threads?

The techniques used to take exceptions from a worker thread to the main thread also apply here.
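
As a minimal sketch of that technique in plain C++ (no CUDA involved): a std::thread stands in for the thread that runs the callback, and the names worker, run_and_rethrow and caught_message are hypothetical, chosen only for this illustration. The worker catches everything and stores it in a std::exception_ptr; the calling thread rethrows it after the join, which plays the role of the synchronization point.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical worker: plays the role of a callback running on a thread
// we do not control. No exception may escape it, so everything is
// captured into *slot instead.
void worker(std::exception_ptr* slot) {
  assert(slot != nullptr);
  try {
    throw std::runtime_error("Error in worker");
  } catch (...) {
    *slot = std::current_exception();
  }
}

// Runs the worker on another thread, waits for it, and rethrows the
// stored exception (if any) on the calling thread.
void run_and_rethrow() {
  std::exception_ptr slot;        // empty until the worker stores something
  std::thread t(worker, &slot);
  t.join();                       // after the join, slot is safe to read
  if (slot) std::rethrow_exception(slot);
}

// Returns the message of the rethrown exception, or "" if nothing was thrown.
std::string caught_message() {
  try {
    run_and_rethrow();
  } catch (const std::exception& ex) {
    return ex.what();
  }
  return "";
}
```

Calling caught_message() from main returns "Error in worker": the exception crossed the thread boundary as a value and was rethrown where a try/catch could see it.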

For completeness, the foo and launch functions in my dummy example could be rewritten as follows:

void foo(void* data){
  auto e = static_cast<std::exception_ptr*>(data);
  std::cerr<<"Callback"<<std::endl;
  try{
    throw std::runtime_error("Error in callback");
  }
  catch(...){
    *e = std::current_exception();
  }
}

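void launch(){
  cudaStream_t st = 0;
  kern<<<1,1,0,st>>>();
  cudaStreamSynchronize(st);
  cudaHostFn_t fn = foo;
  // e must outlive the callback; the device sync below guarantees that
  std::exception_ptr e;
  cudaLaunchHostFunc(st, fn, (void*)&e);
  cudaDeviceSynchronize();
  // back on the calling thread: rethrow whatever the callback stored
  if(e) std::rethrow_exception(e);
}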

This prints the expected output:

Kernel
Callback
Catched exception
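
For the error flag use case from the question, the same idea might look like the sketch below. It is plain C++ with several stand-ins, all of them assumptions: CallbackState, check_flag, run_and_check and throws are hypothetical names, an ordinary int replaces the flag in managed memory, and a std::thread replaces the enqueue plus synchronize pair. One design point worth noting: when the callback ends up inside a captured graph, the user-data pointer is stored with the host node, so the state object would have to outlive every launch of that graph rather than live on the stack of a single call.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <thread>

// Hypothetical payload handed to the callback through its void* argument.
struct CallbackState {
  const int* error_flag;     // assumption: would point to the managed-memory flag
  std::exception_ptr error;  // written by the callback, read after synchronizing
};

// Same shape as cudaHostFn_t, i.e. void (*)(void*).
// Exceptions are stored in the state instead of being allowed to escape.
void check_flag(void* data) {
  auto* state = static_cast<CallbackState*>(data);
  assert(state != nullptr && state->error_flag != nullptr);
  try {
    if (*state->error_flag != 0)
      throw std::runtime_error("Kernel reported an error");
  } catch (...) {
    state->error = std::current_exception();
  }
}

// Stand-in for "enqueue the callback, then synchronize": runs the callback
// on another thread, joins, and rethrows on the caller. The slot is cleared
// first so the same state can be reused for a later launch.
void run_and_check(CallbackState& state) {
  std::thread runtime_thread(check_flag, static_cast<void*>(&state));
  runtime_thread.join();
  if (state.error) {
    std::exception_ptr e = state.error;
    state.error = std::exception_ptr();
    std::rethrow_exception(e);
  }
}

// Small helper for trying it out: reports whether a given flag value
// results in an exception on the calling thread.
bool throws(int flag_value) {
  int flag = flag_value;
  CallbackState state{&flag, std::exception_ptr()};
  try {
    run_and_check(state);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

With this sketch, throws(1) is true and throws(0) is false. A synchronization is still needed before the exception can surface on the calling thread, so this does not remove the sync, it only moves the check itself into the callback.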
