If I register a callback via cudaStreamAddCallback(), which thread is going to run it?
The CUDA documentation says that cudaStreamAddCallback "adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cudaStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished."

However, it says nothing about how the callback itself is invoked, in particular which host thread executes it.
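One way to check this empirically is to record std::this_thread::get_id() in both the thread that enqueues the callback and inside the callback, then compare them. The following is a minimal sketch, assuming a CUDA-capable device and compilation with nvcc. The names emptyKernel, recordThread and callbackThreadId are hypothetical helpers for illustration, not part of the CUDA API. The callback makes no CUDA API calls; it only hands its thread ID back through a std::promise, so the comparison does not depend on cudaStreamSynchronize semantics.

```cuda
#include <cstdio>
#include <cstdlib>
#include <future>
#include <iostream>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical trivial kernel so the stream has work queued ahead of the callback.
__global__ void emptyKernel() {}

// Hypothetical callback matching cudaStreamCallback_t. It makes no CUDA API
// calls; it only records the ID of the host thread that runs it.
void CUDART_CB recordThread(cudaStream_t /*stream*/, cudaError_t status, void *userData) {
    auto *promise = static_cast<std::promise<std::thread::id> *>(userData);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "stream reported error %d before callback\n", static_cast<int>(status));
    }
    promise->set_value(std::this_thread::get_id());
}

// Hypothetical helper: enqueues a kernel and then the callback on a new stream,
// and returns the ID of the thread that executed the callback.
std::thread::id callbackThreadId() {
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        std::fprintf(stderr, "cudaStreamCreate failed\n");
        std::exit(EXIT_FAILURE);
    }

    std::promise<std::thread::id> promise;
    std::future<std::thread::id> result = promise.get_future();

    emptyKernel<<<1, 1, 0, stream>>>();
    // The last argument (flags) is reserved and must be 0.
    if (cudaStreamAddCallback(stream, recordThread, &promise, 0) != cudaSuccess) {
        std::fprintf(stderr, "cudaStreamAddCallback failed\n");
        std::exit(EXIT_FAILURE);
    }

    std::thread::id id = result.get();  // blocks until the callback has set the value
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return id;
}

int main() {
    std::thread::id callerId = std::this_thread::get_id();
    std::thread::id cbId = callbackThreadId();
    std::cout << "enqueuing thread: " << callerId << "\n"
              << "callback thread:  " << cbId << "\n"
              << (callerId == cbId ? "same thread\n" : "different thread\n");
    return 0;
}
```

The output only describes the behavior of the particular driver and platform it runs on; it does not establish a guarantee, which is exactly what the documentation quoted above leaves unspecified.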