
If I register a callback via cudaStreamAddCallback(), what thread is going to run it?

The CUDA documentation says that cudaStreamAddCallback

adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cudaStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished.

but says nothing about how the callback itself is called.
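The documented ordering can be checked with a minimal sketch like the one below. It assumes a CUDA-capable device, and the names `fill`, `Check`, `verify` and `callbackSeesCompletedCopy` are hypothetical, chosen only for illustration. A kernel writes into device memory, an asynchronous copy moves the result into pinned host memory, and the callback, enqueued last on the same stream, inspects the host buffer. Because the callback only runs after the preceding kernel and copy have completed, it should see the copied values rather than the zeros the buffer was initialized with.

```cuda
#include <cuda_runtime.h>

// Writes `value` into every element handled by this single-block launch.
__global__ void fill(int* out, int value) {
    out[threadIdx.x] = value;
}

// Hypothetical state shared between the host code and the callback.
struct Check {
    const int* host;  // pinned host buffer written by cudaMemcpyAsync
    int n;            // number of elements
    bool allSet;      // result computed by the callback
};

// Matches cudaStreamCallback_t. Runs after every earlier item in the stream
// has completed; it must not call any CUDA API functions.
static void CUDART_CB verify(cudaStream_t, cudaError_t status, void* userData) {
    Check* c = static_cast<Check*>(userData);
    c->allSet = (status == cudaSuccess);
    for (int i = 0; i < c->n && c->allSet; ++i)
        c->allSet = (c->host[i] == 42);
}

// Returns true if the callback saw the data produced by the kernel and the
// copy that were enqueued before it on the same stream.
bool callbackSeesCompletedCopy() {
    const int n = 256;
    int* d = nullptr;
    int* h = nullptr;
    cudaStream_t s = nullptr;
    Check c{nullptr, n, false};

    bool ok = cudaStreamCreate(&s) == cudaSuccess &&
              cudaMalloc(&d, n * sizeof(int)) == cudaSuccess &&
              cudaMallocHost(&h, n * sizeof(int)) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) h[i] = 0;  // would fail the check if read too early
        c.host = h;
        fill<<<1, n, 0, s>>>(d, 42);
        cudaMemcpyAsync(h, d, n * sizeof(int), cudaMemcpyDeviceToHost, s);
        ok = cudaStreamAddCallback(s, verify, &c, 0) == cudaSuccess &&  // flags must be 0
             cudaStreamSynchronize(s) == cudaSuccess;                  // waits for the callback too
    }
    if (h) cudaFreeHost(h);
    if (d) cudaFree(d);
    if (s) cudaStreamDestroy(s);
    return ok && c.allSet;
}
```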

fwyzard
  • It is a thread launched and managed by the CUDA driver, not something explicitly visible in the CUDA programming model. You don't have to do anything explicit to create or manage this thread. The callback is called when the stream it was launched into reaches the callback (i.e. when all previous activity issued to that stream has completed). – Robert Crovella Nov 07 '17 at 19:33
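One way to observe this is to compare the ID of the thread that runs the callback with the ID of the thread that registered it. This is a minimal sketch assuming a CUDA-capable device; `CallbackInfo`, `recordThread` and `callbackRunsOnOtherThread` are hypothetical names. Per the comment above, the two IDs are expected to differ, since the callback runs on a thread created by the driver rather than by the application.

```cuda
#include <cuda_runtime.h>
#include <thread>

// Hypothetical record of which host thread executed the callback.
struct CallbackInfo {
    std::thread::id callbackThread;
};

// Matches cudaStreamCallback_t; records the ID of the thread running it.
static void CUDART_CB recordThread(cudaStream_t, cudaError_t, void* userData) {
    static_cast<CallbackInfo*>(userData)->callbackThread = std::this_thread::get_id();
}

// Returns true if the callback ran on a thread other than the one that
// registered it; returns false on any CUDA error.
bool callbackRunsOnOtherThread() {
    cudaStream_t stream = nullptr;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;

    CallbackInfo info{};
    cudaError_t err = cudaStreamAddCallback(stream, recordThread, &info, 0);
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);  // waits for the callback
    cudaStreamDestroy(stream);

    return err == cudaSuccess && info.callbackThread != std::this_thread::get_id();
}
```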



Just to flesh out the comments so that this question has an answer and will fall off the unanswered queue:

The short answer is that this is an internal implementation detail of the CUDA runtime and you don't need to worry about it.

The longer answer is that if you look carefully at the operation of the CUDA runtime, you will notice that context establishment on a device (be it explicit via the driver API, or implicit via the runtime API) spawns a small thread pool. It is these threads that are used to implement runtime features like stream command queues and callback operations. Again, this is an internal implementation detail which the programmer doesn't need to know about.
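Because the callback runs on one of these runtime-owned threads and, per the documentation quoted in the question, blocks later work in its stream until it returns, one option is to keep the callback minimal and forward the actual work to a thread the application controls. The sketch below assumes a simple mutex and condition-variable queue; `WorkQueue`, `Token`, `notifyApp` and `roundTrip` are hypothetical names, and the calling thread stands in for a worker from the application's own pool. The callback itself makes no CUDA API calls, which the CUDA documentation forbids inside stream callbacks.

```cuda
#include <cuda_runtime.h>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical hand-off queue: the runtime-owned callback thread only pushes
// a token, and an application thread picks it up and does the real work.
struct WorkQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> items;

    void push(int v) {
        { std::lock_guard<std::mutex> lk(m); items.push(v); }
        cv.notify_one();
    }
    int pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !items.empty(); });
        int v = items.front();
        items.pop();
        return v;
    }
};

struct Token { WorkQueue* q; int id; };

// Matches cudaStreamCallback_t; does nothing but enqueue, so it returns quickly
// and does not hold up later work in the stream.
static void CUDART_CB notifyApp(cudaStream_t, cudaError_t status, void* userData) {
    Token* t = static_cast<Token*>(userData);
    t->q->push(status == cudaSuccess ? t->id : -1);
}

// Registers a callback and waits, on the calling thread, for its token.
// Returns the token id, or -1 on error.
int roundTrip(int id) {
    WorkQueue q;
    Token t{&q, id};
    cudaStream_t s = nullptr;
    if (cudaStreamCreate(&s) != cudaSuccess) return -1;
    if (cudaStreamAddCallback(s, notifyApp, &t, 0) != cudaSuccess) {
        cudaStreamDestroy(s);
        return -1;
    }
    int got = q.pop();         // blocks until the callback has pushed
    cudaStreamSynchronize(s);  // ensure the callback has fully returned before q is destroyed
    cudaStreamDestroy(s);
    return got;
}
```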

talonmies
  • Thanks for the answer. I do care, because I may need to figure out how that thread interacts with my application's thread pool. – fwyzard Mar 25 '18 at 21:53
  • @fwyzard: Again, you don't, because the threads don't interact with anything your code does in any way. – talonmies Mar 26 '18 at 09:27
  • I don't understand your comment. We do have per-thread resources, and their allocation could be static if the number of threads in the application's thread pool is predefined. However, if the callback function needs to access those per-thread resources, they will need to be provisioned dynamically, because we have no control over the CUDA thread pool. – fwyzard Mar 26 '18 at 10:02
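For the situation described in the last comment, one option is to provision per-thread resources lazily with `thread_local`, so that any thread that touches them, including a runtime-owned callback thread whose count the application cannot fix in advance, gets its own instance on first use. This is a sketch under that assumption, not a prescribed CUDA mechanism; `Scratch`, `localScratch`, `useScratch`, `runCallbacks` and `scratchCreated` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <atomic>
#include <memory>
#include <vector>

// Hypothetical per-thread resource (here just a scratch buffer).
struct Scratch {
    std::vector<char> buffer = std::vector<char>(1 << 16);
};

// Counts how many Scratch instances have been created across all threads.
std::atomic<int> scratchCreated{0};

// Lazily creates one Scratch per calling thread, whether it belongs to the
// application's pool or to the CUDA runtime.
Scratch& localScratch() {
    thread_local std::unique_ptr<Scratch> s;
    if (!s) {
        s = std::make_unique<Scratch>();
        ++scratchCreated;
    }
    return *s;
}

// Matches cudaStreamCallback_t; uses the per-thread resource, no CUDA calls.
static void CUDART_CB useScratch(cudaStream_t, cudaError_t, void* userData) {
    Scratch& s = localScratch();
    s.buffer[0] = 1;
    static_cast<std::atomic<int>*>(userData)->fetch_add(1);
}

// Registers `count` callbacks on one stream and returns how many ran,
// or -1 if the stream could not be created.
int runCallbacks(int count) {
    std::atomic<int> ran{0};
    cudaStream_t st = nullptr;
    if (cudaStreamCreate(&st) != cudaSuccess) return -1;
    for (int i = 0; i < count; ++i)
        cudaStreamAddCallback(st, useScratch, &ran, 0);
    cudaStreamSynchronize(st);  // all callbacks have returned after this
    cudaStreamDestroy(st);
    return ran.load();
}
```

A `thread_local` object created on a runtime thread is destroyed only when that thread exits, which the application does not control either; if the resource must be released at a specific point, it would have to be tracked explicitly instead.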