I have some work I want to do on a CUDA stream, say a kernel `K`, which depends on previous work that needs to be done on the CPU. The exact details of that CPU work are not known to me when I'm scheduling `K`; I just want `K` not to start until it is given an indication that everything is ready.
Now, if I had known exactly what CPU work is to be done, e.g. that `K` could start after some function `foo()` concludes, I could do the following (sketched in code after the list):
- Enqueue a call to `foo()` on stream `SideStream`
- Enqueue an event `E1` on `SideStream`
- Enqueue a wait on event `E1` on `MainStream`
- Enqueue `K` on `MainStream`
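
A minimal sketch of those four steps, assuming a kernel `K` and a host function `foo()` exist (names and launch configuration are placeholders; error checking omitted):

```
#include <cuda_runtime.h>

__global__ void K() { /* ... the dependent GPU work ... */ }

void foo(void* /* userData */) { /* ... the known CPU-side work ... */ }

void schedule(cudaStream_t mainStream, cudaStream_t sideStream)
{
    cudaEvent_t e1;
    cudaEventCreateWithFlags(&e1, cudaEventDisableTiming);

    cudaLaunchHostFunc(sideStream, foo, nullptr);  // 1. enqueue a call to foo() on SideStream
    cudaEventRecord(e1, sideStream);               // 2. record E1 on SideStream
    cudaStreamWaitEvent(mainStream, e1, 0);        // 3. make MainStream wait on E1
    K<<<1, 1, 0, mainStream>>>();                  // 4. enqueue K on MainStream
}
```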
But what if my CUDA scheduling code doesn't have access to such a `foo()`? I want to allow some other, arbitrary place in my code to fire `E1` when it is good and ready, and have that trigger `K` on `MainStream`... but I can't do that, since in CUDA, you can only wait on an already-enqueued (already "recorded") event.
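
To make that concrete, this is roughly what I would like to write (stream variables assumed to exist, names hypothetical). As far as I can tell, it does not behave as desired, because waiting on an event that has not yet been recorded is a functional no-op rather than a deferred wait:

```
cudaEvent_t e1;
cudaEventCreateWithFlags(&e1, cudaEventDisableTiming);

// Scheduling code: I'd like MainStream to hold K back until E1 is "fired"...
cudaStreamWaitEvent(mainStream, e1, 0);   // ...but E1 is unrecorded, so this is a no-op
K<<<1, 1, 0, mainStream>>>();

// ... much later, from some arbitrary, unrelated place in the code:
cudaEventRecord(e1, sideStream);          // too late - the wait has already passed
```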
This seems to be one of the few niches in which OpenCL offers a richer API than CUDA's: "User Events". They can be waited upon, and their execution completion status can be set by the user. See:
- https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/clCreateUserEvent.html
- https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/clSetUserEventStatus.html
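
Roughly, the pattern those two calls enable looks like this (assuming an existing context, command queue and kernel; error checking omitted):

```
#include <CL/cl.h>

void run_after_user_event(cl_context ctx, cl_command_queue queue, cl_kernel kernel)
{
    cl_int err;
    cl_event user_event = clCreateUserEvent(ctx, &err);

    // The kernel is enqueued now, but cannot start until user_event completes.
    size_t global_size = 1;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global_size, nullptr,
                           1, &user_event, nullptr);

    // ... at any later point, from any arbitrary place in the host code:
    clSetUserEventStatus(user_event, CL_COMPLETE);  // now the kernel may run
}
```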
But certainly CUDA is able to provide something like this itself, if only to implement the OpenCL API call. So, what is the idiomatic way to achieve this effect with CUDA?