I'm just starting to use Julia's CUDArt package to manage GPU computing. How can I ensure that when I pull data from the GPU (e.g. using to_host()), all of the necessary computations on it have already finished?
Through some experimentation, it seems that to_host(CudaArray) stalls while that particular CudaArray is being updated. So perhaps just calling it is enough to ensure safety? But relying on that seems a bit chancy.
Right now, I am using the launch() function to run my kernels, as depicted in the package documentation.
The CUDArt documentation gives an example using Julia's @sync macro, which seems like it could be lovely. But as far as @sync is concerned, I am done with my "work" and ready to move on as soon as the kernel has been launched with launch(), not once it finishes. As far as I understand launch(), there isn't a way to change this behavior (e.g. to make it wait to receive the output of the function it "launches").
How can I accomplish such synchronization?