Is enqueueNDRangeKernel(...) still blocking/synchronous on modern nVidia hardware?

Question

When using OpenCL on many older nVidia cards, calls to clEnqueueNDRangeKernel(...) do not return until the computation is complete. See: clEnqueueNDRange blocking on Nvidia hardware? (Also Multi-GPU).

The OpenCL standard implies that clEnqueueNDRangeKernel(...) should be asynchronous, and it is, in fact, a non-blocking function when using the AMD and Intel implementations of OpenCL.

Has this been fixed on more modern nVidia GPGPUs?

I'm using a comment because I don't have proof in front of me, but I believe kernel enqueue on NVIDIA is asynchronous in almost all cases. At one point (perhaps still?) NVIDIA's memory functions (clCreateImage2D, clReleaseMemObject) seemed to interact with enqueued commands, so if your test program allocates memory dynamically, that may be why you are observing synchronous kernel enqueues. If so, try pre-allocating buffers and reusing them where possible. Finally, if you are enqueueing many kernels, you may have hit a queue size limit or something similar. Can you share code that demonstrates your observation? — Dithermaster, Oct 31 '16 at 13:17
In my test case at least, the kernel call seems to block on the device side: I cannot get my copies to run in parallel with kernel execution. I haven't tried multi-GPU setups so far. On the host, the call returns immediately. — Dschoni, Oct 20 '17 at 09:55
