I want to develop an application with OpenCL that runs on multiple GPUs. At some point, data from one GPU has to be transferred to another one. Is there any way to avoid transferring it through the host? In CUDA this can be done with the cudaMemcpyPeerAsync function. Is there a similar function in OpenCL?
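
For reference, this is roughly the CUDA pattern I mean (a minimal sketch; the device numbers, pointers, size and stream are placeholders set up elsewhere):

```c
#include <cuda_runtime.h>
#include <stddef.h>

/* Minimal sketch of the CUDA peer copy I mean: copies `bytes` from a buffer on
 * device 1 straight to a buffer on device 0, without staging through the host.
 * The pointers, size and stream are placeholders allocated/created elsewhere. */
void peer_copy(void *dst_on_dev0, const void *src_on_dev1,
               size_t bytes, cudaStream_t stream)
{
    cudaSetDevice(0);                    /* device 0 will access device 1's memory */
    cudaDeviceEnablePeerAccess(1, 0);    /* enable P2P access to device 1 (flags must be 0) */
    cudaMemcpyPeerAsync(dst_on_dev0, 0,  /* destination pointer and device */
                        src_on_dev1, 1,  /* source pointer and device */
                        bytes, stream);  /* asynchronous, ordered by the stream */
}
```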

Have you tried creating an OpenCL context with multiple devices? – pmdj Nov 30 '20 at 13:12
1 Answer
In OpenCL, a context is treated as a memory space. So if you have multiple devices associated with the same context, and you create one command queue per device, you can access the same buffer object from multiple devices.
When you access a memory object from a specific device, the memory object first needs to be migrated to that device so it can be physically accessed. Migration can be done explicitly using clEnqueueMigrateMemObjects.
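A minimal sketch of that setup, assuming both GPUs belong to the same platform and an OpenCL 2.0+ runtime for clCreateCommandQueueWithProperties; error checking is omitted:

```c
#include <CL/cl.h>
#include <stddef.h>

/* Sketch: one OpenCL context shared by two GPUs, with one command queue per
 * device. Assumes both GPUs belong to the same platform; error checks omitted. */
void setup_shared_context(size_t size)
{
    cl_platform_id platform;
    cl_device_id dev[2];
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 2, dev, NULL);

    cl_context ctx = clCreateContext(NULL, 2, dev, NULL, NULL, NULL);
    cl_command_queue q1 = clCreateCommandQueueWithProperties(ctx, dev[0], NULL, NULL);
    cl_command_queue q2 = clCreateCommandQueueWithProperties(ctx, dev[1], NULL, NULL);

    /* A buffer created in this context is visible from both queues; enqueuing a
     * migration on a queue moves its backing storage to that queue's device. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, size, NULL, NULL);
    clEnqueueMigrateMemObjects(q1, 1, &buf, 0, 0, NULL, NULL);
    (void)q2; /* q2 would drive the second device, as in the sequence below */
}
```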
So a simple producer-consumer sequence with multiple devices can be implemented like so (a sketch in code follows the list):
command queue on device 1:
- migrate memory buffer1
- enqueue kernels that process this buffer
- save last event associated with buffer1 processing
command queue on device 2:
- migrate memory buffer1 - use the event produced by queue 1 to sync the migration.
- enqueue kernels that process this buffer
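
A hedged sketch of that sequence, assuming the shared-context setup above; the queues, kernels, buffer and work size are placeholders and error checking is omitted:

```c
#include <CL/cl.h>
#include <stddef.h>

/* Sketch of the sequence above (error checking omitted). q1/q2 are queues on
 * device 1 and device 2 in the same context, producer/consumer are kernels
 * taking buf as argument 0, gws is the global work size; all placeholders. */
void produce_consume(cl_command_queue q1, cl_command_queue q2, cl_mem buf,
                     cl_kernel producer, cl_kernel consumer, size_t gws)
{
    cl_event done_on_dev1, migrated_to_dev2;

    /* queue on device 1: migrate the buffer in, then run the producer kernel
     * and keep the event associated with its completion */
    clEnqueueMigrateMemObjects(q1, 1, &buf, 0, 0, NULL, NULL);
    clSetKernelArg(producer, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q1, producer, 1, NULL, &gws, NULL,
                           0, NULL, &done_on_dev1);

    /* queue on device 2: migrate the same buffer, waiting on the event from
     * queue 1 so the migration starts only after the producer has finished */
    clEnqueueMigrateMemObjects(q2, 1, &buf, 0, 1, &done_on_dev1, &migrated_to_dev2);
    clSetKernelArg(consumer, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q2, consumer, 1, NULL, &gws, NULL,
                           1, &migrated_to_dev2, NULL);
}
```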
How exactly the migration happens under the hood I cannot say, but I assume it is either a direct DMA from device 1 to device 2 or (more likely) a DMA from device 1 to the host followed by one from the host to device 2.
If you wish to avoid the limitation of using a single context, or would like to ensure that the data transfer is efficient, then you are at the mercy of vendor-specific extensions.
For example, AMD offers its DirectGMA technology, which allows explicit remote DMA between a GPU and any other PCIe device (including other GPUs). In my experience it works very nicely.

Miamoni Thanks for your complete answer. I roughly knew how this migration should be done; the point is that I did not know whether the migration goes through host memory, which can be quite expensive. And that is the price that has to be paid for using a general framework (here OpenCL). – mehdi_bm Dec 01 '20 at 08:19