Suppose I have multiple GPUs in a machine and a kernel running on GPU0.
With the UVA and P2P features of CUDA 4.0, can I modify the contents of an array on another device, say GPU1, while the kernel is running on GPU0?
The simpleP2P example in the CUDA 4.0 SDK does not demonstrate this.
It only demonstrates:
- Peer-to-peer memcopies
- A kernel running on GPU0 that reads input from a GPU1 buffer and writes output to a GPU0 buffer
- A kernel running on GPU1 that reads input from a GPU0 buffer and writes output to a GPU1 buffer
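To make the question concrete, here is a minimal sketch of the pattern I am asking about: a kernel launched on GPU0 that writes directly through a pointer allocated on GPU1, after enabling peer access. The kernel name, buffer size, and launch configuration are just placeholders, not code from the SDK sample.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: launched on GPU0, but it writes through a pointer
// whose memory was allocated on GPU1 (relying on UVA + peer access).
__global__ void writeRemote(float *gpu1_buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        gpu1_buf[i] = 2.0f * i;   // store lands in GPU1's memory
}

int main()
{
    const int n = 1 << 20;
    float *buf1 = NULL;

    // Allocate the buffer on GPU1.
    cudaSetDevice(1);
    cudaMalloc(&buf1, n * sizeof(float));

    // Switch to GPU0 and enable peer access to GPU1's memory.
    cudaSetDevice(0);
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("GPU0 cannot directly access GPU1's memory\n");
        return 1;
    }
    cudaDeviceEnablePeerAccess(1, 0);   // second argument (flags) must be 0

    // Kernel runs on GPU0 but dereferences GPU1's pointer. Is this allowed,
    // and can it run while other work is in flight on GPU0?
    writeRemote<<<(n + 255) / 256, 256>>>(buf1, n);
    cudaDeviceSynchronize();

    // Free the buffer from GPU1's context.
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

Is this kind of direct remote write from a running kernel supported, or is P2P limited to memcopies and reads like the ones in simpleP2P?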