Suppose I have multiple GPUs in a machine and a kernel running on GPU0.
With the UVA and P2P features of CUDA 4.0, can I modify the contents of an array on another device, say GPU1, while the kernel is running on GPU0?
The simpleP2P example in the CUDA 4.0 SDK does not demonstrate this.
It only demonstrates:
- Peer-to-peer memcopies
- A kernel running on GPU0 that reads input from a GPU1 buffer and writes output to a GPU0 buffer
- A kernel running on GPU1 that reads input from a GPU0 buffer and writes output to a GPU1 buffer
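To make the question concrete, here is a minimal sketch of the pattern I am asking about: a kernel launched on GPU0 that writes directly through a pointer allocated on GPU1, after enabling peer access. The kernel name, buffer size, and launch configuration are just placeholders, not code from the SDK sample.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: launched on GPU0, but it writes through a pointer
// whose memory was allocated on GPU1 (relying on UVA + peer access).
__global__ void writeRemote(float *gpu1_buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        gpu1_buf[i] = 2.0f * i;   // store lands in GPU1's memory
}

int main()
{
    const int n = 1 << 20;
    float *buf1 = NULL;

    // Allocate the buffer on GPU1.
    cudaSetDevice(1);
    cudaMalloc(&buf1, n * sizeof(float));

    // Switch to GPU0 and enable peer access to GPU1's memory.
    cudaSetDevice(0);
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("GPU0 cannot directly access GPU1's memory\n");
        return 1;
    }
    cudaDeviceEnablePeerAccess(1, 0);   // second argument (flags) must be 0

    // Kernel runs on GPU0 but dereferences GPU1's pointer. Is this allowed,
    // and can it run while other work is in flight on GPU0?
    writeRemote<<<(n + 255) / 256, 256>>>(buf1, n);
    cudaDeviceSynchronize();

    // Free the buffer from GPU1's context.
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

Is this kind of direct remote write from a running kernel supported, or is P2P limited to memcopies and reads like the ones in simpleP2P?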