
I am trying to get more details on the RDMA read and write semantics (especially data placement semantics) and I would like to confirm my understanding with the experts here.

  1. RDMA read:

Would the data be available/seen in the local buffer once the RDMA read completion is seen in the completion queue? Is the behavior the same if I am using GPUDirect RDMA and the local address maps to GPU memory? Would the data be immediately available in GPU memory once the RDMA READ completion is seen in the completion queue? If it is not immediately available, what operation will ensure it?

  2. RDMA Write with Immediate (or) RDMA Write + Send:

Can the remote host check for the presence of data in its memory after it has seen the immediate data in the receive queue? And does the expectation/behavior change if the write is to GPU memory (using GDR)?

user718134

1 Answer


RDMA read: Would the data be available/seen in the local buffer once the RDMA read completion is seen in the completion queue?

Yes
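For the host-memory case, a minimal sketch of that sequence looks like the following. The function name and the variables (`qp`, `cq`, `mr`, `local_buf`, `remote_addr`, `remote_rkey`) are assumptions standing in for the usual connection setup; they are not part of the original answer.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Issue a signaled RDMA READ into local_buf and wait for its completion.
 * The queue pair, completion queue, memory region and the peer's
 * remote_addr/remote_rkey are assumed to come from connection setup. */
static int rdma_read_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                              struct ibv_mr *mr, void *local_buf, uint32_t len,
                              uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge;
    struct ibv_send_wr wr, *bad_wr = NULL;
    struct ibv_wc wc;
    int n;

    memset(&sge, 0, sizeof(sge));
    sge.addr   = (uintptr_t)local_buf;
    sge.length = len;
    sge.lkey   = mr->lkey;

    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Busy-poll the completion queue for the READ completion. */
    do {
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    if (n < 0 || wc.status != IBV_WC_SUCCESS || wc.opcode != IBV_WC_RDMA_READ)
        return -1;

    /* Host-memory case: the remote data is now visible in local_buf. */
    return 0;
}
```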

Is the behavior the same if I am using GPUDirect RDMA and the local address maps to GPU memory?

Not necessarily. It is possible that the NIC has sent the data towards the GPU but the GPU hasn't received it yet, while the RDMA read completion has already arrived at the CPU. The root cause of this is PCIe semantics, which allow reordering of writes to different destinations (CPU memory vs. GPU memory).

If it is not immediately available, what operation will ensure it?

To ensure the data has arrived in GPU memory, one may set a flag on the CPU after seeing the RDMA completion and poll on this flag from GPU code. This works because the PCIe read issued by the GPU will "push" the NIC's DMA writes: under PCIe ordering rules, the read completion cannot pass those earlier posted writes.
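A minimal sketch of that flag mechanism, assuming the RDMA READ is posted against a GPU buffer registered for GPUDirect RDMA as in the previous sketch. The kernel name, `gpu_buf`, and the use of a pinned, mapped host allocation for the flag are illustrative assumptions, not something prescribed by the answer.

```cuda
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

/* GPU side: spin on a flag that lives in pinned, mapped host memory.
 * Each poll is a PCIe read; its completion cannot pass the NIC's earlier
 * posted writes into GPU memory, so once the flag is observed set, the
 * RDMA payload in gpu_buf is visible to the GPU. */
__global__ void wait_for_rdma_data(volatile int *ready_flag,
                                   const unsigned char *gpu_buf, size_t len)
{
    while (*ready_flag == 0)
        ;                       /* poll until the CPU raises the flag */
    /* ... consume gpu_buf[0 .. len) here ... */
}

/* CPU side. cq and gpu_buf are assumed to exist; the RDMA READ posting is
 * as in the previous sketch, with the GPU buffer as the local target. */
void host_side(struct ibv_cq *cq, const unsigned char *gpu_buf, size_t len)
{
    int *flag_host, *flag_dev;
    cudaHostAlloc((void **)&flag_host, sizeof(int), cudaHostAllocMapped);
    *flag_host = 0;             /* must be zero before the READ is posted */
    cudaHostGetDevicePointer((void **)&flag_dev, flag_host, 0);

    /* The kernel may start polling at any time; it only sees 0 until the
     * flag is raised below. */
    wait_for_rdma_data<<<1, 1>>>(flag_dev, gpu_buf, len);

    /* ... post the RDMA READ targeting gpu_buf here ... */

    /* Wait for the READ completion on the CPU, then raise the flag. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;
    if (wc.status == IBV_WC_SUCCESS)
        *flag_host = 1;         /* next GPU poll observes the flag */

    cudaDeviceSynchronize();
    cudaFreeHost(flag_host);
}
```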

RDMA Write with Immediate (or) RDMA Write + Send: Can the remote host check for the presence of data in its memory after it has seen the immediate data in the receive queue? And does the expectation/behavior change if the write is to GPU memory (using GDR)?

Yes, this works, but GDR suffers from the same issue as above: writes can arrive out of order in GPU memory as compared to CPU memory, again due to PCIe ordering semantics. The RNIC cannot control PCIe ordering, and therefore it cannot force the "desired" semantics in either case.
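On the responder side, the host-memory check described above might look like this sketch. The names (`qp`, `recv_cq`, `dst_mr`, `dst_buf`, `wait_for_write_with_imm`) are assumptions for illustration; for a GPU destination, the same flag/poll ordering as in the sketch above would still be needed before GPU code reads the buffer.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Responder side of RDMA Write with Immediate: a receive WR must be posted
 * so the immediate data can be consumed; its completion indicates the peer's
 * write has been placed (host-memory case). */
static int wait_for_write_with_imm(struct ibv_qp *qp, struct ibv_cq *recv_cq,
                                   struct ibv_mr *dst_mr, void *dst_buf,
                                   uint32_t len, uint32_t *imm_out)
{
    struct ibv_sge sge;
    struct ibv_recv_wr rwr, *bad_rwr = NULL;
    struct ibv_wc wc;
    int n;

    /* The SGE is not used for data placement (the writer supplies the
     * destination address/rkey); the receive just absorbs the immediate. */
    memset(&sge, 0, sizeof(sge));
    sge.addr   = (uintptr_t)dst_buf;
    sge.length = len;
    sge.lkey   = dst_mr->lkey;

    memset(&rwr, 0, sizeof(rwr));
    rwr.wr_id   = 2;
    rwr.sg_list = &sge;
    rwr.num_sge = 1;

    if (ibv_post_recv(qp, &rwr, &bad_rwr))
        return -1;

    /* The receive completes when the RDMA Write with Immediate arrives. */
    do {
        n = ibv_poll_cq(recv_cq, 1, &wc);
    } while (n == 0);

    if (n < 0 || wc.status != IBV_WC_SUCCESS ||
        wc.opcode != IBV_WC_RECV_RDMA_WITH_IMM)
        return -1;

    *imm_out = wc.imm_data;  /* immediate value supplied by the writer */
    /* Host-memory case: dst_buf now holds the written payload.
     * GPU-memory (GDR) case: apply the flag/poll ordering from the
     * previous sketch before GPU code reads dst_buf. */
    return 0;
}
```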

  • Thanks. One doubt. > "To ensure the data has arrived in GPU memory, one may set a flag on the CPU after seeing the RDMA completion and poll on this flag from GPU code." Consider the sequence below: 1) RDMA read initiated. 2) GPU starts polling the flag. 3) RDMA completion reaches the CPU. 4) CPU sets the flag. 5) GPU sees the flag. I understand that a PCIe read will flush all the previous DMA writes, but it seems this works only if the GPU poll is started after the RDMA completion; otherwise there is always a timing issue where it might not work. Is that understanding correct? – user718134 Nov 02 '21 at 20:51
  • It depends. In the explanation above, I've assumed that the flag in CPU memory is initialized to zero, so polling it from the GPU only observes it set after the data has arrived. Obviously, care must be taken to ensure this indeed holds. – Boris Pismenny Nov 06 '21 at 12:05