Is there a utility like tcpdump on Linux for capturing traffic that goes over an RDMA channel (InfiniBand/RoCE/iWARP)?
-
How did you finally solve this problem? – Djvu Dec 05 '14 at 13:37
-
ibdump worked for me, as suggested by @kliteyn. What kind of packets are you looking for? I was doing RDMA_WRITE_WITH_IMMEDIATE and I could see all the packets. – dhavale Dec 09 '14 at 09:38
-
I just want to verify whether any RDMA packets are going out to the network. But when I use ibdump, I capture only very few packets (ibdump shows e.g. 2 packets), even though I send a lot of data. I also wonder what the packets captured by ibdump mean: are they only for connection setup, without the data that was sent? – Djvu Dec 09 '14 at 09:57
-
I have another question. The number of packets captured by ibdump increases after I finish sending data. What does that mean? Is ibdump slow to respond? Unlike tcpdump, it does not show packets in real time. – Djvu Dec 09 '14 at 09:59
6 Answers
Old thread, but still:
As Roland pointed out, sniffing RDMA traffic is tricky, because once the endpoints have done the initial handshake, traffic goes from the network card (HCA) directly to memory. The only way to sniff this traffic without putting a dedicated hardware sniffer on the wire is to have vendor-specific hooks in the network card and a software tool that uses these hooks.
If you have Mellanox HCAs, you can use the "ibdump" tool. This tool is also part of the Mellanox OFED package.
If you have another vendor's hardware, you need to check with that vendor; you won't find an open-source packet sniffer that works for all RDMA-capable devices, sorry.
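For scripted captures, an ibdump invocation can be assembled as in the minimal sketch below. It only builds the command line and does not run it. The `-d` (device), `-i` (port) and `-w` (output file) options are assumed from the ibdump README, the device name `mlx5_0` and the file name `rdma.pcap` are hypothetical, so check `ibdump --help` on your installation before relying on it.

```python
import shlex


def ibdump_cmd(device: str, port: int = 1, outfile: str = "rdma.pcap") -> list[str]:
    """Assemble an ibdump command line (does not execute it).

    Assumed options, taken from the ibdump README (verify with `ibdump --help`):
      -d <dev>   RDMA device to sniff on
      -i <port>  HCA port number (ports are numbered from 1)
      -w <file>  output pcap file (the default name here is chosen for this example)
    """
    if port < 1:
        raise ValueError("HCA port numbers start at 1")
    return ["ibdump", "-d", device, "-i", str(port), "-w", outfile]


if __name__ == "__main__":
    # "mlx5_0" is a hypothetical device name; `ibv_devices` lists the real ones.
    print(shlex.join(ibdump_cmd("mlx5_0", 1, "rdma.pcap")))
```

Run the printed command as root on the host with the Mellanox HCA and open the resulting file in Wireshark. Keep in mind that ibdump can drop packets during bursts without reporting it, so a low packet count alone does not prove that little traffic was sent.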

-
I think your answer fits best. I have learned that each vendor has to provide its own utility for packet capture on its HCAs. I am currently dealing only with Mellanox HCAs and you are right, "ibdump" is the answer for this. I have tried it now and it does the capture. However, I found that it logs only the RDMA operation headers and not the payload itself. I don't know if that's the default behaviour or whether I need to upgrade my packages. But in essence, "ibdump" works and it is what I was looking for when I asked the question. Thank you! – dhavale Dec 07 '12 at 21:54
-
@kliteyn But why does ibdump capture so few packets? I send a lot of packets, but it captures only a few. – Djvu Dec 05 '14 at 08:59
-
@Djvu The [ibdump readme](https://github.com/Mellanox/ibdump#3-known-issues) lists some limitations - such as dropping packets during bursts. And, in contrast to tcpdump, it doesn't report how many packets it dropped during capturing. Your ibdump might be overwhelmed by the number of packets to be captured. – maxschlepzig Jan 22 '21 at 21:51
In general, no. One of the main characteristics of RDMA is that all the network processing is done on the adapter, without involving the CPU at all. Typically work requests are queued up directly from userspace to the adapter, without any system call. So there's nowhere for a sniffer to hook in to get traffic.
With that said, for the Ethernet-based protocols, iWARP and IBoE (aka RoCE), you can put a system in the middle of a connection, set it up to forward traffic in software (e.g. with the Linux bridge module), and then run tcpdump or Wireshark on it to capture the RDMA traffic that passes through. Wireshark even has dissectors for iWARP and IBoE.
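The forwarding setup described above could look roughly like this. The sketch only builds the iproute2 and tcpdump command lines, which you would run as root on the middle host; the interface names `eth0`/`eth1`, the bridge name `br0` and the file name are placeholders. The filter relies on RoCEv2 using UDP destination port 4791 and RoCEv1 (IBoE) using EtherType 0x8915; iWARP runs over ordinary TCP, so for iWARP you would filter on the TCP connection instead.

```python
import shlex

# RoCEv2 uses UDP destination port 4791, RoCEv1 (IBoE) uses EtherType 0x8915.
# iWARP is plain TCP, so it would need a host/port filter instead.
ROCE_FILTER = "udp port 4791 or ether proto 0x8915"


def bridge_sniffer_cmds(nic_a: str, nic_b: str, bridge: str = "br0",
                        pcap: str = "rdma.pcap") -> list[list[str]]:
    """Commands (run as root) that join nic_a and nic_b into a software bridge,
    so the kernel forwards frames between the two RDMA endpoints, then capture
    the RoCE frames. Names are placeholders chosen for this example."""
    return [
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "link", "set", "dev", nic_a, "master", bridge],
        ["ip", "link", "set", "dev", nic_b, "master", bridge],
        ["ip", "link", "set", "dev", nic_a, "up"],
        ["ip", "link", "set", "dev", nic_b, "up"],
        ["ip", "link", "set", "dev", bridge, "up"],
        # Capture on one bridge port: every frame exchanged between the two
        # endpoints crosses it, either inbound or outbound.
        ["tcpdump", "-i", nic_a, "-w", pcap, ROCE_FILTER],
    ]


if __name__ == "__main__":
    # eth0/eth1 stand for the two ports cabled to the RDMA endpoints.
    for cmd in bridge_sniffer_cmds("eth0", "eth1"):
        print(shlex.join(cmd))
```

The filter is passed to tcpdump as a single argument, which tcpdump accepts as its filter expression; the resulting file can then be opened in Wireshark.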
For native InfiniBand it is theoretically possible to build something similar (set up an adapter to capture and forward traffic) but as far as I know, no one has done even the needed firmware or driver work to do basic packet sniffing.

-
Thank you Roland for your input! I'll explore using a Linux bridge to sniff the traffic. I understand that work requests are queued directly from userspace, and that's why there is no place to intercept them. I am using ib_post_send() from the kernel to queue the work requests, so I thought there might be some place inside the implementation to know that a packet was sent to the other node. I don't know if this is possible without firmware support; maybe when you get an event on the CQ? The main reason for this question is that when I do not see the data at the receiver, we need a way to tell which RNIC is at fault, the sender or the receiver. – dhavale Oct 08 '12 at 17:54
Chelsio's T4 device supports a packet trace feature allowing it to replicate ingress/egress offload packets to one of the device's NIC queues. Then you can use tcpdump or whatever on that ethX interface to see the RDMA or TOE packets.

-
Thank you Steve! I will check with my hardware vendor (Mellanox) if they support something similar. – dhavale Oct 08 '12 at 17:42
Wireshark can do it, but the catch is that you need an observer server. If you enable port mirroring on the switch, you should be able to receive the RoCE packets at the observer.

A sure way to capture such traffic is to duplicate it into dedicated capture ports. Those ports might be additional Ethernet/IB ports (of additional adapters) in your development machine, or they may be located in a separate capture machine.
There are basically two ways to duplicate the traffic:
1. Configure port mirroring in your switch. Support for port mirroring is pretty common in managed Ethernet switches, even cheap ones. This feature is also available in some Mellanox InfiniBand switches. You can configure the switch to mirror both directions of a port into another port, but this oversubscribes the capture port if the mirrored port receives and sends at line speed at the same time (full duplex). In that situation some frames can't be forwarded to the capture port and are dropped. To avoid this limitation, mirror each direction into a separate capture port.
2. Connect your network cable to a TAP (test access point) device that duplicates or splits the signal. For optical networking, TAPs are often completely passive, so they add little complexity and are relatively cheap to produce. You need one TAP for each fiber, i.e. you always occupy 2 capture ports if you want to capture both directions. TAP devices are available for the fibers and connectors commonly used in Ethernet networks. If your InfiniBand hardware uses the same fibers and connectors, you should be able to use the same TAP devices there as well, at least the passive ones.
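The oversubscription problem with mirroring both directions into one port can be quantified with a back-of-the-envelope model. This is a simplification that assumes steady rates and ignores switch buffering and framing overhead: the mirrored load is rx + tx, and whatever exceeds the capture port's line rate cannot be forwarded.

```python
def mirror_excess_gbps(rx_gbps: float, tx_gbps: float, capture_gbps: float) -> float:
    """Mirrored load (Gb/s) that exceeds a single capture port when both
    directions of a full-duplex port are mirrored into it (steady rates,
    no buffering)."""
    return max(0.0, rx_gbps + tx_gbps - capture_gbps)


def min_drop_fraction(rx_gbps: float, tx_gbps: float, capture_gbps: float) -> float:
    """Lower bound on the share of mirrored traffic that must be dropped."""
    total = rx_gbps + tx_gbps
    if total == 0:
        return 0.0
    return mirror_excess_gbps(rx_gbps, tx_gbps, capture_gbps) / total


def per_direction_excess_gbps(rx_gbps: float, tx_gbps: float, capture_gbps: float) -> float:
    """Excess when each direction is mirrored into its own capture port."""
    return max(0.0, rx_gbps - capture_gbps) + max(0.0, tx_gbps - capture_gbps)


if __name__ == "__main__":
    # A 100 Gb/s port at line rate in both directions, 100 Gb/s capture port(s).
    print(mirror_excess_gbps(100.0, 100.0, 100.0))         # 100.0
    print(min_drop_fraction(100.0, 100.0, 100.0))          # 0.5
    print(per_direction_excess_gbps(100.0, 100.0, 100.0))  # 0.0
```

So with a single capture port at least half of the mirrored traffic is lost in this worst case, while one capture port per direction avoids the overload entirely.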
Once the mirrored/tapped traffic arrives at your capture port(s), you can use standard capture tools such as tcpdump.
For InfiniBand there is ibdump; however, depending on the InfiniBand software you are using (open-source OFED vs. the proprietary Mellanox OFED) and the host channel adapter (HCA), you might be able to use tcpdump to capture InfiniBand traffic as well.
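Once you have a pcap file, a quick way to check whether RDMA data actually went out (rather than only connection setup) is to count base transport header (BTH) opcodes. The stdlib-only sketch below assumes classic pcap format (not pcapng), Ethernet link type, RoCEv1 or RoCEv2 framing and no IPv6 extension headers, so it does not handle native InfiniBand captures. The opcode names are a small subset of the IBTA opcode table.

```python
import struct
import sys
from collections import Counter

ROCEV2_UDP_PORT = 4791     # RoCEv2 UDP destination port
ROCEV1_ETHERTYPE = 0x8915  # RoCEv1 / IBoE frames

# A small subset of the IBTA BTH opcodes, for readable output.
OPCODE_NAMES = {
    0x04: "RC SEND_ONLY",
    0x0A: "RC RDMA_WRITE_ONLY",
    0x0B: "RC RDMA_WRITE_ONLY_WITH_IMMEDIATE",
    0x0C: "RC RDMA_READ_REQUEST",
    0x11: "RC ACKNOWLEDGE",
    0x64: "UD SEND_ONLY (e.g. CM MADs on QP1)",
}


def bth_opcode(frame: bytes):
    """Return the BTH opcode of a RoCE Ethernet frame, or None if it is not RoCE."""
    if len(frame) < 14:
        return None
    off = 14
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    while ethertype == 0x8100 and len(frame) >= off + 4:  # skip 802.1Q VLAN tags
        ethertype = struct.unpack_from("!H", frame, off + 2)[0]
        off += 4
    if ethertype == ROCEV1_ETHERTYPE:
        off += 40  # RoCEv1: 40-byte GRH, then BTH
    else:
        if ethertype == 0x0800 and len(frame) >= off + 20 and frame[off + 9] == 17:
            off += (frame[off] & 0x0F) * 4  # IPv4 header length (IHL)
        elif ethertype == 0x86DD and len(frame) >= off + 40 and frame[off + 6] == 17:
            off += 40  # fixed IPv6 header; extension headers not handled
        else:
            return None  # not IP/UDP
        if len(frame) < off + 8:
            return None
        if struct.unpack_from("!H", frame, off + 2)[0] != ROCEV2_UDP_PORT:
            return None
        off += 8  # UDP header, then BTH
    return frame[off] if len(frame) > off else None


def count_roce_opcodes(f) -> Counter:
    """Count BTH opcodes in a classic libpcap stream (binary file object)."""
    header = f.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap header")
    magic = header[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian, micro- or nanosecond timestamps
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    linktype = struct.unpack_from(endian + "I", header, 20)[0]
    if linktype != 1:
        raise ValueError(f"link type {linktype} is not Ethernet (1)")
    counts = Counter()
    while True:
        record = f.read(16)  # ts_sec, ts_frac, incl_len, orig_len
        if len(record) < 16:
            break
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record)
        opcode = bth_opcode(f.read(incl_len))
        if opcode is not None:
            counts[opcode] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as pcap:
        for op, n in sorted(count_roce_opcodes(pcap).items()):
            print(f"0x{op:02x} {OPCODE_NAMES.get(op, 'other'):<36} {n}")
```

RDMA WRITE payloads show up as RC RDMA_WRITE_* opcodes, whereas connection management traffic is carried as UD SEND_ONLY on QP1, so the opcode mix tells you which kind of packets were captured.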

As of this writing, it is possible to sniff RDMA traffic with tcpdump, either with a recent Linux kernel or, on older versions, by installing Mellanox (NVIDIA) OFED.
HOW-TO DUMP RDMA TRAFFIC USING THE INBOX TCPDUMP TOOL (CONNECTX-4 AND ABOVE)
After installing Mellanox OFED (if needed), you can generate a pcap file and analyze it later by opening it in Wireshark:
tcpdump -i mlx5_1 -s 65535 -w rdma_traffic.pcap
Make sure to use one of the available mlx5_X devices.
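To find which `mlx5_X` names exist on a host, you can list the RDMA devices the kernel exposes in sysfs (`ibv_devices` prints the same names). A minimal sketch, assuming the standard `/sys/class/infiniband` location; for each mlx5 device it prints the tcpdump command shown above, with a per-device output file name chosen for this example.

```python
import os
import shlex


def rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """RDMA device names (e.g. mlx5_0) registered with the kernel, sorted."""
    try:
        return sorted(os.listdir(sysfs_root))
    except FileNotFoundError:
        return []  # RDMA core not loaded or no RDMA devices present


def tcpdump_cmd(device: str, pcap: str = "rdma_traffic.pcap") -> list[str]:
    # Same options as the command above: 65535-byte snap length, write to a pcap.
    return ["tcpdump", "-i", device, "-s", "65535", "-w", pcap]


if __name__ == "__main__":
    for dev in rdma_devices():
        if dev.startswith("mlx5_"):
            print(shlex.join(tcpdump_cmd(dev, f"{dev}.pcap")))
```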
