I am trying to achieve consistent latency below 500 µs in my ZMQ REQ/REP application. However, I've encountered two types of delay:
- First-packet delay: around 5-6 ms.
- Random delays: these occur randomly (roughly every 5-10 s) and last between 1 and 8 ms.
I can partly understand the first-packet delay, since it includes the initial connection setup. However, I can't explain the random delays. For context, my application has only one server and one client, both running on my local machine, and it is written in C++. I am trying to understand whether REQ/REP is suitable for my case or whether I am missing something. I wrote a sample Python script that reproduces the problem:
Client
import zmq
import time
from random import randbytes  # requires Python 3.9+

port = "5556"
context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:%s" % port)
val = randbytes(200000)  # 200 KB payload per request

while True:
    st = time.time()
    socket.send(val)
    socket.recv()  # block until the server's reply arrives
    ed = time.time()
    took = (ed - st) * 1000000  # round-trip time in microseconds
    if took > 400:
        print(took)
Server
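import zmq
import time

port = "5556"
context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:%s" % port)

while True:
    # The timer starts before recv(), so it also counts time spent waiting
    # for the next request to arrive, not only the processing time.
    st = time.time()
    message = socket.recv()
    socket.send(b"World from")
    ed = time.time()
    took = (ed - st) * 1000000  # microseconds
    if took > 200:
        print(took)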
What I've tried
- Different patterns, such as Lazy Pirate
- Adding timeouts
- Polling the socket (with a timeout) before recv()
- Increasing the number of I/O threads in the context
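For reference, the knobs listed above map onto pyzmq roughly as follows. This is a minimal single-process sketch, not the original C++ code: it assumes pyzmq is installed, binds the REP socket to a random loopback port with bind_to_random_port, and the helper name round_trip_with_timeouts is hypothetical.

```python
import zmq


def round_trip_with_timeouts(payload: bytes, timeout_ms: int = 1000) -> bytes:
    """Send one request over loopback TCP and return the echoed reply.

    Hypothetical helper showing io_threads, RCVTIMEO, LINGER and Poller usage.
    """
    ctx = zmq.Context(io_threads=2)  # more background I/O threads
    try:
        rep = ctx.socket(zmq.REP)
        port = rep.bind_to_random_port("tcp://127.0.0.1")

        req = ctx.socket(zmq.REQ)
        req.setsockopt(zmq.RCVTIMEO, timeout_ms)  # recv() raises zmq.Again on timeout
        req.setsockopt(zmq.LINGER, 0)  # don't block on close with unsent data
        req.connect(f"tcp://127.0.0.1:{port}")

        poller = zmq.Poller()
        poller.register(rep, zmq.POLLIN)

        req.send(payload)
        # Wait (with a timeout) until the request is readable on the REP side.
        if rep not in dict(poller.poll(timeout_ms)):
            raise TimeoutError("request did not arrive in time")
        rep.send(rep.recv())  # echo the request back
        return req.recv()
    finally:
        ctx.destroy(linger=0)  # close all sockets and terminate the context
```

Note that the ZeroMQ guide's rule of thumb is roughly one I/O thread per gigabyte per second of traffic, so with a single client and server, extra I/O threads are unlikely to change much.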
Has anyone encountered something similar?
Note: This problem does not seem to be hardware-related; I've seen the same behavior on different hardware. If your results differ from mine, please let me know.
Update: I ran the same benchmark with Boost.Asio TCP sockets, and it performed very well: the average latency was 14 µs. I used this repo for the benchmark and sent the same amount of data. I think it's safe to say that if you don't need multicast, you should stick to POSIX sockets.
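A comparable plain-TCP baseline can be sketched in Python with only the standard library, so it can be timed next to the ZMQ scripts on the same machine. This is not the Boost.Asio benchmark from the repo mentioned above; the 4-byte length prefix, the use of TCP_NODELAY, and the helper names are assumptions made for this sketch.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def send_msg(sock, payload):
    # Frame each message with a 4-byte big-endian length prefix.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def serve(listener, n_requests):
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(n_requests):
            recv_msg(conn)
            send_msg(conn, b"World from")


def tcp_round_trips(payload, n_requests=100):
    """Return per-request round-trip latencies in microseconds."""
    listener = socket.create_server(("127.0.0.1", 0))  # OS picks a free port
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, n_requests), daemon=True)
    server.start()
    latencies_us = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(n_requests):
            start = time.perf_counter_ns()
            send_msg(client, payload)
            recv_msg(client)  # wait for the full reply
            latencies_us.append((time.perf_counter_ns() - start) / 1000)
    server.join()
    listener.close()
    return latencies_us
```

Because TCP is a byte stream, the length prefix is what lets the receiver know where each 200 KB message ends; ZMQ performs this framing for you.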
Update 2: I tested the perf tool provided by libzmq, and it performed better. The perf tool measures only the average latency, so I added a per-packet latency calculation. For 200 KB messages, the average latency was 30 µs; however, per-packet latency varied between 100 µs and 2200 µs (the first packet).
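Averages hide exactly the spikes this question is about, so it can help to report percentiles and the maximum for each run. A minimal stdlib sketch: latency_summary is a hypothetical helper that uses the nearest-rank percentile method, and its optional warmup argument drops the first samples (for example, the first-packet delay).

```python
import math


def latency_summary(samples_us, warmup=0):
    """Summarize per-packet latencies (microseconds) with nearest-rank percentiles."""
    ordered = sorted(samples_us[warmup:])  # optionally skip warm-up samples
    if not ordered:
        raise ValueError("no samples left after warm-up")

    def pct(p):
        # Nearest-rank: smallest value with at least p% of samples <= it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "count": len(ordered),
        "min": ordered[0],
        "p50": pct(50),
        "p99": pct(99),
        "max": ordered[-1],
        "mean": sum(ordered) / len(ordered),
    }


# Example: one 2200 us first packet followed by 99 fast round trips.
print(latency_summary([2200] + [100] * 99))
# {'count': 100, 'min': 100, 'p50': 100, 'p99': 100, 'max': 2200, 'mean': 121.0}
print(latency_summary([2200] + [100] * 99, warmup=1)["max"])  # 100
```

In the example, a single outlier out of 100 samples is invisible at p99 but dominates the maximum, which is why both are worth reporting.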
Update 3: I have more insight now. The random delay is probably caused by the TCP buffers: since each message is large (200 KB), it creates latency in the long run.
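One way to probe the TCP-buffer hypothesis is to compare the kernel socket buffer sizes with the 200 KB message size. This is a stdlib sketch assuming a POSIX-like OS; in pyzmq the corresponding socket options are zmq.SNDBUF and zmq.RCVBUF, set with setsockopt before bind/connect. The helper names are hypothetical. On Linux the kernel typically doubles the requested value and caps it at a system limit, and the values reported for a fresh socket may not reflect what the kernel uses once a connection is active, so treat the output as a rough indicator only.

```python
import math
import socket

MESSAGE_SIZE = 200_000  # bytes, matching randbytes(200000) in the client


def tcp_buffer_sizes(requested=None):
    """Return (SO_SNDBUF, SO_RCVBUF) of a fresh TCP socket, optionally after setting them."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


def buffer_fills(message_size, buffer_size):
    """How many full send buffers one message needs (rough indicator only)."""
    return math.ceil(message_size / buffer_size)


snd, rcv = tcp_buffer_sizes()
print(f"default SO_SNDBUF={snd}, SO_RCVBUF={rcv}")
print("send-buffer fills per message:", buffer_fills(MESSAGE_SIZE, snd))
```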