I have a .NET application (running on a Windows PC) that communicates with a Linux box via OPC UA. The use case is to make ~40 read requests to the server serially; once these 40 read calls complete, the next cycle of 40 begins. Each read call returns a response with a payload of ~16 KB, which is fragmented and delivered to the client. For most requests, the server finishes delivering the complete response within 5 ms. For some requests, however, it takes ~300 ms.
When this delay occurs, I see the following pattern of retransmissions in the packet capture:
- [71612] A new Read request is sent to the server.
- [71613-71630] The response is delivered to the client.
- [71631] A new Read request is sent to the server.
- [71632] The server sends a TCP Spurious Retransmission for packet [71844] with Seq No. 61624844.
- [71633] The client sends a DUP ACK for that packet.
- [71634] The client sends a TCP Retransmission of the read request in [71846] after 288 ms.
This delay adds up to some 5-6 seconds for a complete cycle of 40 requests. I want to figure out what is causing these retransmissions (and hence the delays) and what can be done to:
- Reduce the frequency of retransmissions.
- Reduce the ~300 ms the client waits before retransmitting the stalled read request.
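As a rough sanity check on those figures, here is a back-of-the-envelope sketch in Python. It assumes a normal request costs about 5 ms, a stalled one about 300 ms, and that nothing else contributes to the cycle time; the function name and defaults are only illustrative.

```python
def stalled_requests(cycle_ms, n_requests=40, fast_ms=5.0, slow_ms=300.0):
    """Estimate how many requests in one cycle hit the slow path.

    Model (an assumption, not a measurement):
        cycle_ms = n_requests * fast_ms + stalled * (slow_ms - fast_ms)
    """
    baseline_ms = n_requests * fast_ms  # cycle time if nothing stalls
    return (cycle_ms - baseline_ms) / (slow_ms - fast_ms)


if __name__ == "__main__":
    for cycle_ms in (5000, 6000):
        print(cycle_ms, "ms cycle ->", round(stalled_requests(cycle_ms), 1), "stalled requests")
```

Under these assumptions, a 5-6 second cycle corresponds to roughly 16 to 20 of the 40 requests stalling, so the slow path is hit far more often than "occasionally".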
I have tried disabling the Nagle algorithm on the server to see if it would improve performance, but it had no effect. Also, when I halve the response size (to 8 KB), retransmissions become rare and the delay is negligible. But reducing the response size is not a valid solution for our use case.
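For clarity, "disabling Nagle" here means setting the `TCP_NODELAY` socket option. The snippet below is only a generic Python illustration of that option, not the actual server code; the helper name `make_nodelay_socket` is hypothetical.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 asks the stack to send small segments immediately
    # instead of coalescing them while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # A non-zero value means the option is enabled on this socket.
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```

Note that this option only affects how the sender batches small writes; it does not change retransmission timers.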
The connection to the Linux box normally goes through a switch. Connecting to it directly, point-to-point, gives only a marginal reduction in the delay.
I can share the relevant code, but I think the issue is more likely in the TCP stack (or at least in some configuration option that should be enabled), so the code would probably make little difference.