
I have a client/server program in which the server sends a variable number of packets to the client. /proc/sys/net/ipv4/tcp_retries2 is set to the default of 15, but when I unplug the Ethernet cable, the server sends only 7 or 8 retransmits (it varies) before it gives up, after which it sends successive ARP who-has requests.

One suggestion made to me was that TCP stops after 7-8 retransmits because the ARP table entry for this route expires before the server reaches the configured number of retries. I tried to remedy this by raising /ipv4/route/gc_timeout to 1500 (from 300), but there was no discernible difference in the program's behaviour.

If anyone could shed light on this or offer alternative explanations, it would be greatly appreciated.

achtung
  • Thank you for marking my answer. Can I ask you to describe the complete solution? Your question is really interesting and I wonder if the first option I proposed was the complete solution. – rodolk Dec 08 '13 at 23:46
  • I ended up ignoring this issue on the advice of my project supervisor. FWIW, my supervisor's words (in reference to tcp_retries1 and tcp_retries2): "So after 3 retries it should "update the route". In your case, as the destination probably becomes "unreachable" eventually as arp fails (as you have disconnected the server from the network). I see the specific behaviour is undocumented, presumably the 5 extra retries (after the 3) are done before arp is invoked. Or possibly the delay is caused by an arp timer... I wouldn't worry too much about this - as the behaviour is not well documented." – achtung Dec 14 '13 at 15:10

1 Answer


Maybe the entry in the ARP table is expiring, and when the ARP requests are sent again they time out because there is no response? Did you run arp -a? Maybe setting gc_timeout is not enough and you also need to set gc_stale_time? I read this entry, which has a great explanation of how it works. The guy was trying to do almost the opposite of what you are trying: Configuring ARP age timeout
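To see what the kernel is actually using before and after a change, something like the following could help. It is only a sketch: the list of names is my assumption about which settings are relevant here (in particular, I am assuming gc_stale_time lives under net/ipv4/neigh/default), and `read_sysctls` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Assumed set of settings worth comparing; adjust to your system.
SYSCTLS = [
    "net/ipv4/tcp_retries1",
    "net/ipv4/tcp_retries2",
    "net/ipv4/route/gc_timeout",
    "net/ipv4/neigh/default/gc_stale_time",
]


def read_sysctls(names=SYSCTLS, root="/proc/sys"):
    """Return {name: value-as-string}, or None where the file cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(root) / name).read_text().strip()
        except OSError:
            # The entry may be missing on this kernel, or not readable.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        print(f"{name} = {value}")
```

Running it once before unplugging the cable and once after changing a setting should at least confirm that the value you wrote is the one the kernel reports.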

There is another avenue to investigate: maybe you should also change tcp_retries1? Is it possible to change the retransmission timeout (RTO)?

Also, I looked at the kernel documentation (file ip-sysctl.txt) and found:

tcp_retries1 - INTEGER This value influences the time, after which TCP decides, that something is wrong due to unacknowledged RTO retransmissions, and reports this suspicion to the network layer. See tcp_retries2 for more details. RFC 1122 recommends at least 3 retransmissions, which is the default.

tcp_retries2 - INTEGER This value influences the timeout of an alive TCP connection, when RTO retransmissions remain unacknowledged. Given a value of N, a hypothetical TCP connection following exponential backoff with an initial RTO of TCP_RTO_MIN would retransmit N times before killing the connection at the (N+1)th RTO. The default value of 15 yields a hypothetical timeout of 924.6 seconds and is a lower bound for the effective timeout. TCP will effectively time out at the first RTO which exceeds the hypothetical timeout. RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8.
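The 924.6 seconds figure can be reproduced with a little arithmetic. This is just a sketch of the "hypothetical" connection the documentation describes, assuming TCP_RTO_MIN is 200 ms and TCP_RTO_MAX is 120 s (the values I believe Linux uses); `hypothetical_timeout` is a name I made up for illustration.

```python
TCP_RTO_MIN = 0.2    # seconds, assumed kernel minimum RTO
TCP_RTO_MAX = 120.0  # seconds, assumed kernel maximum RTO


def hypothetical_timeout(retries, rto_min=TCP_RTO_MIN, rto_max=TCP_RTO_MAX):
    """Sum the RTOs of a connection that backs off exponentially from rto_min.

    The connection is killed at the (retries + 1)th RTO, so that many
    intervals are added up, each one double the last and capped at rto_max.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    print(round(hypothetical_timeout(15), 1))  # default tcp_retries2
    print(round(hypothetical_timeout(8), 1))   # RFC 1122 minimum from the docs
```

With these assumptions a value of 15 gives about 924.6 seconds and a value of 8 gives about 102.2 seconds, which matches the "at least 100 seconds" remark. Note that a real connection starts from its measured RTO rather than TCP_RTO_MIN, so it can reach the same time limit after fewer retransmits than the configured number.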

In another thread I read about the socket option TCP_USER_TIMEOUT. I've never used it, but it could be an easy solution: Application control of TCP retransmission on Linux
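As I said, I have not used it myself, so take this as an untested sketch of how the option could be set from Python on Linux. The value is in milliseconds; `set_user_timeout` is a hypothetical helper, and the fallback constant 18 is my assumption of the Linux value for Python builds that do not expose `socket.TCP_USER_TIMEOUT`.

```python
import socket

# Assumed Linux value of TCP_USER_TIMEOUT, used only if Python lacks the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def set_user_timeout(sock, timeout_ms):
    """Ask the kernel to drop the connection after timeout_ms of unacknowledged data.

    Returns the value the kernel reports back, in milliseconds.
    """
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # 30 seconds, chosen arbitrarily for the example.
        print(set_user_timeout(s, 30_000))
```

The appeal of this approach is that it is set per socket by the application, so it does not require changing system-wide sysctls.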

I hope one of these options helps.

rodolk