54

I open a TCP socket and connect it to another socket somewhere else on the network. I can then successfully send and receive data. I have a timer that sends something to the socket every second.
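
In outline, the setup looks something like this (the host, port, and payload below are placeholders, not details from my actual program):

    /* Minimal sketch of the scenario: connect, then write once per second
     * and report each send() result. Address and port are placeholders. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in peer = { 0 };
        peer.sin_family = AF_INET;
        peer.sin_port = htons(9000);                      /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* placeholder host */

        if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
            perror("connect");
            return 1;
        }

        for (;;) {                      /* the once-per-second timer */
            ssize_t n = send(fd, "ping\n", 5, 0);
            if (n < 0) {                /* only fails ~1.5 h after the cable pull */
                perror("send");
                break;
            }
            sleep(1);
        }
        close(fd);
        return 0;
    }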

I then rudely interrupt the connection by physically breaking the link (pulling out the Ethernet cable in this case). My socket still reports that it is successfully writing data out every second. This continues for approximately 1 hour and 30 minutes, at which point a write error is finally returned.

What determines this time-out, after which the socket finally accepts that the other end has disappeared? Is it the OS (Ubuntu 11.04), does it come from the TCP/IP specification, or is it a socket configuration option?

oggmonster
  • 3
    Maybe [this](http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux) gives an answer for you. – SKi Oct 26 '12 at 11:53

3 Answers

81

Pulling the network cable will not break a TCP connection(1), though it will disrupt communications. You can plug the cable back in and, once IP connectivity is re-established, all backed-up data will move. This is what makes TCP reliable, even on cellular networks.

When TCP sends data, it expects an ACK in reply. If none comes within some amount of time, it re-transmits the data and waits again. The time it waits between transmissions generally increases exponentially.

After some number of retransmissions or some amount of total time with no ACK, TCP will consider the connection "broken". How many times or how long depends on your OS and its configuration, but it typically times out on the order of many minutes.

From Linux's tcp(7) man page:

   tcp_retries2 (integer; default: 15; since Linux 2.2)
          The maximum number of times a TCP packet is retransmitted in
          established state before giving up.  The default value is 15, which
          corresponds to a duration of approximately between 13 to 30 minutes,
          depending on the retransmission timeout.  The RFC 1122 specified
          minimum limit of 100 seconds is typically deemed too short.

This is likely the value you'll want to adjust to change how long it takes to detect if your connection has vanished.
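
As a rough illustration of where the man page's "13 to 30 minutes" comes from: each retransmission timeout (RTO) doubles until it hits the kernel's cap (TCP_RTO_MAX, 120 seconds). A back-of-envelope sketch, assuming an initial RTO of 200 ms (the Linux minimum, TCP_RTO_MIN; real connections start from a value derived from the measured round-trip time, which is why the man page gives a range):

    /* Back-of-envelope estimate of how long tcp_retries2 retransmissions
     * take: the RTO doubles each retry, capped at TCP_RTO_MAX (120 s).
     * The 0.2 s initial RTO is an assumption; a larger measured RTT
     * pushes the total toward the 30-minute end of the range. */
    #include <stdio.h>

    int main(void)
    {
        double rto = 0.2;      /* assumed initial RTO in seconds */
        double total = 0.0;
        int retries = 15;      /* default net.ipv4.tcp_retries2 */

        for (int i = 1; i <= retries; i++) {
            total += rto;
            printf("retry %2d: waited %6.1f s, elapsed %7.1f s\n", i, rto, total);
            rto = rto * 2 > 120.0 ? 120.0 : rto * 2;  /* cap at TCP_RTO_MAX */
        }
        printf("gives up after roughly %.0f s (~%.0f minutes)\n",
               total, total / 60);
        return 0;
    }

With these assumptions the total comes out near the 13-minute lower bound. The limit itself is system-wide, e.g. `sysctl -w net.ipv4.tcp_retries2=8`; per the kernel documentation, a value of 8 corresponds to the RFC 1122 minimum of roughly 100 seconds.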

(1) There are exceptions to this. The operating system, upon noticing a cable being removed, could notify upper layers that all connections should be considered "broken".

Brian White
  • 1
    I make thousands of HTTP requests daily, and now I'm working with WebSockets (Node.js), so naturally I wanted to know more about good old sockets. I now realise how beautifully sockets work: a reliable, connection-oriented, host-specific protocol; that's why the number of retries depends on the OS! – Karan Kaw Jun 15 '17 at 04:30
  • Can you also mention how the send/receive buffers fill up? If the send buffer fills up, the application may receive I/O errors. – huch Apr 17 '18 at 10:41
  • 1
    Applications will **not** receive I/O errors. Receivers will block or read zero bytes once their "receive windows" are empty, depending on whether the read is synchronous or asynchronous. Senders will block or send zero bytes once their "send windows" (and any other OS buffers) are full, also depending. It isn't until TCP determines the connection to be "broken" that I/O errors will occur. – Brian White Apr 20 '18 at 17:38
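
One way to observe the behaviour described in the comment above: make the socket non-blocking and keep writing. send() keeps succeeding (the data is only queued locally) until the send buffer fills, after which it reports EAGAIN rather than a hard error. A sketch, assuming `fd` is an already-connected TCP socket:

    /* Sketch: on a non-blocking socket whose peer has vanished, send()
     * succeeds until the send buffer is full, then fails with EAGAIN
     * (not a hard error) until TCP declares the connection broken. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/socket.h>

    static void fill_send_buffer(int fd)
    {
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
        char chunk[1024] = { 0 };
        for (;;) {
            ssize_t n = send(fd, chunk, sizeof(chunk), 0);
            if (n >= 0)
                continue;                   /* data merely queued locally */
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                puts("send buffer full, still no error");
                break;
            }
            perror("send");                 /* connection finally declared broken */
            break;
        }
    }
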
1

If you want errors to propagate quickly to your application code, you may want to try this socket option:

   TCP_USER_TIMEOUT (since Linux 2.6.37)
          This option takes an unsigned int as an argument.  When the value
          is greater than 0, it specifies the maximum amount of time in
          milliseconds that transmitted data may remain unacknowledged
          before TCP will forcibly close the corresponding connection and
          return ETIMEDOUT to the application.  If the option value is
          specified as 0, TCP will use the system default.

See the full description in the Linux tcp(7) man page. This option is more flexible than editing tcp_retries2 (you can set it on the fly, right after creating a socket), and it applies exactly to the situation where your client's socket is unaware of the state of the server's socket and may end up in a so-called half-open state.
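
A minimal sketch of setting it, assuming a Linux ≥ 2.6.37 kernel and headers that define TCP_USER_TIMEOUT (it is in <netinet/tcp.h> on recent systems, <linux/tcp.h> on older ones); the 10-second value is only an example:

    /* Sketch: make pending writes fail with ETIMEDOUT after ~10 s of
     * unacknowledged data instead of waiting out tcp_retries2. */
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <sys/socket.h>

    int set_user_timeout(int fd, unsigned int ms)
    {
        if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &ms, sizeof(ms)) < 0) {
            perror("setsockopt(TCP_USER_TIMEOUT)");
            return -1;
        }
        return 0;
    }

    /* e.g. right after creating the socket: set_user_timeout(fd, 10000); */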

Alex-Bogdanov
0

Two excellent answers are here and here.

TCP user timeout may work for your case: the TCP user timeout controls how long transmitted data may remain unacknowledged before a connection is forcefully closed.

There are also three OS-dependent TCP keepalive parameters. On Linux the defaults are:

    tcp_keepalive_time      default 7200 seconds
    tcp_keepalive_probes    default 9
    tcp_keepalive_intvl     default 75 seconds

The total timeout is tcp_keepalive_time + (tcp_keepalive_probes * tcp_keepalive_intvl); with these defaults, 7200 + (9 * 75) = 7875 seconds.

To set these parameters on Linux:

    sysctl -w net.ipv4.tcp_keepalive_time=1800 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=20
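
The sysctl line above changes the system-wide defaults. The same three parameters can also be set per socket, once SO_KEEPALIVE is enabled, via the Linux-specific TCP_KEEPIDLE, TCP_KEEPCNT, and TCP_KEEPINTVL socket options; a sketch using the example values from the sysctl line:

    /* Sketch: per-socket keepalive with idle=1800 s, 3 probes, 20 s
     * apart, matching the sysctl example above. */
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    int enable_keepalive(int fd)
    {
        int on = 1, idle = 1800, cnt = 3, intvl = 20;
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
            return -1;
        return 0;
    }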

JayS