
We are facing a problem where, after some time, a specific socket connection stalls and the client-side TCP stack keeps retransmitting [ACK] packets.

The topology flow is as below:

   Client A ←→ Switch A ← Router A:NAT ← .. Internet .. 
               → Router B:NAT → Switch B ←→ Server B

Here are the packets captured by Wireshark:
A) Server

1. 8013 > 6757 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55     
2. 6757 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0     

B) Client

// packets 3 and 4 are exactly the same as packets 1 and 2
3. 8013 > 13000 [PSH, ACK] Seq=56 Ack=132 Win=5840 Len=55      
4. 13000 > 8013 [ACK] Seq=132 Ack=111 Win=65425 Len=0     
5. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17     

[TCP Retransmission]          
6. 13000 > 8013 [PSH, ACK] Seq=132 Ack=111 Win=65425 Len=17         

8013 is the server port, 6757 is the client's NAT port, and 13000 is the client's internal port.

Why does the client's TCP stack keep transmitting [ACK] packets to tell the server that it received packet 1 (see packets 4, 5, and 6), even though the server has already received one [ACK] packet (see packet 2)? Neither side of the connection closes the socket when the problem happens.

After packet 6, the connection is lost, and we can't send anything to the server via that socket anymore.

         pseudocode:
         // client
         serverAddr.port = htons(8013);
         serverAddr.ip = inet_addr(publicIPB);
         connect(fdA, serverAddr,...);

         // server
         listenfd = socket(,SOCK_STREAM,);
         localAddr.port = htons(8013);
         localAddr.ip = htonl(INADDR_ANY);   // INADDR_ANY is already an address, not a string
         bind(listenfd, localAddr...);
         listen(listenfd, 100);

         ...
         // using the select model
         select(fdSet, NULL, NULL, NULL);
         for(...)
         {
             if (FD_ISSET(listenfd))
             {
                 ...
             }
             ...
         }

UPDATE
UP1. Here are the concrete steps to reproduce the problem

  1. There are three computers, PC1, PC2 and PC3, all behind RouterA. The Server is behind RouterB.

  2. There are two users, U1 and U2. U1 logs in from PC1 and U2 logs in from PC3. Each of them establishes a TCP connection to the Server. U1 can now send data over its TCP connection to the Server, and the Server relays all of that data to U2. Everything works fine up to this point.

    Denote the Server-side socket of the TCP connection between U1 (on PC1) and the Server as U1-OldSocketFd.

  3. Without logging out U1, unplug the network cable of PC1. U1 then logs in from PC2 and establishes a new TCP connection to the Server.

    Denote the Server-side socket of this new TCP connection between U1 (on PC2) and the Server as U1-NewSocketFd.

    On the Server side, when the Server updates its session with U1, it calls close(U1-OldSocketFd).

4.1. About 30 seconds after step 3, U1 is no longer able to send any data to the Server over its new TCP connection.

4.2. If, in step 3, the Server does not call close(U1-OldSocketFd) immediately (within the same second that the new connection between U1 and the Server is established) but instead calls it more than 70-80 seconds later, then everything works fine.

UP2. Router B uses Port Forwarding on port 8013.
UP3. Some parameters of the Linux OS which Server runs on.

    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.tcp_tw_recycle = 1
user207421
Wallace
  • Someone voted this off topic but I feel it is on topic. – Celada Mar 04 '13 at 00:57
  • what are the client NAT ports (those from server point of view) of U1 from PC1 and U1 from PC2? Just give example from one trial. And what are the internal IPs of PC1 and PC2, and the internal client ports? – Tomas Jan 12 '14 at 07:17
  • @Tomas The clients' NAT ports are allocated by NAT and seem to be random, in my example RouterA allocates the port number --6757 -- for the first connection between U1 and Server. The internal client ports are usually 13000, and if this port number is in use (by other application) the client tries to bind the next number which is 13001, if the port is still in use then 13002, 13003.. – Wallace Jan 12 '14 at 07:37
  • Steve, I want to see all of these numbers from one trial. It is important for the diagnostic. Include the local IPs please. – Tomas Jan 12 '14 at 07:39
  • @Tomas I will update this post later, adding the ports and ips for every tcp connection. I don't have them at this time. – Wallace Jan 12 '14 at 07:44
  • OK. One more question - what happens if you don't unplug the cable from PC1? – Tomas Jan 12 '14 at 07:45
  • I remember that everything stays OK when the cable of PC1 remains connected. If U1 logs in from PC2 while still logged in from PC1, then from the Server's perspective U1 is logging in again, and the Server does the same thing: it calls close(U1-OldSocketFd) after sending messages to U1-OldSocketFd. In that case no error was detected. – Wallace Jan 12 '14 at 07:53
  • Steve, you said you will update the post later with more information. I don't think anyone can answer without it. See my posts above. – Tomas Jan 16 '14 at 20:53
  • Segments 6 and 7 are sending 17 bytes of data. The other end is not ACKing them. The ACK flag is (usually/always) set wherever possible, even when sending data. If the other end does not react to the data/ACK, then it is resent. – ctrl-alt-delor Jan 18 '14 at 14:37

2 Answers


After packets 1 (same as 3) and 2 (same as 4) have gone by, your client appears to transmit 17 bytes of data to the server (packet 5). I don't know how long after the first exchange packet 5 is sent. Your pseudocode doesn't clarify this because it shows only the socket initialization, not which side transmits what data at what time. A ladder diagram would be useful here to represent your protocol exchanges.

In any case, the server apparently doesn't acknowledge the 17 bytes of data so they are transmitted again (packet 6).
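To make the distinction between a pure ACK and a retransmitted data segment concrete, here is a minimal sketch in C. The `struct segment` type and both helper names are hypothetical, introduced only for illustration; the fields mirror the Seq, Ack and Len columns of the capture.

```c
#include <stdbool.h>
#include <stdint.h>

/* One captured TCP segment, reduced to the fields shown in the trace.
 * (Hypothetical type, not part of any capture library.) */
struct segment {
    uint32_t seq;   /* relative sequence number */
    uint32_t ack;   /* relative acknowledgment number */
    uint32_t len;   /* payload length in bytes */
};

/* A pure ACK carries no payload. A data segment normally has the ACK
 * flag set as well, but its Len is greater than zero. */
bool is_pure_ack(const struct segment *s)
{
    return s->len == 0;
}

/* A data segment that repeats the Seq and Len of an earlier segment
 * is a retransmission of that segment. */
bool is_retransmission(const struct segment *earlier,
                       const struct segment *later)
{
    return later->len > 0 &&
           later->seq == earlier->seq &&
           later->len == earlier->len;
}
```

With the values from the client capture (packet 4: Seq=132, Len=0; packets 5 and 6: Seq=132, Len=17), `is_pure_ack` holds only for packet 4, and `is_retransmission` is true for packets 5 and 6. In other words, what keeps being resent is the unacknowledged 17-byte payload, not the acknowledgment of packet 1.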

Unless something in the network (a firewall, a NAT router, or something else) is dropping packets, there is no reason why the server could receive the earlier parts of the TCP exchange but not packets 5 or 6. Once again, does a large amount of time elapse between the prior exchange of data and packet 5 (for example, enough time for a NAT router, firewall, or load balancer to expire the connection)?
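If idle expiry in a NAT router or firewall turns out to be a factor, one common mitigation is TCP keepalive. The following is a minimal sketch, assuming a Linux host: the option names `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux-specific, and the helper name `enable_keepalive` is hypothetical. It is not something the question's code is known to do.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keepalive probes on a socket (Linux option names).
 *   idle_s:     seconds of silence before the first probe is sent
 *   interval_s: seconds between probes
 *   count:      unanswered probes before the connection is reported dead
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

A call such as `enable_keepalive(fd, 30, 10, 3)` on each connected socket would send a probe after 30 idle seconds; the numbers are placeholders and would need to be shorter than whatever idle timeout the NAT device actually uses.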

Celada

Based on your steps to reproduce the issue and UP3, it may be due to

net.ipv4.tcp_tw_recycle = 1

The reason is that the kernel tries to recycle a TIME_WAIT connection before its normal timeout has elapsed (because of tw_recycle).

This answer explains how tw_reuse and tw_recycle behave (NAT section is of interest here).

According to the steps to reproduce and observations 4.1 and 4.2, when you call close() immediately, the connection enters the TIME_WAIT state, at which point tw_recycle can take over and assume that, since this side has closed the connection, the socket can be recycled. Because both connections come from the same host from the server's point of view (PC1 and PC2 share RouterA's public IP), tw_recycle kicks in.

When you instead wait before calling close(), no disconnect has been triggered from the server's point of view, so it assumes the connection is still alive. That prevents tw_recycle from kicking in and possibly/probably forces the creation of a brand-new connection.

According to 1, to be safe from a protocol point of view, you have two options:

  • Disable both tw_reuse and tw_recycle
  • Enable tw_reuse, enable TCP timestamps, disable tw_recycle

tw_recycle will probably always trigger the no-connectivity condition, given your network topology.
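The two cases listed above can be checked programmatically. This is a minimal sketch, assuming a Linux host where the settings are exposed under `/proc/sys/net/ipv4/`; the helper names `read_sysctl_int` and `tw_settings_safe` are hypothetical. Note that `tcp_tw_recycle` was removed in Linux 4.12, so on newer kernels that file does not exist and the sketch treats it as disabled.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Read one integer setting from a /proc/sys file.
 * Returns -1 if the file is missing or cannot be parsed.
 */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/*
 * True when the settings match one of the two cases from the answer:
 *   1. tw_reuse and tw_recycle both disabled
 *   2. tw_reuse enabled, TCP timestamps enabled, tw_recycle disabled
 * A value of -1 (file missing) is treated as "disabled".
 */
bool tw_settings_safe(int tw_reuse, int tw_recycle, int timestamps)
{
    if (tw_recycle > 0)
        return false;          /* tw_recycle enabled: neither case applies */
    if (tw_reuse <= 0)
        return true;           /* case 1: both disabled */
    return timestamps > 0;     /* case 2: tw_reuse requires timestamps */
}
```

One way to use it is to read `/proc/sys/net/ipv4/tcp_tw_reuse`, `/proc/sys/net/ipv4/tcp_tw_recycle` and `/proc/sys/net/ipv4/tcp_timestamps` with `read_sysctl_int` and pass the three values to `tw_settings_safe`. With the values from UP3 (tw_reuse = 1, tw_recycle = 1) the check returns false whatever the timestamp setting is.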

Community
  • so you think that the connection made from PC2 will have the same port on PC2 as the connection made from PC1 (broken by cable unplug)? – Tomas Jan 18 '14 at 07:43
  • As far as I understand (also by reading the referenced answer), the server will try to recycle the TIME_WAIT socket connection, seeing another connection coming from the same source IP (because the server is behind a NAT) to the same destination IP:port. The kernel expects timestamps to increase, but the NAT randomly selects the timestamps, hence the timeout. Honestly, I cannot tell/understand if the source port matters. – Fabio Scaccabarozzi Jan 19 '14 at 18:37
  • On second thought: source port should not matter, otherwise recycling would never happen because you'd be expecting **any** connection to your destination port to always originate from the same source port (which would be equivalent to filtering by source port). – Fabio Scaccabarozzi Jan 20 '14 at 08:35