I'm encountering a problem that seems very strange to me.

I have a C++ server application deployed on CentOS. On the client side (also running on CentOS), a program opens connections to the server on a timer and stops the timer once the number of connections reaches 1k.

I'm able to run the following command to detect connections on the server:

netstat -nat |grep -i "port"| grep "ESTABLISHED"

This has worked well so far. However, after I killed the client process, a significant number of connections were still in the ESTABLISHED state on the server. Even after I shut down the client machine, I could still see many connections in the ESTABLISHED state on the server the next morning, more than 10 hours later.

Even if packets were lost when I killed the process, so that the server was never notified that the TCP connections had closed, I believe TCP has a default heartbeat (keepalive) mechanism that checks whether a connection is still alive.

Is it reliable to count connections with the command above? If so, what could be going wrong that keeps the server from releasing closed connections?

BRYAN
  • netstat should be fine, and yes TCP should timeout and close well before 10 hours. does netstat show it's connected to what was previously the client address? what's the server doing with the connections... is it parked on a select or a recv? – mark Sep 10 '13 at 15:44
  • @mark Why? If the server isn't trying to send and it doesn't have a read timeout, what is there to time out? – user207421 Sep 11 '13 at 03:07
  • @EJP you are correct... I never use the stack's keepalive mechanism opting instead for application-level control of the connection timing via application-layer keepalives and select/recv timeouts... I never even realized the stack's was not enabled by default. – mark Sep 11 '13 at 12:10
  • @mark Once a persistent connection is accepted and established, the server does nothing more than relay packets from a different server (over a separate, unrelated connection) to the client. The client keeps sending requests to get the packets. What the server does is no different from an ordinary client/server framework. However, these are concurrent connections (more than 50k) that are set up all at once. I tried a smaller number of concurrent requests, and there were still connections that failed to close, just fewer of them. Could this be the cause? – BRYAN Sep 11 '13 at 14:31
  • I assume you have increased your file descriptor limit in order to handle that many simultaneous connections? Does it all work fine on, say, 500 simultaneous connections? – mark Sep 11 '13 at 14:55

2 Answers

The default TCP keepalive idle time is around 2 hours (in BSD/Linux implementations). Are you sure the TCP keepalive option is set on the sockets whose connections are still up after 10 hours? I suspect your application is not explicitly setting it. One way to check is to call getsockopt() with SO_KEEPALIVE and see whether keepalive is actually enabled. If it is not, go ahead and set it.

You might find this discussion helpful: How to use SO_KEEPALIVE option properly to detect that the client at the other end is down?

Manoj Pandey
Keepalive by default is off. You have to enable it, in your case at the server end, per socket.

user207421
  • Sure, the keepalive option is off by default and the connection should close within 2 hours. I had enabled the option, and what really seems bizarre is that a few connections are still in the ESTABLISHED state even after the client has been shut down for the entire night – BRYAN Sep 11 '13 at 14:11
  • That doesn't make sense. If keepalive is off by default, the connection *won't* 'close within two hours'. You have to *enable* keepalive on the socket concerned to get that behaviour. – user207421 Sep 11 '13 at 23:33