I have a strange problem, described below.
I am writing a device client in Python 2.7. There are about 1,100 active tracking devices that send signals to the server. Each device sends a periodic signal; the frequency changes with the situation, but a device must send at least one GPS position signal every hour.
The devices run in long-connection mode, meaning a connection initiated by the device should stay alive for 3-4 hours. To keep the connection alive, they send heartbeat signals (these are not GPS position signals, but they do contain some data). The heartbeat interval is 15 minutes.
Below is my script for listening on a TCP port:
    import socket
    import thread

    class Server(object):
        def __init__(self, host, sock_port, buffsize=1024):
            self.hostname = host
            self.sock_port = sock_port
            self.buffsize = buffsize
            self.socket = None

        def start(self):
            self.log.info("Listening: ")
            self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            self.socket.bind((self.hostname, self.sock_port))
            self.socket.listen(1024)
            while True:
                # One handler thread per accepted connection
                conn, address = self.socket.accept()
                thread.start_new_thread(GV55LiteHandler(conn=conn, buff_size=self.buffsize).handle_data, ())
This is the method that is called when Socket server receives a new connection:
    class GV55LiteHandler():
        ....
        def handle_data(self):
            while True:
                try:
                    _veri = self.conn.recv(self.buff_size)
                    if not _veri:
                        # We do not receive any data...
                        raise NoIncomingDataException()
                except NoIncomingDataException:
                    break
                except Exception as h_e:
                    print h_e
                    break
                else:
                    self.control_data(_veri)
            self.conn.close()
After a while, I check the number of threads of the process (using psutil) and see that the total is greater than 5,000. My interpretation is that some devices have dead connections that still look active on the server: the device dropped the connection and established a new one. Judging by the total, each device appears to have created about 4 connections, closing each one when the long-connection time (set within the device) expired and then establishing a new one. I am told this is normal in some situations and has no effect.
But after a while, I get reports that some devices cannot connect! When I kill the port-listening script and restart it, all the devices that could not connect and send data start sending data again within 10 minutes.
I have done some research on this but could not find anything about the situation. My best guess is that after a device has established too many connections, either the server stops accepting new connections from that device, or the device fails to create a new TCP connection to the server, and GPS data is not sent until the script is restarted and all connections are dropped. (I have a similar tracking device from a different manufacturer, with ~120 active devices and a total of 1600 running threads, which means each device established and failed to drop about 10 previous connections before establishing a brand new one.)
These tracking devices use a single data connection. That is, no device can have 2 active data connections and send data over both (that would be meaningless anyway).
I tried setting a timeout on the TCP connection, as below:
    conn, address = self.socket.accept()
    conn.settimeout(10800)
and handling it in the data-processing script:
    try:
        _veri = self.conn.recv(self.buff_size)
        if not _veri:
            # We do not receive any data...
            raise NoIncomingDataException()
    except NoIncomingDataException:
        # No need to log anything in here...
        break
    except socket_timeout:
        # socket_timeout is assumed to be socket.timeout imported under that name
        print "Socket Timeout"
        break
That seems to work, and I no longer have any devices that cannot send GPS data. On the other hand, conn.settimeout does not seem to set the connection timeout properly: after a while, the connection is timed out by conn.settimeout 30 seconds after the last signal. I expected it to set the timeout to 3 hours, but the connection is dropped after ~20 minutes, and a new heartbeat signal is sent to open a new connection, followed by the GPS position signal. The GPS signal must be sent once every hour, but with settimeout defined I receive it once every 20 minutes.
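For context, the Python documentation describes settimeout as a timeout on each blocking socket operation, not on the total lifetime of the connection, so the wait effectively starts over on every recv call. Below is a minimal sketch of that behaviour, assuming a Unix-like system where socket.socketpair() is available. The 0.2-second timeout and the helper name recv_or_timeout are illustrative only and are not part of the script above.

```python
import socket
import time


def recv_or_timeout(sock, buff_size=1024):
    """Return received bytes, or None if the socket stayed idle for the timeout."""
    try:
        return sock.recv(buff_size)
    except socket.timeout:
        return None


# A connected pair of local sockets stands in for device and server.
device, server_side = socket.socketpair()
server_side.settimeout(0.2)  # illustrative value; the question uses 10800

device.sendall(b"heartbeat")
first = recv_or_timeout(server_side)   # data is waiting, so recv returns it

start = time.time()
second = recv_or_timeout(server_side)  # nothing was sent, so recv times out
idle_wait = time.time() - start

device.close()
third = recv_or_timeout(server_side)   # peer closed cleanly: recv returns b""
server_side.close()
```

In this sketch the timeout only fires when a single recv call waits for the full interval without data, and a clean close by the peer shows up as an empty result rather than as a timeout.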
I use blocking sockets (the default socket behaviour). I have not tried non-blocking sockets (and do not know much about them either).
How can I get rid of the inactive connections that cause devices to stop sending data, without breaking the long-connection mode of the devices?
Update: I never hit NoIncomingDataException in the handle_data method, in either the settimeout version or the no-timeout version.
Update 2: My server runs Debian GNU/Linux 6.0.10.
My /etc/sysctl.conf configuration:
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_ecn = 0
The Python lines above are the only ones that configure the socket, so the only option I set is setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1). I do not set socket.SO_KEEPALIVE anywhere in the Python script.
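For reference, keepalive probing is off by default on each socket, and the net.ipv4.tcp_keepalive_time value from sysctl only applies to sockets that request it with SO_KEEPALIVE. Below is a minimal sketch of what enabling it per connection could look like, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available on every platform). The helper name enable_keepalive and the numeric values are hypothetical and untested against these devices.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to probe an idle connection and drop it if the peer is gone."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the sysctl defaults; guarded because these
    # constants are platform-specific.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


# Stand-alone check on a fresh socket; in the server this would be
# applied to conn right after accept().
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

If the probes go unanswered, a blocked recv on that connection should fail with a socket error, which the existing except Exception branch in handle_data would catch.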