I have a problem that I suspect is caused by my code.
The application is used to 'ping' some custom-made network devices to check whether they're alive. It pings them every 20 seconds with a special UDP packet and expects a response. If a device fails to answer 3 consecutive pings, the application sends a warning message to the staff.
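To make the intended rule concrete, here is a minimal standalone sketch of the "3 consecutive misses" logic. It is a simplification for illustration, not the code from my application: the class name `CheckinTracker`, its methods and the device address are all made up.

```python
class CheckinTracker:
    """Counts consecutive unanswered pings per device (hypothetical helper)."""

    def __init__(self, devices, max_misses=3):
        self.max_misses = max_misses
        self.misses = {dev: 0 for dev in devices}

    def record_reply(self, device):
        # Any reply resets the consecutive-miss counter.
        self.misses[device] = 0

    def record_miss(self, device):
        """Count one unanswered ping; return True if staff should be warned."""
        self.misses[device] += 1
        if self.misses[device] >= self.max_misses:
            self.misses[device] = 0  # start counting again after the warning
            return True
        return False


tracker = CheckinTracker(["10.0.0.5"])  # example address, assumed
results = [tracker.record_miss("10.0.0.5") for _ in range(3)]
print(results)  # [False, False, True]
```

With a 20 second ping interval, the warning fires about a minute after a device stops answering.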
The application runs 24/7, and a few times a day (mostly 2-5, at random moments) it stops receiving UDP packets for exactly 10 minutes, after which everything goes back to normal. During those 10 minutes only one device seems to be replying; the others seem dead. That is what I've been able to deduce from the logs.
I've used Wireshark to sniff the packets and verified that ping packets are going both out AND in, so the network part seems to be working correctly, all the way to the OS. The computers are running Windows XP Pro, and some have no firewall configured at all. I'm seeing this issue on different computers, different Windows installs and different networks.
I'm really at a loss as to what might be the problem here.
I'm attaching the relevant part of the code, which does all the networking. It runs in a separate thread from the rest of the application.
I thank you in advance for whatever insight you might provide.
def monitor(self):
    checkTimer = time()
    while self.running:
        read, write, error = select.select([self.commSocket],[self.commSocket],[],0)
        if self.commSocket in read:
            try:
                data, addr = self.commSocket.recvfrom(1024)
                self.processInput(data, addr)
            except:
                pass
        if time() - checkTimer > 20: # every 20 seconds
            checkTimer = time()
            if self.commSocket in write:
                for rtc in self.rtcList:
                    try:
                        addr = (rtc, 7) # port 7 is the echo port
                        self.commSocket.sendto('ping',addr)
                        if not self.rtcCheckins[rtc][0]: # if last check was a failure
                            self.rtcCheckins[rtc][1] += 1 # incr failure count
                        self.rtcCheckins[rtc][0] = False # setting last check to failure
                    except:
                        pass
        for rtc in self.rtcList:
            if self.rtcCheckins[rtc][1] > 2: # didn't answer for a whole minute
                self.rtcCheckins[rtc][1] = 0
                self.sendError(rtc)
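
Both try blocks above swallow every exception, so any socket error during those 10 minutes would never show up in my logs. One diagnostic step might be to log whatever `recvfrom` raises instead of passing silently. A minimal sketch, assuming a plain UDP socket; `recv_logged` and the logger name are hypothetical and not part of the application:

```python
import logging
import socket

log = logging.getLogger("monitor")  # logger name is an assumption


def recv_logged(sock, bufsize=1024):
    """Receive one datagram; log any socket error instead of hiding it.

    Returns (data, addr) on success, or None if the receive failed.
    """
    try:
        return sock.recvfrom(bufsize)
    except socket.error as exc:  # alias of OSError on Python 3
        # On Windows, recvfrom on a UDP socket can raise WSAECONNRESET (10054)
        # after an earlier sendto triggered an ICMP "port unreachable" reply.
        log.warning("recvfrom failed: %r", exc)
        return None
```

In `monitor`, the receive branch could call `recv_logged(self.commSocket)` and only pass the result to `processInput` when it is not `None`. Whether this reveals the actual cause is an open question; it only makes hidden errors visible.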