
I have a C program that uses `epoll_ctl`.

It listens for connections on a specific port. When a device connects, the program determines whether the device needs a firmware update, and if it does, it sends the device the firmware found on the Linux host.

What I am having trouble with is detecting when the device receiving the update disconnects abruptly.

I need to know this because the server only allows a fixed number (x) of devices to connect, and when one more device connects and exceeds this limit, I need to restart the service. But I can only do that if I know that no devices are receiving updates at the time.

I tried checking `EPOLLRDHUP`, but it doesn't seem to work reliably. Any help would be appreciated, thanks.
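For reference, a minimal sketch of how `EPOLLRDHUP` is typically registered and handled in an epoll loop. The helper names `watch_client`, `drop_client` and `handle_client_event` are hypothetical. It also shows the limitation the question runs into: `EPOLLRDHUP`, `EPOLLHUP` and `EPOLLERR` are raised only when the peer actually sends a FIN or RST, or when the kernel records an error on the socket, so a device that simply loses its network connection produces no event at all.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: watch a client for incoming data and for the peer
 * shutting down its sending side. EPOLLHUP and EPOLLERR are always reported
 * by epoll, so they do not need to be requested. */
static int watch_client(int epfd, int fd)
{
    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN | EPOLLRDHUP;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: stop watching a client and release its socket. */
static void drop_client(int epfd, int fd)
{
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    close(fd);
}

/* Hypothetical helper: handle one event for a client.
 * Returns 1 if the connection was torn down, 0 if it is still open.
 * The hang-up flags only appear after the peer sent FIN/RST or an error was
 * recorded; a device that silently drops off the network triggers none. */
static int handle_client_event(int epfd, const struct epoll_event *ev)
{
    int fd = ev->data.fd;

    if (ev->events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) {
        /* Peer closed or the connection failed; unread data is discarded. */
        drop_client(epfd, fd);
        return 1;
    }
    if (ev->events & EPOLLIN) {
        char buf[512];
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n == 0 || (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK &&
                       errno != EINTR)) {
            /* Orderly shutdown (n == 0) or a hard error such as ECONNRESET. */
            drop_client(epfd, fd);
            return 1;
        }
        /* n > 0: process data from the device here. */
    }
    return 0;
}
```

In other words, `EPOLLRDHUP` is reliable for what it reports (a peer that closed its end), but it cannot report a peer that never got the chance to close.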

dswatik
  • No help. There is no way to reliably check whether the TCP peer has disconnected. – SergeyA Aug 16 '17 at 19:18
  • @SergeyA I just can't accept that... there has got to be a way, otherwise what good is it? – dswatik Aug 16 '17 at 19:23
  • Embrace reality, mate. – SergeyA Aug 16 '17 at 19:28
  • @SergeyA Unfortunately I don't think my client will appreciate that answer. – dswatik Aug 16 '17 at 19:30
  • What do you mean by *"disconnects abruptly"*? If the peer does a clean TCP shutdown, you will notice it. If the client system crashed, lost internet connectivity, or a firewall lost the state and blocks further data, you will only notice if you write data to the peer and get an error after some time because no ACK was received for the data you've sent. Or you might use TCP keepalive to check for problems. – Steffen Ullrich Aug 16 '17 at 19:32
  • @SteffenUllrich Yes, that is what I mean: it lost its internet connection. – dswatik Aug 16 '17 at 19:34
  • Possible duplicate of [How to detect a TCP socket disconnection (with C Berkeley socket)](https://stackoverflow.com/questions/6404008/how-to-detect-a-tcp-socket-disconnection-with-c-berkeley-socket) – Steffen Ullrich Aug 16 '17 at 19:35
  • Joking aside, this is what it is. This is a nature of TCP communication. The best you can do is to make your devices send heartbeats while downloading, and set a threshold of X missed heartbeats after which you consider the device dead and forcefully disconnect it. – SergeyA Aug 16 '17 at 19:36
  • @SteffenUllrich, keepalive is not going to help, since the OP is already writing to the socket. And in any case it is not reliable. – SergeyA Aug 16 '17 at 19:37
  • @SergeyA Correct... It is in the middle of writing, has already written a packet to the device, and the device just drops dead at that point. There is no logic to detect that it dropped; this is what I am currently trying to fix, but I have not been able to do so reliably. – dswatik Aug 16 '17 at 19:42
  • @SergeyA *This is a nature of TCP communication.* It's the nature of ***all*** communications. How do you know when the entity you're communicating with simply and completely disappears? There is no way to tell. You can only infer the disappearance after some period of time. – Andrew Henle Aug 16 '17 at 19:42
  • @AndrewHenle, I actually wanted to say exactly that, but then decided I did not want to get into an argument about 'but cell phones do detect when you lose the connection to the other party', so I limited my statement to TCP. There are other arguments as well, which I just don't feel like engaging in. – SergeyA Aug 16 '17 at 19:46
  • @SergeyA LOL :-D If someone wants to make that argument and display their ignorance of both communication theory and cell phone networks, let 'em. Because if you drop that cellphone into a perfect Faraday cage it won't be detecting those lost connections anytime soon. – Andrew Henle Aug 16 '17 at 19:49
  • Well, here is the issue I have with that whole cell phone argument: I don't think they are using TCP? But I don't really know, I never researched it, so I can't say one way or another. I would assume voice over IP uses TCP, or at least UDP. – dswatik Aug 16 '17 at 19:52
  • @dswatik There's a whole lot more going on with your cell phone's phone calls than a single, simple TCP connection between you and the person you're talking to. Again, though, if you cut *all* those channels by putting the phone into a Faraday cage, it won't be able to detect that the one call was lost (although given all the communication channels a cell phone uses, it might very well be designed to handle that situation. How fast can your phone properly detect "No Signal"? That's likely to be really fast, because the timeouts in cell protocols are likely shorter). – Andrew Henle Aug 16 '17 at 19:58
  • @AndrewHenle Yeah, I figured it's like comparing apples to oranges... – dswatik Aug 16 '17 at 20:01
  • (cont) The only thing you can do is select your "we haven't heard from you at all" timeout value so that you balance the risk of false positives for lost connections that were just slow against your desire to detect actual lost connections as fast as possible. I'm guessing the default TCP timeout of 2 minutes is too long? – Andrew Henle Aug 16 '17 at 20:01
  • @AndrewHenle Yeah ... – dswatik Aug 16 '17 at 20:06
  • @dswatik: Since you are using Linux, consider using the Linux-specific TCP socket options described in [man 7 tcp](http://man7.org/linux/man-pages/man7/tcp.7.html) to set the timeouts as needed, and the `TCP_LINGER2` socket option to limit how long the kernel will "remember" a dropped connection, as well as the general socket options described in [man 7 socket](http://man7.org/linux/man-pages/man7/socket.7.html). Personally, I would use a configuration file to define the options and values used, with a HUP signal telling the service to reload its configuration, but those are practical details. – Nominal Animal Aug 17 '17 at 02:25
  • What we ended up doing is kind of a hack, but it works. I check two things before recycling the app. When a device connects, I increment a counter that tells me devices are downloading. While they are downloading, I keep updating a timestamp. Then, once we reach the maximum number of connections, I look at both the counter and the timestamp: if the counter is > 0, then each time a device tries to connect while we are at the maximum, I check the timestamp, and if it is more than 2 minutes older than the current time, I figure it is safe to restart. – dswatik Aug 25 '17 at 16:51
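Steffen Ullrich's comment points out that a peer that silently vanished is usually only noticed when a write eventually fails. A minimal sketch of a send wrapper that reports that failure as a return value instead of letting `SIGPIPE` kill the process; the name `send_chunk` is hypothetical, and it assumes the socket may be non-blocking, as is common with epoll. The error typically surfaces on a later call, after the kernel has given up retransmitting, not on the write that was in flight when the device disappeared.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send part of a firmware image to a client.
 * Returns the number of bytes queued (0 if the socket buffer is full or the
 * call was interrupted) or -1 if the connection should be treated as dead.
 * MSG_NOSIGNAL turns the SIGPIPE signal into an EPIPE error return. */
static ssize_t send_chunk(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);
    if (n >= 0)
        return n;
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 0;  /* retry when epoll reports EPOLLOUT */
    return -1;     /* e.g. EPIPE, ECONNRESET, ETIMEDOUT: connection is gone */
}
```

A return of -1 is the point at which the server can safely release that device's slot.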
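Following Nominal Animal's suggestion, a minimal sketch of the Linux-specific options from `man 7 tcp` and `man 7 socket` that shorten how long a dead peer can go unnoticed. `SO_KEEPALIVE` with `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` probes an idle connection, while `TCP_USER_TIMEOUT` bounds how long already-sent data may remain unacknowledged before the kernel aborts the connection, which targets the case SergeyA raises of a device vanishing mid-write. The helper name `tune_dead_peer_detection` and any values passed to it are illustrative assumptions, not recommended settings. When the kernel gives up, the failure is typically reported through epoll as `EPOLLERR`/`EPOLLHUP` and as `ETIMEDOUT` on the next read or write.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: tighten dead-peer detection on an accepted socket.
 * Returns 0 on success, -1 (errno set) if any option is rejected. */
static int tune_dead_peer_detection(int fd, int idle_s, int interval_s,
                                    int probes, unsigned int user_timeout_ms)
{
    int on = 1;

    /* Start keepalive probes after idle_s seconds of silence, send one every
     * interval_s seconds, and drop the connection after `probes` misses. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_s, sizeof interval_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;

    /* Abort the connection if transmitted data stays unacknowledged for
     * longer than user_timeout_ms milliseconds. This covers a device that
     * disappears while the firmware is being written to it. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &user_timeout_ms, sizeof user_timeout_ms) < 0)
        return -1;

    return 0;
}
```

According to `man 7 tcp`, when both are set, `TCP_USER_TIMEOUT` also overrides the keepalive probe count in deciding when to close the connection, so the two should be chosen consistently.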
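SergeyA's heartbeat suggestion and Andrew Henle's point about choosing a "we haven't heard from you" timeout can be combined into an application-level watchdog. A minimal sketch, assuming (hypothetically) that each device sends a heartbeat or acknowledgement every `HEARTBEAT_INTERVAL_S` seconds while downloading and that the server records when it last heard from each one; a client is declared dead after `MAX_MISSED_HEARTBEATS` intervals of silence. The struct, constants and function name are illustrative, not part of any existing protocol.

```c
#include <stddef.h>
#include <time.h>
#include <unistd.h>

#define HEARTBEAT_INTERVAL_S  5  /* hypothetical device heartbeat period */
#define MAX_MISSED_HEARTBEATS 3  /* missed heartbeats tolerated */

struct client {
    int fd;
    time_t last_heard;  /* seconds (monotonic clock) when data last arrived */
};

/* Close and remove every client that has been silent for longer than
 * HEARTBEAT_INTERVAL_S * MAX_MISSED_HEARTBEATS seconds.
 * Compacts the array in place and returns the new number of clients. */
static size_t reap_silent_clients(struct client *clients, size_t n, time_t now)
{
    const time_t limit = (time_t)HEARTBEAT_INTERVAL_S * MAX_MISSED_HEARTBEATS;
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (now - clients[i].last_heard > limit) {
            /* Closing normally also removes the fd from the epoll set. */
            if (clients[i].fd >= 0)
                close(clients[i].fd);
        } else {
            clients[kept++] = clients[i];
        }
    }
    return kept;
}
```

In a real loop, `now` would come from `clock_gettime(CLOCK_MONOTONIC, ...)` so wall-clock changes cannot trigger false positives, and the sweep can run every time `epoll_wait` returns, with its timeout set to about one heartbeat interval so the check still happens when no events arrive. The choice of `MAX_MISSED_HEARTBEATS` is exactly the false-positive trade-off described in the comments.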
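The workaround described in the last comment can be sketched as follows. This is a reconstruction from the description, not dswatik's actual code: the struct and function names are hypothetical, and it assumes that the counter is decremented when a download completes and that restarting is also safe when nothing is downloading at all. The counter goes up when a download starts, the timestamp is refreshed as data is sent, and once the connection limit is reached, a restart is allowed only if no download has made progress for more than two minutes.

```c
#include <time.h>

#define STALL_LIMIT_S 120  /* "more than 2 mins", per the description */

/* Hypothetical bookkeeping shared by all connections. */
struct download_tracker {
    unsigned int active;   /* devices currently being updated */
    time_t last_progress;  /* last time any download sent data */
};

static void download_started(struct download_tracker *t, time_t now)
{
    t->active++;
    t->last_progress = now;
}

static void download_progress(struct download_tracker *t, time_t now)
{
    t->last_progress = now;
}

/* Assumption: called when a download completes normally. */
static void download_finished(struct download_tracker *t)
{
    if (t->active > 0)
        t->active--;
}

/* Called when a device connects while the server is at its limit.
 * Safe if nothing is downloading, or if no download has made progress for
 * STALL_LIMIT_S seconds (remaining "active" downloads are presumed dead). */
static int safe_to_restart(const struct download_tracker *t, time_t now)
{
    if (t->active == 0)
        return 1;
    return now - t->last_progress > STALL_LIMIT_S;
}
```

Because a single shared timestamp is refreshed by every active download, one healthy transfer keeps the restart blocked, and a dead device is only written off once all transfers have gone quiet. Keeping a timestamp per connection instead would let the server drop just the dead devices and free their slots without a restart.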

0 Answers