Okay, so this seems like a fairly straightforward issue, but I feel like I've read every article on socket programming I can find and have not found a satisfactory answer. Let me first describe the system I am programming. I apologize for being vague (it's for NDA purposes), but it'll be enough to get my question across.

I'm writing a central multithreaded C server with a thread pool. There are two types of clients, type A and type B, and there are thousands of each. Type A clients are workers that do things for type B clients. Each A constantly sends status updates about itself to the server (say, every 15 seconds). A type B client only talks to the server when it needs something done, at which point the server picks an A client and assigns it the job. This goes on roughly 24/7 and is very time sensitive.

I've decided to go with a persistent TCP model: as soon as B asks for work to be done, the server can immediately send the info over to A without waiting for that A to connect to the server. Furthermore, since every A talks to the server every 15 seconds, repeatedly establishing new connections would add a lot of overhead.

If the A chosen by the server is unavailable, it needs to select a new A as soon as possible because B is very impatient.

My question is: how do I tell if the connection has dropped? I'm not talking about a socket being closed cleanly, but about the connection silently going dead. For example, B1 wants work done, so the server selects A1 and sends it the request. However, someone has snipped the Ethernet cable. I can't afford to have the server happily sending data to A1 until the connection times out minutes later. Can I ping the client before trying to send it messages? Would that introduce way too much latency? What can be done?

theseankelly

1 Answer

In my opinion, the only reliable way to do this is with a timer: if the client takes too long to reply, assume it is disconnected. Kick it into a pool of broken workers to be fixed, and check every once in a while to see whether connectivity has been restored.
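The timer approach can be sketched in C with `poll()`: send the request, then give the worker a bounded time to answer before giving up on it. This is a minimal sketch, not a definitive implementation; it assumes a hypothetical protocol in which the worker answers every job with a single acknowledgement byte (`ACK_BYTE`), and the function name `send_job_await_ack` is likewise made up for illustration.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define ACK_BYTE 'K'  /* hypothetical one-byte "OK, starting work" reply */

/* Hypothetical helper: send a job request to a worker over its persistent
 * connection, then wait at most timeout_ms for the acknowledgement.
 * Returns 1 if the ACK arrived in time, 0 if the worker should be treated
 * as unavailable (send failure, timeout, EOF or unexpected reply).
 * Assumes SIGPIPE is ignored elsewhere, e.g. signal(SIGPIPE, SIG_IGN). */
int send_job_await_ack(int fd, const void *req, size_t len, int timeout_ms)
{
    const char *p = req;
    while (len > 0) {                      /* send() may write only part */
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return 0;                      /* e.g. EPIPE/ECONNRESET: dead */
        }
        p += n;
        len -= (size_t)n;
    }

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);    /* the timer from the answer */
    } while (rc < 0 && errno == EINTR);    /* note: restarts the full timeout */
    if (rc <= 0)
        return 0;                          /* no reply in time, or poll error */

    char ack;
    ssize_t n = recv(fd, &ack, 1, 0);
    return (n == 1 && ack == ACK_BYTE) ? 1 : 0;
}
```

A return value of 0 is the cue to move the job to another worker. The timeout should sit just above the worker's normal reply latency; with `poll()` it can be well under a second, which TCP's own timers do not offer by default.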

This is basically how the financial services industry handles market data feeds. If you don't get a response quickly enough, you can no longer trust the sender and should ignore it until things improve. For some applications they even send two identical copies of every packet over two separate network paths (using tunnels, MPLS-TE, or two multicast trees) so that they always have a backup packet available if they need it.

In your case, you can probably just choose another worker and send the task to them.
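Choosing another worker needs a little bookkeeping: workers that missed a deadline are set aside and periodically re-probed. The following is one possible scheme, sketched under assumptions; the `struct worker` layout and all function names are hypothetical, and how a suspect worker is actually re-probed (reconnect, application-level ping) is left to the caller.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical worker bookkeeping for the server's pool of A clients. */
enum worker_state { WORKER_OK, WORKER_SUSPECT };

struct worker {
    int fd;                   /* persistent TCP connection to this A client */
    enum worker_state state;
    time_t suspect_since;     /* when it last missed a reply deadline */
};

/* Set aside a worker that missed its deadline. */
void mark_suspect(struct worker *w, time_t now)
{
    w->state = WORKER_SUSPECT;
    w->suspect_since = now;
}

/* Return a worker to service after a successful re-probe. */
void mark_ok(struct worker *w)
{
    w->state = WORKER_OK;
}

/* Index of the first usable worker at or after start (wrapping around),
 * or -1 if none is available. */
int pick_worker(const struct worker *pool, size_t n, size_t start)
{
    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;
        if (pool[idx].state == WORKER_OK)
            return (int)idx;
    }
    return -1;
}

/* Index of a suspect worker that has been set aside for at least
 * retry_after seconds and is due for a connectivity check, or -1. */
int next_suspect_to_probe(const struct worker *pool, size_t n,
                          time_t now, time_t retry_after)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].state == WORKER_SUSPECT &&
            now - pool[i].suspect_since >= retry_after)
            return (int)i;
    }
    return -1;
}
```

In a multithreaded server like the one described, the pool would need to be protected (for example with a `pthread_mutex_t`) so that two threads do not hand out or mark the same worker concurrently; that locking is omitted here to keep the sketch short.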

Michael Dillon
  • This *is* the only reliable, prompt way to detect missing peers. @theseankelly: In your case, you could require the `A` client to return an "OK, starting work" response to the request from the server. If the server doesn't get a timely response, it hands the work to another `A` client and tries to re-establish communication with the timed-out one. – caf Feb 25 '11 at 05:21
  • Okay, so I need to implement some sort of handshake on top of the TCP handshake? Sounds doable. I imagine this would be done like: `send(data); /* now we want a response */ select(socket with timeout); if(timeout){consider client dead and ask another} else go back and continue looping` ? – theseankelly Feb 25 '11 at 12:55
  • Also, why is the approach of shortening the TCP timeout a bad idea (in @rasika's response)? – theseankelly Feb 25 '11 at 13:02
  • @theseankelly If something is wrong with the network, then the other end cannot signal disconnect. Similarly if the other end crashes, there will be no formal disconnect of the TCP socket. – Michael Dillon Feb 25 '11 at 19:26
  • @Michael Dillon Right, which was my original query, but the answer that is right below the chosen one on @rasika's response talks of checking for dead peers through manipulating TCP's keepalive property. The link, again: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html – theseankelly Feb 25 '11 at 20:03
  • I'm still unsatisfied with this answer, so I've unaccepted it. After doing more research, I can't see why TCP keepalive is NOT the proper way to do this. Any insight? – theseankelly Feb 27 '11 at 01:07
  • hi, can you please help me out on my question. https://stackoverflow.com/questions/74121600/app-stops-working-when-i-keep-sending-multiple-messages-with-socket-io – kd12345 Oct 20 '22 at 07:59
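On the keepalive question raised in the comments: keepalive can be tuned per socket, but it only sends probes while the connection is idle. When the server has just written a job and is waiting for it to be acknowledged, TCP's retransmission timer is in charge instead, and on Linux `TCP_USER_TIMEOUT` bounds how long sent data may stay unacknowledged. The sketch below sets both; the helper name and the example values are assumptions, and `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT` and `TCP_USER_TIMEOUT` are Linux options (see tcp(7)), hence the `#ifdef` guards.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable kernel-level dead-peer detection on a TCP
 * socket. An idle dead peer is reported after roughly
 * idle_s + intvl_s * cnt seconds; unacknowledged sent data is abandoned
 * after user_timeout_ms (Linux only). Returns 0 on success, -1 on error. */
int enable_dead_peer_detection(int fd, int idle_s, int intvl_s, int cnt,
                               unsigned int user_timeout_ms)
{
    int on = 1;

    /* Silence unused-parameter warnings where an option is unavailable. */
    (void)idle_s; (void)intvl_s; (void)cnt; (void)user_timeout_ms;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
#endif
#ifdef TCP_KEEPINTVL
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0)
        return -1;
#endif
#ifdef TCP_KEEPCNT
    /* Unanswered probes before the connection is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
#endif
#ifdef TCP_USER_TIMEOUT
    /* Covers the non-idle case that keepalive does not. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &user_timeout_ms, sizeof user_timeout_ms) < 0)
        return -1;
#endif
    return 0;
}
```

With illustrative values such as `enable_dead_peer_detection(fd, 5, 1, 3, 8000)`, an idle dead peer would be noticed after roughly 5 + 3 * 1 = 8 seconds, and the kernel then reports an error (typically `ETIMEDOUT`) on the next socket call. This detection is still measured in seconds and is driven by the kernel rather than by the job, so it complements the application-level acknowledgement timer rather than replacing it.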