How do I recover when a synchronous call to socket send() gets blocked due to the loss of the other end of the connection?

Question

When my socket connection is terminated normally, everything works fine. But there are cases where the normal termination does not occur and the remote side of the connection simply disappears. When this happens, the sending thread gets stuck in send() because the other side has stopped ACKing the data. My application runs a ping request/response exchange, so another thread recognizes that the connection is dead. The question is: what should this other thread do to bring the connection to a safe termination? Should it call close()? I see SIGPIPE raised when this happens, and I just want to make sure I am closing the connection in a safe way. I don't want the application to crash, and I don't care about the leftover data. I am using a C++ library that uses synchronous sockets, so moving to async is not an easy option for me.

If the socket gets closed in another thread, then the original thread will get quite confused. Meticulous error handling should prevent that, of course, but it's simpler to just have the main thread time out the send. — Sam Varshavchik, Jan 28 '20 at 15:15
Ignore the `SIGPIPE` signal and `send` should return with `-1` and `errno` set to `EPIPE`. — Some programmer dude, Jan 28 '20 at 15:15
The problem with closing the socket in the other thread is that if the socket might also be closed by the network thread, that can lead to a race condition where both threads call `close()` with the same value -- usually that just causes the second `close()` call to error out, but if there is a third thread that called `socket()` between those two `close()` calls, and was handed that same descriptor value (because having been closed(), that fd value is available for re-use), then you've got a real fun bug to track down--a completely unrelated thread occasionally erroring out for 'no reason'. — Jeremy Friesner, Jan 28 '20 at 15:17
@JeremyFriesner Good point. I only have a receive thread and a send thread that touch the socket in my application, so I hope I can trust no third thread will intervene. — Chris, Jan 28 '20 at 21:24
Consider using the `shutdown` system call rather than `close`. This prevents the particular race @JeremyFriesner mentioned as it ensures the `send` "completes with error" without invalidating the socket file descriptor. Then the actual `close` can be performed by your sending thread. (Bear in mind that there are other races still possible if you do not appropriately synchronize access to the variable holding the file descriptor value.) — Gil Hamilton, Jan 29 '20 at 00:29
The shutdown works. My app is multi-platform and I didn't see an obvious Windows version of "shutdown", so I am using closesocket(h) there. — Chris, Feb 01 '20 at 18:09
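
The split Gil Hamilton suggests (the watchdog thread calls shutdown(), the sending thread alone calls close()) can be sketched roughly as follows. This is a minimal POSIX sketch, not the JUCE library's API: abort_connection and send_or_close are hypothetical names, the socket is assumed to be in blocking mode as in the question, and access to the shared descriptor variable still needs its own synchronization (e.g. a mutex), which is not shown. On Winsock the corresponding calls are shutdown(s, SD_BOTH) and closesocket(s).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Watchdog/ping thread (hypothetical helper): called once the peer is judged dead.
 * shutdown() makes a send() blocked on this socket (or any later one) fail,
 * but, unlike close(), it does not release the descriptor number, so that
 * value cannot be handed to an unrelated socket()/open() in the meantime. */
int abort_connection(int fd)
{
    return shutdown(fd, SHUT_RDWR);
}

/* Sending thread (hypothetical helper): the only place that ever calls close().
 * MSG_NOSIGNAL keeps the failing send() from raising SIGPIPE. */
ssize_t send_or_close(int *fd, const void *buf, size_t len)
{
    ssize_t n;
    do {
        n = send(*fd, buf, len, MSG_NOSIGNAL);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (n < 0) {
        int saved = errno;              /* close() may overwrite errno */
        close(*fd);
        *fd = -1;                       /* mark the connection as gone */
        errno = saved;
    }
    return n;
}
```

Because only the sending thread closes the descriptor, the double-close race described in the comments above cannot occur between these two threads.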

Jeremy Friesner · Answer 1 · answered Jan 28 '20 at 16:59

I avoid this problem by setting SIGPIPE to be ignored and putting all my sockets into non-blocking I/O mode. Once a socket is in non-blocking mode, it will never block inside send() or recv(): in any situation where it would normally block, it instead returns -1 immediately and sets errno to EWOULDBLOCK (or EAGAIN, which means the same thing). Therefore I can never "lose control" of the thread due to bad network conditions.
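
Switching a socket to non-blocking mode and telling "buffer full, try later" apart from a real error might look like this minimal sketch. set_nonblocking and try_send are hypothetical helper names, and SIGPIPE is assumed to be ignored already, as this answer describes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode, preserving its other file status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* One send() attempt that never blocks.
 * Returns the number of bytes queued (possibly fewer than len),
 * 0 if the kernel send buffer is full and the caller should wait,
 * or -1 on a real error (errno set, e.g. EPIPE or ECONNRESET).
 * Assumes SIGPIPE is already ignored, so a dead peer yields EPIPE. */
ssize_t try_send(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```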

Of course if you never block, how do you keep your event loop from spinning and using up 100% of a core all the time? The answer is that you can block waiting for I/O inside of a separate call that is designed to do just that, e.g. select() or poll() or similar. These functions are designed to block until any one of a number of sockets becomes ready-to-read (or optionally ready-for-write) or until a pre-specified amount of time elapses, whichever comes first. So by using these, you can have your thread wake up when it needs to wake up and also sleep when there's nothing to do.
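
Combining non-blocking send() with poll() gives a send that handles short writes and gives up when the peer stalls, instead of hanging forever. The following is a sketch under stated assumptions: send_all_timeout is a hypothetical name, the socket is assumed to already be non-blocking, and timeout_ms is an arbitrary per-wait limit chosen by the caller.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send all len bytes on a non-blocking socket, sleeping in poll() while the
 * kernel send buffer is full. Fails with errno = ETIMEDOUT if the socket
 * stays unwritable for timeout_ms (e.g. the peer stopped ACKing).
 * Returns 0 on success, -1 on error. */
int send_all_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;          /* short write: loop for the rest */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                 /* EPIPE, ECONNRESET, ... */

        /* Buffer full: block here, not in send(), until writable or timeout. */
        struct pollfd p = { .fd = fd, .events = POLLOUT };
        int r = poll(&p, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (r < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

If poll() reports POLLERR or POLLHUP, the loop simply retries send(), which then returns the actual error.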

Once you have that (and you've made sure that your code handles short reads, short writes, and -1/EWOULDBLOCK gracefully, as those happen more often in non-blocking mode), you are free to implement your dead-network detector in any of several ways. You could implement it within your network I/O thread, by keeping track of how long it has been since any data was last sent or received, and by using the timeout argument to select() to make the blocking call wake up at the appropriate times.

Or you could still use a second thread, but now the second thread has a safe way to wake up the first one: by calling pipe() or socketpair() you can create a pair of connected file descriptors, and your network I/O thread can select()/poll() on the receiving descriptor while the other thread holds the sending descriptor. When the other thread wants to wake up the I/O thread, it can send a byte on its descriptor, or just close() it. Either one will cause the network I/O thread to return from select() or poll() and find that something has happened on its receiving descriptor, which gives it the opportunity to react by exiting (or taking whatever action is appropriate).
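
The wake-up pipe idea might be sketched like this, using pipe() and poll(). The helper names are hypothetical, and the wake pipe's write end is assumed to be non-blocking so that the watchdog can never get stuck writing to it.

```c
#include <poll.h>
#include <unistd.h>

/* Watchdog side: wake the network I/O thread by writing one byte.
 * (Closing wake_wr instead also works: the reader then sees POLLHUP.) */
void wake_io_thread(int wake_wr)
{
    char b = 1;
    ssize_t n = write(wake_wr, &b, 1);
    (void)n;  /* if the non-blocking pipe is full, a wakeup is already pending */
}

/* Network I/O side: sleep until sock is writable, a wakeup arrives, or
 * timeout_ms elapses.
 * Returns 1 = writable (or in error; the next send() will report it),
 *         0 = woken by the other thread (tear the connection down),
 *        -1 = timeout or poll() failure. */
int wait_writable_or_woken(int sock, int wake_rd, int timeout_ms)
{
    struct pollfd p[2] = {
        { .fd = sock,    .events = POLLOUT },
        { .fd = wake_rd, .events = POLLIN  },
    };
    int r = poll(p, 2, timeout_ms);
    if (r <= 0)
        return -1;
    if (p[1].revents != 0)  /* byte written, or write end closed (POLLHUP) */
        return 0;
    return 1;
}
```

The wakeup is checked first, so a pending "connection is dead" signal wins even when the socket also happens to be writable.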

I use this technique in almost all of my network programming, and I find it works very well to achieve network behavior that is both reliable and CPU-efficient.

score 0 · Answer 2 · answered Jan 28 '20 at 15:19

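I had a lot of SIGPIPE signals in my application. They are not really important in themselves: they just tell you that a pipe (here, a socket) is no longer available. Their default action, however, is to terminate the process, so I ignore them.

So, in my main function, I do: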

signal(SIGPIPE, SIG_IGN);
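
With SIGPIPE ignored, a send() to a vanished peer fails with EPIPE instead of killing the process. A minimal sketch of that, with hypothetical helper names: the disposition is process-wide, so it should be set once at startup before other threads begin sending. Note that ignoring SIGPIPE does not by itself unblock a send() that is already stuck; it only turns the eventual failure into an ordinary error return.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Call once at startup, before any thread sends: replaces the default
 * "terminate the process" action for SIGPIPE with "ignore". */
int ignore_sigpipe(void)
{
    return signal(SIGPIPE, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* One send() attempt. Returns 0 on success or the errno of the failure;
 * with SIGPIPE ignored, a vanished peer shows up here as EPIPE. */
int send_once(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, 0);
    return n < 0 ? errno : 0;
}
```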

Mickael Sereno

This works well to get rid of the SIGPIPE. Also, I can call close() if I make sure the handle is valid. I'm on top of a [C++ library](https://docs.juce.com/master/classInterprocessConnection.html) with two layers over the socket. I can use getSocket()->getRawSocketHandle(), check that it is valid, and then call close() on it; that seems to work, along with suppressing the SIGPIPE as you indicate. – Chris Jan 28 '20 at 21:20

score 0 · Answer 3 · answered Jan 28 '20 at 15:22

Another option is to pass the MSG_NOSIGNAL flag to send, e.g. send(..., MSG_NOSIGNAL). In that case SIGPIPE is not raised; the call returns -1 and sets errno to EPIPE.
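
A small wrapper shows the per-call variant. This sketch assumes a platform that defines MSG_NOSIGNAL (Linux does, and it is part of POSIX.1-2008); send_nosignal is a hypothetical name. Unlike signal(SIGPIPE, SIG_IGN), it leaves the process-wide signal disposition untouched, which can matter when the code lives inside a library.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Like send(), but a vanished peer yields -1 with errno == EPIPE
 * instead of raising SIGPIPE. Only this call is affected. */
ssize_t send_nosignal(int fd, const void *buf, size_t len)
{
    ssize_t n;
    do {
        n = send(fd, buf, len, MSG_NOSIGNAL);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */
    return n;
}
```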

Maxim Egorushkin
