
I have a TCP socket in blocking mode being used for the client side of a request/response protocol. Sometimes I find that if a socket was unused for a minute or two, a send call succeeds and indicates all bytes were sent, but the following recv returns zero, indicating a shutdown. I have seen this on both Windows and Linux clients.

The server guys tell me they always send some response before shutting down if they have received data, but they may close a socket on which they have not yet received anything if they are low on server resources.

Is what I am seeing indicative of the server having closed the connection while I was not using it? If so, why does the send still succeed?

What is the correct way to automatically detect this so that the request is resent on a new connection, bearing in mind that if the server actually received the request, processing it twice could have unintended effects?

//not full code (buffer management, wrapper functions, etc...)
//no special flags/options are being set, just socket then connect
sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
connect(sock, addr, addrlen);
//some time later after many requests/responses, normally if was inactive for a minute
//sending about 50 bytes for requests, never actually seen it loop, or return 0
while (more_to_send) check(send(sock, buffer, len, 0));
//the very first recv returns 0, never seen it happen part way through a response (few KB to a couple of MB)
while (response_not_complete) check(recv(sock, buffer, 4096, 0));
Fire Lancer
  • `send` takes a fourth argument, `flags`, which you are not specifying. If not, the flags will be random, and they can affect transmission. Likewise for `recv` – Craig Estey Aug 03 '16 at 21:32
  • My bad it is zero in both cases, fortunately compiler errors are on, unlike on SO :) – Fire Lancer Aug 03 '16 at 21:35
  • Run a sniffer and see what is actually going on... – Eugene Sh. Aug 03 '16 at 21:36
  • I figured as much, which is why I just did only a comment. What does `check` do (e.g. abort on any error)? Can your send ever return `EMSGSIZE`? – Craig Estey Aug 03 '16 at 21:39
  • Your code should take into consideration that a successful "send()" does not indicate the data was received by the peer. Microsoft's man page on send() says in part: "The successful completion of a send function does not indicate that the data was successfully delivered and received to the recipient. This function only indicates the data was successfully sent." Ubuntu's man page on send() says in part: "No indication of failure to deliver is implicit in a send(). Locally detected errors are indicated by a return value of -1." – TonyB Aug 03 '16 at 21:46
  • `check` just handles the < 0 (error), == 0 (disconnect) and > 0 (got data, see if I need to loop more or am complete) cases, of which I am seeing == 0 on the first recv here. I am not seeing any -1 returns (by EMSGSIZE, I assume you mean send(...) == -1 and errno == EMSGSIZE?) – Fire Lancer Aug 03 '16 at 22:04
  • Installed wireshark, but not fully familiar with TCP protocol details. If I pause a while after a successful response, I see a "[FIN, ACK]" from the remote server to my system, followed by an "[ACK]" back to the remote. If I then send a request, I see the request then an "[ACK]" from the server back to me. But no response data, and `recv` returns 0. – Fire Lancer Aug 03 '16 at 22:06
  • @TonyB I kinda figured as much, but then what is the correct procedure? Sending some kind of ping request/response first to "test" the socket seems wasteful (and what would be the threshold for needing a new "ping"? 1 second? 10 seconds?), and I don't see such a concept in other common protocols (e.g. HTTP when wanting to send a POST or similar with keep-alive connections) – Fire Lancer Aug 03 '16 at 22:20
  • I studied the low level aspects of `TCP` out of academic interest, but it was a while ago, so the following is a bit of a guess. I believe `FIN, ACK` is a termination request by the server. The client sending the `ACK` back is that it is acknowledging the `FIN` (ie. both sides are now "disconnected"). See: http://stackoverflow.com/questions/15182106/what-is-the-reason-and-how-to-avoid-the-fin-ack-rst-and-rst-ack – Craig Estey Aug 03 '16 at 22:31
  • So if the `send` happens after, why is it even allowed, and why does the server even send an ACK saying it got that request? Is there something else I was meant to do (e.g. some special call to put before `send` if not used for x seconds that goes "I got a finish packet, you should close now"?). Non-blocking `recv` with the expectation that it says nothing to read / would-block (since the server never sends data of its own accord)? – Fire Lancer Aug 03 '16 at 22:47
  • If you want to check to see if a socket has been "closed" by the peer before you call "send()"... you could call select() which has both members of the timeval structure equal to zero, so that it will return immediately. Note "select()" will note whether a "recv()" on the socket would not block... which means EITHER data is ready to be read, OR the socket has been closed by the peer. – TonyB Aug 03 '16 at 23:35
  • I just tested that, worked on Windows, but always get (tried about 100 times to be sure) `select(1, &rd_set, null, null, zero_tv) == 0; send(data...) == expected_len; recv(...) == 0` on my Linux systems, even though those 3 socket calls are one after another. Guessing there is something else I was meant to have done first (some socket opt?) – Fire Lancer Aug 04 '16 at 22:37

2 Answers

  • If you don't get an application acknowledgment of the request from the server, re-send it (a retry sketch follows below).
  • Design your transactions to be idempotent so that re-sending them doesn't cause ill-effects.
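
For illustration, here is a minimal sketch of that retry idea, assuming the requests have been made idempotent as described. connect_to_server(), send_all() and recv_response() are hypothetical helpers standing in for the socket()/connect(), send() and recv() loops shown in the question; they are not functions from the poster's code.

/* Sketch only: re-send a request over a new connection when the cached one
 * turns out to have been closed by the peer while it was idle.  This is
 * only safe because the requests are idempotent. */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

int connect_to_server(void);                            /* socket() + connect()  */
int send_all(int sock, const char *buf, size_t len);    /* loop over send()      */
ssize_t recv_response(int sock, char *buf, size_t cap); /* loop over recv();
                                                           0 = peer closed before
                                                           sending anything      */

static int cached_sock = -1;

int do_request(const char *req, size_t req_len, char *resp, size_t resp_cap)
{
    for (int attempt = 0; attempt < 2; ++attempt) {
        if (cached_sock < 0)
            cached_sock = connect_to_server();
        if (cached_sock < 0)
            return -1;                       /* could not connect at all */

        if (send_all(cached_sock, req, req_len) < 0) {
            close(cached_sock);              /* stale connection, retry once */
            cached_sock = -1;
            continue;
        }

        ssize_t got = recv_response(cached_sock, resp, resp_cap);
        if (got > 0)
            return 0;                        /* complete response received */

        close(cached_sock);                  /* recv() saw 0, or an error */
        cached_sock = -1;
        if (got < 0)
            return -1;                       /* hard error: don't blindly retry */
        /* got == 0: the server closed the idle connection before sending
         * anything, so the request almost certainly never reached the
         * application; loop around and retry on a fresh connection. */
    }
    return -1;                               /* second attempt also failed */
}

The key design point is that a retry is only attempted when recv() returned 0 before any response byte arrived, which is exactly the situation the question describes.
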
user207421
  • Asking for a protocol change would take time, and I do not see such a detail in HTTP, and I almost never see a webpage do its own thing to protect against this. For example, letting the HTTP POST request for stackoverflow comments get sent twice does result in a duplicate comment, and HTTP (and stackoverflow) do use keep-alive sockets for multiple requests, and there is generally some delay between previous requests and such a POST. Yet I never see web browsers accidentally send a POST twice, or frequently fail to send it entirely, without an actual network/server issue. – Fire Lancer Aug 04 '16 at 22:24
  • In fact this seems to be the case in most TCP-based protocols I have come across? (I guess some, like SMTP, avoid it by the nature that an operation such as sending a mail is several steps in quick succession, but I also never recall being told the reason for that design was to avoid this specific problem) – Fire Lancer Aug 04 '16 at 22:28

Is what I am seeing indicative of the server having closed the connection while I was not using it

Yes.

If so, why does the send still succeed?

send()'s succeeding tells you only that some (or all) of the data you passed into send() has been successfully copied into an in-kernel buffer, and that from now on it is the OS's responsibility to try to deliver those bytes to the remote peer.

In particular, it does not indicate that those bytes have actually gone across the network (yet) or been successfully received by the server.

What is the correct way to automatically detect this so that the request is resent on a new connection, bearing in mind that if the server actually received the request, processing it twice could have unintended effects?

As EJP suggests, the best way would be to design your communications protocol such that sending the same request twice has no effect that is different from sending it once. One way to do that would be to add a unique ID to each message you send, and add some logic to the server such that if it receives a message with an ID that is the same as one that it has already processed, it discards the message as a duplicate.
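
A minimal sketch of that duplicate-suppression idea follows, assuming the protocol can be extended with a small per-request header; the 8-byte ID, the way it is generated, and the size of the server-side table are all invented for illustration (serialisation/endianness is glossed over).

/* Sketch only: tag every request with an ID the server can use to discard
 * duplicates. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* client side: prepend a (practically) unique 64-bit ID to each request */
static uint64_t next_request_id(void)
{
    static uint64_t counter;
    return ((uint64_t)time(NULL) << 20) | (++counter & 0xFFFFF);
}

size_t build_request(char *out, const char *body, size_t body_len)
{
    uint64_t id = next_request_id();
    memcpy(out, &id, sizeof id);              /* 8-byte ID header */
    memcpy(out + sizeof id, body, body_len);  /* then the request itself */
    return sizeof id + body_len;
}

/* server side: remember recent IDs and drop any request seen before */
#define SEEN_IDS 1024
static uint64_t seen[SEEN_IDS];
static size_t   seen_next;

int already_processed(uint64_t id)
{
    for (size_t i = 0; i < SEEN_IDS; ++i)
        if (seen[i] == id)
            return 1;                         /* duplicate: discard, or re-send
                                                 the cached response */
    seen[seen_next++ % SEEN_IDS] = id;        /* remember it for later */
    return 0;
}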

Having the server send back an explicit response to each message (so that you can know for sure your message got through and was processed) might help, but of course then you have to start worrying about the case where your message was received and processed but then the TCP connection broke before the response could be delivered back to you, and so on.

One other thing you could do (if you're not doing it already) is to monitor the state of the TCP socket (via select(), poll(), or similar) so that your program will be immediately notified (by the socket select()-ing as ready-for-read) when the remote peer closes its end of the socket. That way you can deal with the closed TCP connection well before you try to send() a command, rather than only finding out about it afterwards, and that should be a less awkward situation to handle, since in that case there is no question about whether a command "got through" or not.
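
For example, a zero-timeout poll() followed by a MSG_PEEK recv() can tell the two readable cases apart. The helper name peer_has_closed() is made up for this POSIX-flavoured sketch, and the check only narrows the race window; it does not remove the need for the retry/idempotency logic above.

/* Sketch only: before re-using an idle connection, ask whether the peer has
 * already closed its end.  A zero-timeout poll() says whether the socket is
 * readable; a MSG_PEEK recv() then distinguishes "response data waiting"
 * (return > 0) from "orderly shutdown" (return 0). */
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

int peer_has_closed(int sock)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN };
    char byte;

    if (poll(&pfd, 1, 0) <= 0)
        return 0;               /* not readable (or poll failed): assume open */

    /* readable: peek one byte without consuming it */
    return recv(sock, &byte, 1, MSG_PEEK | MSG_DONTWAIT) == 0;
}
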

Jeremy Friesner
  • Using `select(1, &sock_set, ..., zero_tv)` worked on Windows, but doesn't on Linux; it always returns 0, allows the `send` to succeed, and then the real `recv` afterwards returns 0. – Fire Lancer Aug 04 '16 at 22:17
  • Changing the protocol would take time. As far as I can recall, many other protocols do not have such a concept? – Fire Lancer Aug 04 '16 at 22:19
  • Oh, just to be clear, the server does always send a response for a request (even if it's just an "ok" or "invalid request"). In the case of `send` succeeding and sending the packet over the wire, then `recv` returning zero, the server logs indicate the server never got that request, at least at the application level. – Fire Lancer Aug 04 '16 at 23:07
  • 1
    You're passing in the wrong value to the first argument of select(). You shouldn't pass in 1, rather you should pass in the value of the largest socket descriptor select() is supposed to be looking at, plus 1. (What you're doing works on Windows only because the Windows implementation of select() simply ignores the first argument) – Jeremy Friesner Aug 05 '16 at 01:57
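
Putting that correction together, the pre-send check from the earlier comments would look roughly like this on POSIX (sketch only; `sock` is the connected socket from the question's code):

/* nfds must be the highest watched descriptor plus one, not a count of
 * descriptors (Windows ignores it, POSIX does not). */
#include <sys/select.h>

int readable_now(int sock)
{
    fd_set rd_set;
    struct timeval zero_tv = {0, 0};

    FD_ZERO(&rd_set);
    FD_SET(sock, &rd_set);

    /* > 0 and FD_ISSET: either response data is waiting, or the peer has
     * closed the connection and recv() would return 0 */
    return select(sock + 1, &rd_set, NULL, NULL, &zero_tv) > 0 &&
           FD_ISSET(sock, &rd_set);
}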