
I have two processes on a CentOS 6 (Linux 2.6.32) system that talk to each other over an AF_INET/SOCK_STREAM socket. When I stress-test the link by blasting enough small packets to fill the socket and then exit the sending process, the receiving process loses roughly the final three quarters of the packets.

As soon as the sending process exits, the receiver's poll() starts returning revents of POLLIN | POLLRDHUP | POLLERR | POLLHUP. At some point, it fails to read the full packet it expects (read() returns a smaller number than the passed-in length), and the following read() returns -1 with errno set to ECONNRESET. It certainly looks to me like it has read all the data in the pipe and there is no more to come.
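For reference, a short read() on a stream socket is not by itself a sign of lost data, so the receiver normally loops until it has the whole packet. A minimal sketch of such a loop, assuming a blocking descriptor (the helper name `read_full` is hypothetical, not from my actual code):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `len` bytes from fd into buf, looping over short reads.
 * Returns len on success, a smaller count if EOF arrived first,
 * or -1 on error (errno is left as set by read()). */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n > 0) {
            got += (size_t)n;
        } else if (n == 0) {
            break;              /* peer closed: EOF before len bytes */
        } else if (errno != EINTR) {
            return -1;          /* real error, e.g. ECONNRESET */
        }
        /* n < 0 with EINTR: interrupted by a signal, just retry */
    }
    return (ssize_t)got;
}
```

With a loop like this, a return value smaller than `len` means EOF in the middle of a packet, and -1 with ECONNRESET means the connection was reset before the rest arrived.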

If I do not exit the sending process after filling the pipe (just go into an endless loop until I kill it by hand), then the receiver gets all the data.

My guess is that the sender's write()s end up buffered somewhere, and that buffer gets tossed when the process exits instead of the write returning a failure. Disabling Nagle (turning on TCP_NODELAY) doesn't change this behavior.

The code that does the write is:

iov[0].iov_base = &len;
iov[0].iov_len = sizeof(uint32_t);
iov[1].iov_base = buf;
iov[1].iov_len = len;
if ((wlen = writev(fd, iov, NELEM(iov))) != (ssize_t)(iov[0].iov_len + iov[1].iov_len)) {
    ...  // error handling
}

(it sends a 32-bit length followed by the data).
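Since writev() may accept only part of the request, a sender usually has to retry with the remainder. The following is a sketch under assumptions, not my production code: the names `writev_all` and `send_frame` are hypothetical, the descriptor is assumed to be blocking, and the length is sent in host byte order as in the snippet above.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Write every byte described by iov[0..iovcnt), retrying after partial
 * writes. The iov array is modified in place.
 * Returns 0 on success, -1 on error (errno set by writev()). */
int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        size_t done = (size_t)n;
        /* Skip over entries that were written completely. */
        while (iovcnt > 0 && done >= iov->iov_len) {
            done -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Advance into an entry that was only partly written. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + done;
            iov->iov_len -= done;
        }
    }
    return 0;
}

/* Send a 32-bit length (host byte order) followed by the payload. */
int send_frame(int fd, const void *buf, uint32_t len)
{
    struct iovec iov[2];

    iov[0].iov_base = &len;
    iov[0].iov_len = sizeof len;
    iov[1].iov_base = (void *)buf;   /* writev() does not modify the data */
    iov[1].iov_len = len;
    return writev_all(fd, iov, 2);
}
```

Even when this returns 0, it only means the kernel accepted the bytes into the socket's send buffer, not that the peer has received them.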

Can anyone lend me a clue about what is going on, and how I can reliably know whether my write()s have succeeded?

One Guy Hacking
  • If you replace the call to writev() with two separate calls to write(), does the problem go away? (or if not, does the nature of the problem become more obvious once you have access to the individual return values from each of the two write() calls?) – Jeremy Friesner Dec 20 '16 at 03:42
  • No, the problem remains. (The writev() is necessary, by the way, because without it the first 4-byte packet will be transmitted and then the second packet will be held until the ACK is received (unless NODELAY is set)). – One Guy Hacking Dec 20 '16 at 07:38
  • @Jabberwock *The writev() is necessary, by the way, because without it the first 4-byte packet will be transmitted and then the second packet will be held until the ACK is received (unless NODELAY is set)* If the `writev()` is "necessary", the receiver isn't properly treating the received data as a stream subject to being broken up at **any** point and is thus subject to all kinds of bugs. – Andrew Henle Dec 20 '16 at 12:46
  • It's a bit off-topic, but you can avoid the writev() in a couple of ways if necessary: Either provide an extra 4 bytes of room at the start of the payload-buffer so that you can write the header in there and send the whole thing with a single write(), or you can keep Nagle's algorithm enabled on the socket and "flush" the socket whenever you want buffered data to be sent immediately, by disabling Nagle's, sending a 0-byte buffer, then re-enabling it again. – Jeremy Friesner Dec 20 '16 at 15:53
  • @AndrewHenle Perhaps he meant "necessary to achieve best performance" rather than "necessary to obtain correct parsing behavior" ? I hope so, anyway :) – Jeremy Friesner Dec 20 '16 at 15:53
  • @JeremyFriesner `writev()` isn't anything special - it's a [simple wrapper around `write()`](https://fossies.org/dox/glibc-2.24/sysdeps_2posix_2writev_8c_source.html) that does nothing more than copy the "vector" to a single buffer, then `write()` that buffer. Imagine trying to handle a partial `writev()`... – Andrew Henle Dec 20 '16 at 16:20
  • @AndrewHenle Hopefully it can be implemented more optimally than that, or it would hardly be worth using. In particular, writev()'s internal destination buffer ought to be the socket's outgoing-data FIFO, and not some temporary buffer. I agree that correctly handling partial writev()s is difficult, which is one reason I avoid using it myself (the other is its limited portability). – Jeremy Friesner Dec 20 '16 at 16:37

1 Answer


Before the writer process exits, you should cleanly shut down the socket with a shutdown() call:

shutdown(fd, SHUT_WR);

This acts a bit like flushing the socket.

You can also close the socket; see the question "close vs shutdown socket?" for detailed information.
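A commonly used pattern is to follow the shutdown() with reads until the peer's end-of-file arrives, and only then call close(). Below is a minimal sketch of that idea, assuming a blocking socket and a receiver that closes its side once it has read everything up to EOF; the function name `close_after_peer_eof` is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Half-close the socket, wait for the peer's EOF, then close it.
 * Returns 0 if the peer closed cleanly, -1 on any error. */
int close_after_peer_eof(int fd)
{
    char scratch[4096];
    int rc = 0;

    /* Tell the peer we are done sending; queued data is still delivered. */
    if (shutdown(fd, SHUT_WR) < 0)
        rc = -1;

    while (rc == 0) {
        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n == 0)
            break;              /* peer closed its side */
        if (n < 0 && errno != EINTR)
            rc = -1;            /* e.g. ECONNRESET */
        /* n > 0: discard whatever the peer was still sending */
    }

    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A return value of 0 here means the peer closed its side after our half-close, which is a stronger signal than a successful write(). It confirms that the peer consumed all the data only if the receiver reads to EOF before closing, as assumed above. A real program would probably add a timeout so that a stalled peer cannot block the exit indefinitely.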

Mathieu
  • Calling shutdown() has no significant effect (poll() starts returning POLLIN | POLLRDHUP, but no other changes are seen). It queues a FIN for the server (which gets correctly received, hence the POLLRDHUP), but if the socket is close()d after the shutdown() and before all the data is transmitted I see the same loss of data that write() reported it sent. – One Guy Hacking Dec 20 '16 at 21:52