
Two cases are well-documented in the man pages for non-blocking sockets:

  • If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not return EAGAIN/EWOULDBLOCK on the next call with more than zero bytes to transfer.
  • If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket is ready for more data (EPOLLOUT in the epoll case).

What's not documented for nonblocking sockets is:

  • If send() returns a positive value smaller than the buffer size.

Is it safe to assume that send() would return EAGAIN/EWOULDBLOCK for even one more byte of data? Or should a non-blocking program call send() one more time to get a conclusive EAGAIN/EWOULDBLOCK? I'm worried about registering an EPOLLOUT watcher for the socket if it's not actually in a "would block" state; the watcher would then be waiting for the socket to come out of a state it never entered.

Obviously, the latter strategy (trying again to get something conclusive) has well-defined behavior, but it's more verbose and carries a performance cost.

  • @Damon your edit completely changed the meaning of the question. – user207421 Oct 16 '13 at 09:00
  • @EJP: The OP is obviously aware of `EWOULDBLOCK` (or how non-blocking sockets generally work, for the most part), so it's in my opinion a safe bet that the wording "would block" which seemed to confuse you is merely a bad wording, but not what's intended. – Damon Oct 16 '13 at 09:08
  • @Damon That isn't obvious to me at all. Clearly that is exactly what has confused the OP. Not me. That was the whole entire and complete point. By removing that from the question, you removed its entire meaning. Don't do that. If you want to *answer* the question, by all means do so. But don't just change it to suit yourself. – user207421 Oct 16 '13 at 09:17
  • Damon is correct. I've updated the question to be more precise. I'm aware that nonblocking sockets never actually block, just return that they would. – David Timothy Strauss Oct 16 '13 at 22:26
  • You cannot assume anything. The nic card driver is probably asynchronous, your computer is probably asynchronous, etc... The send buffer could be drained by another core while your send was in process etc... – JimR Oct 16 '13 at 23:04

2 Answers


A call to send has three possible outcomes:

  1. There is room for at least one byte in the send buffer → send succeeds and returns the number of bytes accepted (possibly fewer than you asked for).
  2. The send buffer is completely full at the time you call send.
    → if the socket is blocking, send blocks
    → if the socket is non-blocking, send fails with EWOULDBLOCK/EAGAIN
  3. An error occurred (e.g. user pulled network cable, connection reset by peer) → send fails with another error

If the number of bytes accepted by send is smaller than the amount you asked for, then this consequently means that the send buffer is now completely full. However, this is purely circumstantial and non-authoritative with respect to any future calls to send.
The information returned by send is merely a "snapshot" of the current state at the time you called send. By the time send has returned or by the time you call send again, this information may already be outdated. The network card might put a datagram on the wire while your program is inside send, or a nanosecond later, or at any other time -- there is no way of knowing. You'll know when the next call succeeds (or when it doesn't).

In other words, this does not imply that the next call to send will return EWOULDBLOCK/EAGAIN (or would block if the socket wasn't non-blocking). Trying until what you called "getting a conclusive EWOULDBLOCK" is the correct thing to do.

– Damon (answer edited by Uli Köhler)
  • I assume this is only correct for nonblocking: "send succeeds and returns the number of bytes accepted (possibly fewer than you asked for)." A blocking socket should block until it finishes sending all the data or fails for other reasons. I'd either add the blocking scenario to #1 or drop the blocking scenario from #2 (making the answer solely about nonblocking sockets). – David Timothy Strauss Oct 17 '13 at 10:23
  • @DavidTimothyStrauss: Surprisingly, your assumption is in perfect accordance with both the wording of the manpage and POSIX (the wording in the analogous `write` syscall explicitly mentions partial writes). And while this wording makes sense for datagram sockets (which you _can't_ use with `send`!), it makes no sense for connection-oriented sockets. Sending e.g. 100kB in one go is certainly allowable, but given a "typical" buffer size of 64kB, if the whole message has to fit in, that would mean your program would block _forever_ (because it will never fit the complete message into the buffer). – Damon Oct 17 '13 at 10:58
  • For datagram sockets, of course, it makes sense, since only the complete message can be sent, that the complete message must fit in the buffer. But datagram sockets are required to have a buffer that can hold at least one maximum-sized datagram. Which means `sendto` will not block _forever_. For stream sockets, partial sends have been a reality for 3 decades (irrespective of the confusing wording in the manpages), see for example [Beej's guide](http://www.beej.us/guide/bgnet/output/html/multipage/advanced.html#sendall) on the subject (Beej's networking guide is generally a very good read). – Damon Oct 17 '13 at 11:13
  • There was a discussion on this point on news:comp.protocols.tcp-IP a few years ago, where it was universally agreed that blocking send()s block until all the data has been buffered. That newsgroup is populated by all the implementors of TCP. They should know. – user207421 Oct 17 '13 at 21:56
  • @EJP: That's a scary thought, but what's even more scary is the fact that the true, observable behaviour e.g. under Linux is yet something completely different. On my Debian box (3.2 kernel), the send buffer defaults to 84kiB, and `send` will happily accept buffers of 64MiB in one `send` and return "64MiB sent". Surely 64MiB > 84kiB? Source code: http://pastebin.com/T6N7hQky -- Program starts a server that consumes all traffic at port 12345, and forks a client that writes increasing chunks (64kiB to 64MiB, increasing in steps of 16kiB) to that port, counting no. of sends. – Damon Oct 18 '13 at 11:36
  • @Damon Blocking send() guarantees the data has been *at least* buffered. It could go through many cycles of buffer-send-buffer and return once the whole 64MiB are fully sent or buffered. – David Timothy Strauss Oct 18 '13 at 23:52
  • @DavidTimothyStrauss: Yes, certainly. But if one takes the wording of POSIX (or what EJP said) literally, then either way, it should block if the complete 64MiB message doesn't fit into the send buffer, which is _always and forever_ the case for such a huge send. I haven't tested out how much `send` will actually accept without blocking, but a chunk of 65MiB is already stunning. There is no way you could do a 64MiB `WriteFile` under Windows, for example (not under normal conditions anyway, that would run against your process' quota of lockable pages). – Damon Oct 19 '13 at 11:01
  • And now please fasten your seatbelts... I just modified the above code to attempt blocking sends from 512MiB to 1GiB, for laughs. Guess what, `send` accepts a send of 1GiB in one chunk, too. No partial sends, and from the fact that the process takes 97.4% CPU while running, it seems that it doesn't block either. – Damon Oct 19 '13 at 11:13
  • @Damon: of course it blocks. The receiver is just consuming the data fast (or you could have a huge send window), immediately waking up the sender. In particular, in Linux/TCP, the send loop in `tcp_sendmsg()` calls `sk_stream_wait_memory()` → `sk_wait_event()` → `schedule_timeout()` when there is no memory/send buffer space available. – ninjalj Mar 10 '14 at 11:24
  • @Damon We are in agreement. All the data is buffered before send() returns. If the data to be sent is larger than the send buffer, obviously the send buffer has to be *sent* to make room for more of the data, and so on continuously until all the data had been accounted for. Then send() returns. – user207421 Sep 13 '14 at 11:36
  • @Damon BTW `send()` can indeed be used with *connected* UDP sockets. – user207421 May 18 '16 at 23:40

If send() returns the same length as the transfer buffer, the entire transfer finished successfully, and the socket may or may not be in a blocking state.

No. The socket remains in the mode it was in: in this case, non-blocking mode, assumed below throughout.

If send() returns -1 and errno is EAGAIN/EWOULDBLOCK, none of the transfer finished, and the program needs to wait until the socket isn't blocking anymore.

Until the send buffer isn't full any more. The socket remains in non-blocking mode.

If send() returns a positive value smaller than the buffer size.

There was only that much room in the socket send buffer.

Is it safe to assume that the send() would block on even one more byte of data?

It isn't 'safe' to 'assume [it] would block' at all. It won't. It's in non-blocking mode. EWOULDBLOCK means it would have blocked in blocking mode.

Or should a non-blocking program try to send() one more time to get a conclusive EAGAIN/EWOULDBLOCK?

That's up to you. The API works whichever you decide.

I'm worried about putting an EPOLLOUT watcher on the socket if it's not actually blocking on that.

It isn't 'blocking on that'. It isn't blocking on anything. It's in non-blocking mode. The send buffer got filled at that instant. It might be completely empty a moment later.

I don't see what you're worried about. If you have pending data and the last write didn't send it all, select for writability, and write when you get it. If such a write sends everything, don't select for writability next time.

Sockets are usually writable, unless their send buffer is full, so don't select for writability all the time, as you just get a spin loop.

– user207421