5

Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?

Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?

I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again.

Update: Based on the answers so far, my question could be rephrased as follows:

Is there any way for packets/data to be sent over the wire before the call to send() returns?

curiousguy
lxgr

2 Answers

5

Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?

No. It's possible that part of your data is passed over the wire while another part only gets as far as being copied into the internal buffers of the local TCP stack. send() returns the number of bytes passed to the local TCP stack, not the number of bytes that were put on the wire (and even if the data reaches the wire, it might not reach the peer).

Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?

As send() only returns the number of bytes passed into the local TCP stack, not whether anything was actually transmitted, you can't really distinguish these two cases anyway. But yes, it's possible that only some of the data makes it over the wire. Even if there's enough space in the local buffer, the peer might not have enough space. If you send 2 bytes but the peer only has room for 1 more byte, 1 byte might be sent and the other will stay in the local TCP stack until the peer has room again.

(That's an extreme example; most TCP stacks protect against sending such small segments of data at a time. But the same applies if you try to send 4k of data and the peer only has room for 3k.)

I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again

That will only happen if your socket is non-blocking. If it's blocking and the local buffers are full, send() will wait until there's room in the local buffers again (or it might return a short count if part of the data was accepted but an error occurred in the meantime).
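The usual way to cope with short counts is to loop until everything has been handed to the kernel. Below is a minimal sketch in Python, whose `socket.send()` has the same return-value semantics as the C call. The helper name `send_all` is made up for illustration, and the sketch assumes a blocking socket (a non-blocking one would raise `BlockingIOError` instead of waiting). The `socketpair()` in the demo is only a local stand-in for a connected TCP socket.

```python
import socket


def send_all(sock, data):
    """Call send() repeatedly until every byte has been accepted locally.

    Hypothetical helper for illustration; assumes a blocking socket.
    Returns the total number of bytes handed to the local stack.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # send() reports how many bytes the local stack accepted,
        # which may be fewer than were offered.
        sent = sock.send(view[total:])
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent
    return total


if __name__ == "__main__":
    # socketpair() stands in for a connected stream socket.
    a, b = socket.socketpair()
    count = send_all(a, b"hello, peer")
    print(count, b.recv(64))
    a.close()
    b.close()
```

Python already ships `socket.sendall()`, which does the same thing for blocking sockets; the explicit loop is shown only to make the return-value handling visible. Either way, a full count still only means the data reached the local buffers.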

Edit to answer:

Is there any way for packets/data to be sent over the wire before the call to send() returns?

Yes. That might happen for many reasons. e.g.

  • The local buffers get filled up by that send() call, and you use blocking I/O, so the stack has to transmit some of the data to make room before send() can return.
  • The TCP stack sends your data over the wire, but the kernel decides to schedule other processes to run before the sending process returns from send().
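The first case can be observed from user space with a rough experiment: send a payload much larger than the socket buffer on a blocking socket while another thread reads, and count how many bytes the reader saw before send() returned. This is only a sketch under assumptions: the function name `send_while_peer_reads` is invented, a local `socketpair()` stands in for a real TCP connection (so no actual wire is involved), and the exact numbers depend on the OS and buffer sizes.

```python
import socket
import threading


def send_while_peer_reads(payload_size=4 * 1024 * 1024):
    """Send one large buffer while a reader thread drains the other end.

    Hypothetical demo helper. Returns (bytes accepted by send(), stats),
    where stats["before_return"] counts bytes the reader received
    before the send() call had returned.
    """
    sender, receiver = socket.socketpair()
    stats = {"before_return": 0, "total": 0}
    send_returned = threading.Event()

    def reader():
        while True:
            chunk = receiver.recv(65536)
            if not chunk:  # sender closed: end of stream
                break
            stats["total"] += len(chunk)
            if not send_returned.is_set():
                stats["before_return"] += len(chunk)

    thread = threading.Thread(target=reader)
    thread.start()
    sent = sender.send(b"x" * payload_size)  # single blocking send()
    send_returned.set()
    sender.close()  # lets the reader see end of stream
    thread.join()
    receiver.close()
    return sent, stats


if __name__ == "__main__":
    sent, stats = send_while_peer_reads()
    print("accepted by send():", sent)
    print("received before send() returned:", stats["before_return"])
```

On systems where a blocking send() waits until the whole buffer has been queued, the second number should be well above zero, because the call can only finish after the peer has drained part of the data.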
Yuankun
nos
  • Ok, almost completely: In the non-blocking case, will the OS always try to copy as much into the buffers as possible, or could a preemption during the exact moment of the copy from userspace to the TCP send buffer cause an unnecessary small packet to be sent? Do I even have to worry about that, or should I trust the TCP implementation to handle such corner cases for me? :) – lxgr Nov 08 '11 at 17:38
  • 1
    @lxgr I'm not sure why you worry so much about this. You check the return value of send(). That's how much data was copied to the TCP stack. If it's less than what you wanted to send, you loop and send the rest (or return to a select/poll loop and write the rest when it indicates that you can write again). If you are trying to ensure/know that your application data has reached the peer, you need to implement your own mechanism on top of TCP. Preemption can occur pretty much any time, but preemption doesn't cause a non-blocking send to abort mid-flight. – nos Nov 08 '11 at 18:09
  • Yeah, I think that by now I know enough to use send() for my use case. I was just getting interested in how the kernel handles it internally. Thanks for your replies, they are very much appreciated! – lxgr Nov 08 '11 at 18:17
  • Am I wrong that the number returned from a TCP send() is the number of bytes ACK'ed by remote, and thus are guaranteed sent? – Peter Nov 09 '11 at 09:08
  • @Peter Yes, that's very wrong, there's no guarantee that anything is sent or ack'ed. send() only copies the data into the local buffers of the TCP stack. The TCP stack takes care of transferring and ack'ing it at its leisure. (send() might indicate success without anything ever going over the wire if an error occurs after send() successfully returned. A subsequent send() call would get an error though in such a case) – nos Nov 09 '11 at 12:14
  • @Nos In that case, wouldn't we, as WinSock users utilizing TCP, lose one of the most valuable features (reliability) that TCP provides? And would a WinSock user have to take care of TCP flow control and retransmission complexities? So I feel that in the blocking-send case, the # returned is the # ACK'ed. What do you say? – Peter Dec 01 '11 at 08:25
  • @Peter TCP does flow control, and provides reliability as far as possible. There's no connection between a blocking send() and the # acked. – nos Dec 01 '11 at 19:26
4

Though this depends on the protocol you are using, the general answer is no.

For TCP, the data gets buffered inside the kernel and then sent out at the discretion of the TCP packetization algorithm, which is pretty hairy: it keeps multiple timers and minds the path MTU to avoid IP fragmentation.

For UDP you can only assume this kind of "atomicity" if your datagram fits within the link MTU (the usual maximum payload is 1472 bytes = 1500-byte Ethernet MTU - 20 bytes of IP header - 8 bytes of UDP header). Otherwise your sending host will have to IP-fragment the datagram.

Even then, intermediate routers can still IP-fragment the passing packet if their outgoing link MTU is less than the packet size.
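The arithmetic behind the 1472-byte figure can be written out as a small sketch. The function names are invented for illustration, and the defaults assume a 1500-byte Ethernet MTU and a 20-byte IPv4 header without options; other links or IP options change the result.

```python
ETHERNET_MTU = 1500  # bytes of IP packet a standard Ethernet frame can carry
IPV4_HEADER = 20     # bytes, assuming no IP options
UDP_HEADER = 8       # bytes


def max_udp_payload(mtu=ETHERNET_MTU, ip_header=IPV4_HEADER):
    """Largest UDP payload that fits in one unfragmented IP packet."""
    return mtu - ip_header - UDP_HEADER


def needs_fragmentation(payload_len, mtu=ETHERNET_MTU):
    """True if the sending host would have to IP-fragment the datagram."""
    return payload_len > max_udp_payload(mtu)


if __name__ == "__main__":
    print(max_udp_payload())
    print(needs_fragmentation(1472), needs_fragmentation(1473))
```

This only checks the first hop. A smaller MTU further along the path can still force fragmentation, which the sender cannot see from these numbers alone.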

Nikolai Fetissov