Using standard Linux sockets, is it possible for Linux to return LESS than the size of the packet sent?

Assume we have entities A and B. If A sent B 3 messages, seconds apart so that the data could not possibly be combined into the same IP packets, and B performed a recv() on the socket for a size > the 3 messages AFTER the wait period, then all 3 messages would be delivered to the application layer concatenated. I've tested this and have consistently received this result.

Now, assume A sends B 1 message of 200 bytes. If the message is smaller than the MTU of the network, let's say the MTU is 1.5K, is there any scenario in which Linux would NOT deliver the 200 bytes to the application in one call to recv()? Is there any possibility that Linux could deliver the bytes as 150 followed by 50 on two separate socket read calls? I understand that it is possible for IP packets to be delivered together, as in the previous scenario, but is it possible for an IP packet to not be delivered all at once to the application layer?

If not, does this remove the need to frame messages if they are sufficiently small, say between 120 and 200 bytes?

And finally, is there any guarantee that a socket send() of a relatively small size (200 bytes) is sent in one TCP/IP packet?

Thanks for your help!

Code Wiget
  • 6
    These questions are a sign of a fundamentally wrong approach. TCP isn't packet-oriented. It runs *on top* of a packet-oriented protocol, but the interface it presents is a *stream*. If your application cares about packets at all, it shouldn't be using TCP. –  Jul 02 '18 at 18:42
  • 1
    Agree with Wumpus. Trying to avoid message framing is poor design. Your network protocol design should not rely on the timing of `send()` and `recv()` system calls. That'd be mixing layers of abstraction. If you want a message-based protocol use UDP. If you want reliability, use TCP. If you want both, use TCP and frame your messages. Unfortunately, you can't get both properties unless you venture out into less popular protocols such as [SCTP](https://en.wikipedia.org/wiki/Stream_Control_Transmission_Protocol). – John Kugelman Jul 02 '18 at 18:47
  • 1
    It's pretty easy to come up with scenarios where you can get a partial packet. Bottom line, as others have noted: TCP is a *stream*. Treating it as anything other than a stream is not guaranteed to work. – Andrew Henle Jul 02 '18 at 18:49
  • @WumpusQ.Wumbley TCP works for our application. The question I have is "how to read the TCP stream". Right now we are framing messages using a 2-byte header to denote the size and a one-byte code at the end to confirm the end of a message. So, we read 2 bytes, then recv until the length is read. What we are seeing is that sometimes when we send data, maybe only 130 bytes from application to application, the receiving application gets the data at the application layer in two separate calls to recv(): 1 byte and then 129, 2 bytes and then 128, or some form of that. – Code Wiget Jul 02 '18 at 18:53
  • We were confused as to why the data is coming into the receiving application layer in two reads, rather than one, if the IP layer is not fragmenting the message – Code Wiget Jul 02 '18 at 18:55
  • Two guesses at that are 1) the messages are consistently being fragmented into two IP packets, and from there the Linux socket is returning each IP packet's TCP data to the application separately, or 2) there is no guarantee as to how the socket returns data from a TCP stream, and even if a single IP packet containing a full TCP segment arrives, recv() is not guaranteed to deliver all of the application-destined data from the TCP portion. – Code Wiget Jul 02 '18 at 19:01
  • 3
    @Ryan: It happens because TCP is a stream, and not a sequence of packets. Your expectation that it should come into the receiving application in a single read is simply wrong. There is absolutely nothing in TCP that even tries to retain message boundaries; in fact, there are several mechanisms (like Nagle's algorithm) that can cause odd-sized transfers to occur. The most likely reason is that the sender's TCP stack simply sent the data in odd-sized packets. Investigating the TCP packets on the wire would tell. – Nominal Animal Jul 02 '18 at 19:02
  • @NominalAnimal I hate to veer into a different scenario, but would you say that the same holds true for TLS/SSL, which relies on TCP? – Code Wiget Jul 02 '18 at 19:21
  • 1
    SSL will frame the SSL record boundaries for you, but you still cannot assume you will receive all the data in that record in a single read. There are edge cases when the kernel is running low on available buffers. – jxh Jul 02 '18 at 19:23
  • 1
    Yes. Even when making an SSL connection, the sender's TCP stack may initially send very funny-sized packets at odd intervals. This is related to how [flow control](https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control) and congestion control are implemented on both ends of the TCP connection, and particularly to [TCP slow start](https://en.wikipedia.org/wiki/TCP_congestion_control#Slow_start). TLS/SSL itself is implemented in userspace, and does not affect slow start at all. – Nominal Animal Jul 02 '18 at 19:29
  • Anyone feel free to answer the question (if you're interested in the points/checkmark) and I will mark it as accepted. Thank you for being so helpful in clearing up my understanding! – Code Wiget Jul 02 '18 at 19:38
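
To make the point from the comments concrete, here is a minimal sketch of the framing scheme described above (a 2-byte length header, the payload, then a one-byte end code), written in Python for brevity. The helper names `recv_exact`, `send_frame` and `recv_frame`, the `END_MARKER` value, and the big-endian byte order are all assumptions for illustration; none of them come from the thread. The key idea is that every read loops until the requested number of bytes has arrived, so it does not matter whether recv() hands back 1 + 129 bytes, 2 + 128, or everything at once.

```python
import socket
import struct

# Hypothetical trailer byte; the thread does not say which value is used.
END_MARKER = b"\x7e"


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_frame(sock, payload):
    """Send a 2-byte length (big-endian, assumed), the payload, the trailer."""
    sock.sendall(struct.pack("!H", len(payload)) + payload + END_MARKER)


def recv_frame(sock):
    """Receive one frame regardless of how the stream was split."""
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    payload = recv_exact(sock, length)
    if recv_exact(sock, 1) != END_MARKER:
        raise ValueError("missing end-of-frame marker")
    return payload
```

With this in place, the receiver calls `recv_frame(sock)` once per message and never inspects how many bytes an individual recv() returned. The same loop is what you would write in C around `recv()`, checking for a return value of 0 (peer closed) and -1 (error) on each iteration.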

0 Answers