Using pseudo C# simply because it's what I'm using, and assuming C# simply wraps the TCP protocol:

A

Socket s = ...; //a valid open socket that is receiving
string str = "abcdefghijklmnopqrstuvwxyz";
for (int i = 0; i < 10; ++i)
{
    byte[] buf = Encoding.ASCII.GetBytes(str);
    s.Send(buf); // ten separate Send calls of 26 bytes each
}

B

Socket s = ...; //a valid open socket that is receiving
string str = "abcdefghijklmnopqrstuvwxyz";
string str2 = "";
for (int i = 0; i < 10; ++i)
{
    str2 += str;
}
byte[] buf = Encoding.ASCII.GetBytes(str2);
s.Send(buf); // one Send call carrying the same 260 bytes

How are these actually different at TCP level, and specifically will the receiver have any way to know the first was 10 separate messages and the second a single message? Does TCP treat each send call as having some packet or end-identifier, or is it simply pushing bytes into a stream and I must define a way to determine what was sent based on the content itself?

Mr. Boy
  • (if string includes a null-terminator here or something that is not intended to be part of the question, it's meant to be the same exact data split into multiple messages and asking about that aspect) - I'll rewrite if I inadvertently added a bug – Mr. Boy Jan 09 '20 at 18:31
  • TCP takes a stream of data and segments it. It does not work by message. It will often buffer a bunch of small messages, then send when the buffer is full. – Ron Maupin Jan 09 '20 at 18:35
  • @RonMaupin so for crude example in both cases I might read "abcdefghijklmnopqrstuvwxyzabc" - I am 100% responsible for defining the protocol how to tell where one message ends and the next starts? – Mr. Boy Jan 09 '20 at 18:41
  • Exactly, that is why you have application-layer protocols, such as HTTP. TCP just takes data and segments it to fit the MSS, then sends when it is ready. UDP is more for messages, but you need an application-layer protocol or application that can deal with lost datagrams. – Ron Maupin Jan 09 '20 at 18:44
  • The receiver can receive the bytes you sent in any number of calls, each returning any number of bytes. TCP guarantees in-order delivery of all bytes, and that's all. That's why most sane protocols prefix the message with an explicit length, as that's the easiest way to tell the receiver what to expect. Even then, it is possible to receive the bytes that make up the length in any way too -- a common error is for code to assume that receiving the length bytes will happen in one call. On a real network this will be true "almost always", but it can and will fail. – Jeroen Mostert Jan 09 '20 at 18:44
  • @JeroenMostert as in, in my **B** case I might have to call `.Receive()` twice, no guarantee it will buffer them? – Mr. Boy Jan 09 '20 at 18:46
  • Take a look at [Broken TCP messages](https://stackoverflow.com/q/7257139), [How does a TCP packet arrive when using the Socket api in C#](https://stackoverflow.com/q/41948032) and https://blog.stephencleary.com/2009/04/message-framing.html. – dbc Jan 09 '20 at 18:49
  • Correct. This is because the sending party's network stack can make up packets in any way it likes (it can send one byte per packet if it so pleases) and the receiving party's network stack can likewise choose to collect packets into a buffer before handing it off to the application code, or not at all. This often confounds people because when you test code on a local loopback network, you will almost always "see" data received in bursts of exactly the same size as you sent them (why complicate things, after all), but this is not at all guaranteed on physical networks. – Jeroen Mostert Jan 09 '20 at 18:49

1 Answer

"... is it simply pushing bytes into a stream and I must define a way to determine what was sent based on the content itself?"

Yes, it just pushes bytes into the stream, which are then delivered onto the wire. They could cross the wire in packets of any size: a TCP packet might carry 1 byte of payload or close to 9 KB of payload (in the case of jumbo frames). If the "wire" is virtual, e.g. a virtual switch between two guests on the same host, then the "packet" could be 60K or bigger, depending on the implementation.
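Because the bytes can arrive in pieces of any size, a receiver that needs a known number of bytes has to loop until it has them all. A minimal sketch of that loop follows; `SocketReads` and `ReceiveExactly` are hypothetical names introduced here for illustration, and the sketch assumes a blocking, connected `Socket`.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

static class SocketReads
{
    // Hypothetical helper (not part of the .NET API): blocks until exactly
    // `count` bytes have been read from a connected, blocking socket.
    public static byte[] ReceiveExactly(Socket socket, int count)
    {
        var buffer = new byte[count];
        int received = 0;
        while (received < count)
        {
            // Receive may return fewer bytes than requested, no matter how
            // the sender grouped its Send calls.
            int n = socket.Receive(buffer, received, count - received, SocketFlags.None);
            if (n == 0)
            {
                // Zero bytes means the peer closed the connection.
                throw new EndOfStreamException("Connection closed before all bytes arrived.");
            }
            received += n;
        }
        return buffer;
    }
}
```

With this helper, `SocketReads.ReceiveExactly(s, 260)` would return the same 260 bytes for both A and B, which is another way of seeing that the receiver cannot tell the two apart.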

It's up to you as the implementer to determine the "data framing". This could be as simple as a delimiter such as a newline character (HTTP, for instance, ends each header line with CRLF), or as complicated as a struct in which different bits and bytes mark the message boundaries.
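One common framing choice is a length prefix. The sketch below is only an illustration under stated assumptions: the 4-byte big-endian length header and the names `Framing` and `FrameDecoder` are made up for this example, the payload is assumed to be ASCII as in the question, and there is no maximum-length check, which real code would probably want.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class Framing
{
    // Assumed wire format: 4-byte big-endian payload length, then the payload.
    public static byte[] Encode(string message)
    {
        byte[] payload = Encoding.ASCII.GetBytes(message);
        byte[] frame = new byte[4 + payload.Length];
        frame[0] = (byte)((payload.Length >> 24) & 0xFF);
        frame[1] = (byte)((payload.Length >> 16) & 0xFF);
        frame[2] = (byte)((payload.Length >> 8) & 0xFF);
        frame[3] = (byte)(payload.Length & 0xFF);
        Buffer.BlockCopy(payload, 0, frame, 4, payload.Length);
        return frame;
    }
}

sealed class FrameDecoder
{
    // Bytes received so far that do not yet form a complete frame.
    private readonly List<byte> pending = new List<byte>();

    // Feed the first `count` bytes of `chunk` (whatever a Receive call
    // returned) and get back every message completed by those bytes.
    public List<string> Feed(byte[] chunk, int count)
    {
        for (int i = 0; i < count; i++) pending.Add(chunk[i]);

        var messages = new List<string>();
        while (pending.Count >= 4)
        {
            int length = (pending[0] << 24) | (pending[1] << 16) | (pending[2] << 8) | pending[3];
            if (pending.Count < 4 + length) break; // payload not complete yet
            messages.Add(Encoding.ASCII.GetString(pending.GetRange(4, length).ToArray()));
            pending.RemoveRange(0, 4 + length);
        }
        return messages;
    }
}
```

The sender would call `s.Send(Framing.Encode(str))` for each message, and the receiver would pass the result of every `Receive` call to `Feed`. The decoder yields the same messages however the bytes were split or merged in transit, including when the 4 length bytes themselves arrive across two reads.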

Rowan Smith