
I have a larger buffer I'm trying to send as a packet. Node.js splits the buffer into smaller (65 KB) packets. Once they are received by the client, how can I ensure the packets go together and effectively recombine them into a buffer?

Pretty much using this as a test:

// tcp socket
var buf = Buffer.alloc(265000);
socket.write(buf);

Then on client side I need to combine the 65k packets somehow together back into a buffer.

Thanks

Jeff

2 Answers


TCP is free to break data up on the wire into packets of any size. The size can be different based on different implementations or physical transports. You cannot know exactly how this will happen and should not depend upon exactly how it is implemented. It can even vary depending upon which route your data takes.

Further, the .on('data', ...) event just gives you whatever data has arrived so far. While the order of the bytes is guaranteed, there is no guarantee that a set of bytes written in a single write() call will all arrive in the same data event. They can be broken into smaller pieces and may arrive in smaller pieces. This is what happens at the lower level of TCP when you have no real protocol on top of TCP.

So, if you're sending a chunk of data over TCP, you have to invent your own protocol to know when you've got an entire set of data. There are a variety of different schemes for doing this.

  1. Delimiter character. Some sort of delimiter character that won't occur in the actual data and indicates the end of a set of data. You read and parse the data until you get a delimiter character and then you know you have a complete set of data you can process. The HTTP protocol uses a CRLF line ending as a delimiter between headers. Sometimes a zero byte is used as a delimiter.

  2. Send length first. For binary data, the length of the data is often sent first and then the recipient knows how many bytes of data they're reading until they have a whole set.

  3. Existing protocols. Something like the WebSocket protocol lets you send messages of any size and it will automatically wrap them into packets that contain information about length so that they can be recombined for you automatically into the original set of data without you having to do this yourself. There are thousands of other protocols, one of which may be a perfect match for your needs and you can just use an existing implementation without having to write your own.
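A minimal sketch of option #2 (send the length first), assuming a 4-byte big-endian length prefix; sendMessage and makeReceiver are illustrative names, not part of any library:

```javascript
// Frame a message: 4-byte big-endian length header, then the payload bytes.
function sendMessage(socket, payload) {
    const header = Buffer.alloc(4);
    header.writeUInt32BE(payload.length, 0);
    socket.write(Buffer.concat([header, payload]));
}

// Build a data handler that reassembles framed messages from arbitrary chunks.
function makeReceiver(onMessage) {
    let pending = Buffer.alloc(0);
    return chunk => {
        pending = Buffer.concat([pending, chunk]);
        // keep extracting messages while a complete header + body is buffered
        while (pending.length >= 4) {
            const len = pending.readUInt32BE(0);
            if (pending.length < 4 + len) break; // body not complete yet
            onMessage(pending.slice(4, 4 + len));
            pending = pending.slice(4 + len);
        }
    };
}
```

You would wire the receiver up with `socket.on('data', makeReceiver(msg => ...))`; it doesn't matter how TCP fragments the stream, because the loop only fires the callback once a whole length-prefixed message has accumulated.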

Once you have some mechanism for knowing when you've received a complete set of data, you then set up your data event handler to read data, collect it into a buffer and watch for the end of the data (using whichever mechanism you have selected). When you see the end of a set, you separate that out from any other data that may have arrived after it and then process it.


So, let's say you were using a zero byte as your delimiter and you've made sure that a zero cannot and does not occur in your real data. Then, you'd set up a data handler like this:

let accumulatedData = Buffer.alloc(0);
socket.on('data', data => {
    // append the new chunk to whatever was left over from before
    accumulatedData = Buffer.concat([accumulatedData, data]);

    // one chunk may contain zero, one or several delimiters
    let offset;
    while ((offset = accumulatedData.indexOf(0)) !== -1) {
        // slice out one whole message (without the delimiter)
        const msg = accumulatedData.slice(0, offset);

        // keep everything past the delimiter as the start of the next message
        accumulatedData = accumulatedData.slice(offset + 1);

        // emit that we now have a whole msg
        socket.emit('_msg', msg);
    }
});

// if any accumulated data still here at end of socket
// notify about it
// this is optional as it may be a partial piece of data (no delimiter)
socket.on('end', () => {
   if (accumulatedData.length) {
       socket.emit('_msg', accumulatedData);
   }
});

// this is my own event which is emitted when a whole message is available
// for processing
socket.on('_msg', msg => {
   // code here to process whole msg
});

Note: This implementation removes the delimiter from the end of the msg
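For completeness, the sending side of this delimiter scheme just appends the 0 byte after each message; sendDelimited is an illustrative name, and it assumes the payload itself never contains a 0 byte:

```javascript
// Write one delimited message: payload bytes followed by a single 0 byte.
// Assumes the payload never contains a 0 byte itself.
function sendDelimited(socket, payload) {
    socket.write(Buffer.concat([payload, Buffer.from([0])]));
}
```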

jfriend00
  • Ok this answer makes a lot of sense and is helping me. If a buffer is broken into 4 packets, can another packet come in during the middle of the transfer? Or is guaranteed to receive the 4 packets first. – Jeff Jul 12 '19 at 18:29
  • @Jeff - As my answer states, TCP guarantees the order of the bytes of data. So, another packet will not arrive in the middle. – jfriend00 Jul 12 '19 at 18:30
  • I don't think I can use 0 as a delimiter or even something like \r or \n. Do you know of a character that I could use that's generally never used? Thanks – Jeff Jul 15 '19 at 15:37
  • @Jeff - It totally depends upon your data. If it's true binary data, then no delimiter character is safe unless you encode your data and you should probably use something like options #2 or #3 in my answer. If it's not binary data, then what character could be used as a delimiter totally depends upon the type of data. – jfriend00 Jul 15 '19 at 20:53
  • This worked well and I just used 0xDB as a delimiter because I will never use that character in anything I do. I am successfully getting multiple packets, in order, and combined. – Jeff Jul 23 '19 at 15:10

Node.js is not splitting up the data; TCP/IP is. The maximum amount of data allowed in an IP payload is 64 KB (65,535 bytes). This is why your packets are being split up (fragmented).

This also means that TCP/IP will piece together the data at the receiving end. This is why you don't have to reassemble REST requests or websites. This is all handled by the lower network layers.

You may want to look at this example. You can edit the createServer() function to send more data like so:

var server = net.createServer(function(socket) {
    let buf = Buffer.alloc(265000);
    // Buffer elements are bytes, so write character codes, not strings
    for (var i = 0; i < 264900; i++) {
        buf[i] = 0x45; // 'E'
    }
    buf[264900] = 0x0d; // '\r' carriage return
    buf[264901] = 0x0a; // '\n' line feed
    buf[264902] = 0;    // string terminator
    socket.write(buf);
    socket.pipe(socket);
});

The above (along with the other code from the gist) will respond to any request with a string containing 264900 'E's and a newline.

Now, you can use netcat (if on Linux) to receive your request:

$ netcat 127.0.0.1 1337
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE ... etc

The buffer can contain whatever and it will all still be transferred. A string is just easy to demonstrate.

In conclusion: Let the network do the work. You will need to read the incoming data on the client and save it to its own local buffer, but that's pretty much it.
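That "read and save to a local buffer" step might look like the sketch below; collectAll is an illustrative helper, not part of the net module, and it relies on the sender closing the connection to signal the end of the data:

```javascript
// Accumulate every chunk from a socket and hand back one combined Buffer
// once the sender closes its end of the connection.
function collectAll(socket, done) {
    const chunks = [];
    socket.on('data', chunk => chunks.push(chunk));
    // TCP delivers the chunks in order, so concatenation restores the original
    socket.on('end', () => done(Buffer.concat(chunks)));
}
```

With the gist's server, `done` would receive the full 265,000-byte buffer once the connection closes.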

Further reading:

https://nodejs.org/api/net.html#net_socket_write_data_encoding_callback
https://techterms.com/definition/packet

spicy.dll
  • client.on("data", (data) => { console.log(data.length); console.log(data); }); Shows 5 different packets come in and I'm not sure how to get those back into one buffer. – Jeff Jul 12 '19 at 16:50
  • They are all automatically combined into the input buffer. Just read the size of the buffer and save it to another buffer. The console logs data as it is received, but it all ends up being added to the end of the input buffer. – spicy.dll Jul 12 '19 at 17:04