
What is the best way to send out two byte buffers (header + body) in Netty?

I'm using Netty in our project, and we need to send data in the following format:

  1. header: a 32-bit int containing the length of the body
  2. body: byte[]

I'm looking for the fastest way to send header + body. In particular, I want to avoid copying the array, since the body can be large.

1) Create a new byte[] and copy the body into it

void sendData1(ChannelHandlerContext ctx, byte[] body) {
    byte[] newBuf = new byte[4+body.length];

    // header
    int len = body.length;
    newBuf[3] = (byte) (len & 0xff);
    len >>= 8;
    newBuf[2] = (byte) (len & 0xff);
    len >>= 8;
    newBuf[1] = (byte) (len & 0xff);
    len >>= 8;
    newBuf[0] = (byte) len;

    // body
    System.arraycopy(body, 0, newBuf, 4, body.length);

    final ByteBuf outBuf = Unpooled.wrappedBuffer(newBuf);
    ctx.writeAndFlush(outBuf);
}

2) Use Netty ByteBuf's write methods directly

void sendData2(ChannelHandlerContext ctx, byte[] body) {
    final ByteBuf outBuf = ctx.alloc().buffer(4+body.length);

    // header
    outBuf.writeInt(body.length);
    // body
    outBuf.writeBytes(body);

    ctx.writeAndFlush(outBuf);
}
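
Options 1 and 2 should put identical bytes on the wire: the shifts in sendData1 encode the length big-endian (most significant byte first), which is also the byte order Netty's ByteBuf.writeInt uses. A small stdlib-only check, using java.nio.ByteBuffer (big-endian by default) as a stand-in for ByteBuf; the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public final class HeaderEncoding {
    // Same encoding as sendData1: most significant byte first (big-endian).
    static byte[] manualHeader(int len) {
        byte[] h = new byte[4];
        h[0] = (byte) (len >>> 24);
        h[1] = (byte) (len >>> 16);
        h[2] = (byte) (len >>> 8);
        h[3] = (byte) len;
        return h;
    }

    // ByteBuffer defaults to big-endian, like ByteBuf.writeInt.
    static byte[] bufferHeader(int len) {
        return ByteBuffer.allocate(4).putInt(len).array();
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 255, 256, 65_535, 1 << 24, Integer.MAX_VALUE};
        for (int len : samples) {
            if (!Arrays.equals(manualHeader(len), bufferHeader(len))) {
                throw new AssertionError("mismatch for " + len);
            }
        }
        System.out.println("manual and putInt headers match");
    }
}
```

Since the output is the same, the choice between 1 and 2 comes down to readability and which allocator provides the buffer; both copy the body once.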

3) Set up two Netty ByteBufs, write them separately, and then flush()

void sendData3(ChannelHandlerContext ctx, byte[] body) {
    // header
    final ByteBuf headBuf = ctx.alloc().buffer(4);
    headBuf.writeInt(body.length);

    // body
    final ByteBuf bodyBuf = Unpooled.wrappedBuffer(body);

    ctx.write(headBuf);
    ctx.write(bodyBuf);
    ctx.flush();
}

4) Use Netty's CompositeByteBuf

void sendData4(ChannelHandlerContext ctx, byte[] body) {
    // header
    final ByteBuf headBuf = ctx.alloc().buffer(4);
    headBuf.writeInt(body.length);

    // body
    final ByteBuf bodyBuf = Unpooled.wrappedBuffer(body);

    CompositeByteBuf composite = ctx.alloc().compositeBuffer();
    composite.addComponents(headBuf, bodyBuf);
    ctx.writeAndFlush(composite);
}
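
One likely reason sendData4 did not send all of the data: in Netty 4.x, CompositeByteBuf.addComponents(ByteBuf...) does not advance the composite's writerIndex, so the composite can look empty or shorter than expected to whatever writes it out. A minimal sketch, assuming Netty 4.1, where the addComponents(boolean increaseWriterIndex, ByteBuf...) overload exists; on 4.0, calling composite.writerIndex(head.readableBytes() + bodyBuf.readableBytes()) after adding the components should have the same effect. The class and method names are hypothetical, and Netty 5 alpha behavior was not checked:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;

public final class CompositeFraming {
    static CompositeByteBuf frame(byte[] body) {
        ByteBuf head = Unpooled.buffer(4).writeInt(body.length);
        ByteBuf bodyBuf = Unpooled.wrappedBuffer(body); // wraps the array, no copy
        CompositeByteBuf composite = Unpooled.compositeBuffer();
        // true = move the composite's writerIndex past the added components,
        // so readableBytes() covers header + body.
        composite.addComponents(true, head, bodyBuf);
        return composite;
    }

    public static void main(String[] args) {
        CompositeByteBuf buf = frame(new byte[] {10, 20, 30});
        try {
            System.out.println(buf.readableBytes()); // 7
            System.out.println(buf.getInt(0));       // 3
        } finally {
            buf.release(); // also releases the components
        }
    }
}
```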

Options 1) and 2) both copy the array, so I assume they have about the same performance? For option 3), I'm not sure whether it copies the array, but it calls ctx.write() twice, which I think is expensive. For option 4), I'm not sure whether it works at all; when I tried it in Netty 5, it seemed to send only the header.

Which one do you use, or is there a better option?

Thanks a lot!

Lan

1 Answer


First, a standard disclaimer:

Performance typically depends on your use case, software, and hardware configuration. For Netty, a big part of the software configuration is how big your pipeline is and which handlers in it act on your write. There are also at least two major dimensions of performance, run time and memory, and trade-offs can frequently be made to optimize for one or the other. To find the optimal solution for your use case, I would suggest benchmarking the different scenarios and seeing which gives you the best performance.

Next, some practical observations:

  1. This seems to be a more complicated version of option 2.
  2. As you indicated, you are copying your data into Netty's buffer.
  3. ctx.write() does not necessarily copy. It depends on your pipeline and which allocators are in use. No matter which approach you take, you have to write the data to the channel. The only difference is whether you copy the data in the application layer or potentially let the channel write it out directly.
  4. CompositeByteBuf: I haven't used this much, but there is some overhead in dealing with a collection of buffers. If you can allocate the buffers manually and call write directly, I can't see a justification for using it. By managing the buffers yourself, you are effectively unrolling the loop over components that each of CompositeByteBuf's methods would otherwise perform.
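
To illustrate point 3 at the plain NIO level: when several buffers are flushed together, Netty's NIO transport can hand them to the socket as one gathering write, so the body does not need to be copied into a combined array in the application. The sketch below is only an analogy using the JDK's GatheringByteChannel (a Pipe stands in for the socket); the class and method names are hypothetical, and the JDK may still copy heap buffers into a temporary direct buffer internally, which is one reason the direct buffers suggested in this answer can help:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public final class GatheringFrameWrite {
    // Sends a 4-byte big-endian length header plus the body through one
    // gathering write, then reads the bytes back from the other end of a pipe.
    // Demo only: a single thread writes everything before reading, so keep
    // the body well below the OS pipe buffer size.
    static byte[] writeFrame(byte[] body) {
        ByteBuffer header = ByteBuffer.allocate(4).putInt(body.length);
        header.flip();                              // switch header to read mode
        ByteBuffer payload = ByteBuffer.wrap(body); // wraps the array, no copy
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                ByteBuffer[] parts = {header, payload};
                long remaining = 4L + body.length;
                while (remaining > 0) {
                    remaining -= sink.write(parts); // gathering write of both buffers
                }
                ByteBuffer received = ByteBuffer.allocate(4 + body.length);
                while (received.hasRemaining()) {
                    source.read(received);
                }
                return received.array();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] frame = writeFrame("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(frame.length); // 9
        System.out.println(frame[3]);     // 5 (low byte of the length header)
    }
}
```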

Have you looked into PooledByteBufAllocator and directBuffer? If your use case permits, these may provide some performance benefits. Pooled allocators mean you only pay for Java's automatic zeroing of buffers once, and they may also lead to less GC activity. For a general description of direct buffers, see the question "ByteBuffer.allocate() vs. ByteBuffer.allocateDirect()".
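
A minimal sketch of option 2 using a pooled direct buffer, assuming Netty 4.x; the class and method names are hypothetical. Note that the body is still copied once, as in option 2; pooling only amortizes allocation and zeroing:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;

public final class PooledDirectFraming {
    // Writes header + body into one direct buffer obtained from the allocator.
    static ByteBuf frame(ByteBufAllocator alloc, byte[] body) {
        ByteBuf out = alloc.directBuffer(4 + body.length);
        out.writeInt(body.length); // big-endian length header
        out.writeBytes(body);      // single copy of the body
        return out;
    }

    public static void main(String[] args) {
        ByteBuf buf = frame(PooledByteBufAllocator.DEFAULT, new byte[] {1, 2, 3});
        try {
            System.out.println(buf.isDirect());      // true
            System.out.println(buf.readableBytes()); // 7
        } finally {
            buf.release(); // returns the memory to the pool
        }
    }
}
```

Inside a handler you would pass ctx.alloc(). To make that return a pooled allocator, the bootstrap can be configured with something like serverBootstrap.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT); in Netty 4.1 the default allocator is typically already pooled, so check your version before relying on either behavior.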

Scott Mitchell