9

Can someone with the natural gift of explaining complex things in an easy and straightforward way address this question? To get the best performance, when should I use direct ByteBuffers versus regular ByteBuffers when doing network I/O with Java NIO?


For example: should I read into a heap buffer and parse it from there, doing many get() calls (byte by byte), OR should I read it into a direct buffer and parse from the direct buffer?

JohnPristine
  • 3,485
  • 5
  • 30
  • 49
  • Direct buffers try to allocate the memory contiguously rather than allocating Java arrays locally. You want to do this because it reduces the amount of work to be done during I/O since a native buffer is ready as-is to be passed to the kernel, while using non-native buffers requires an additional pass. – obataku Aug 29 '12 at 23:51
  • 1
    @veer That's the "by-the-book" explanation. I am looking more for a rule. Eventually you have to read the direct buffer into Java space, so the kernel boundary will have to be crossed sooner or later. – JohnPristine Aug 29 '12 at 23:53
  • @veer direct and heap ByteBuffers are contiguous in memory. – Peter Lawrey Aug 30 '12 at 07:40
  • As others have said, when reading (large) files into buffers, direct buffers should be better, as heap buffers do not have an absolute address, so they may need to be copied somewhere else first. Using direct buffers may additionally have the benefit of allowing a completely native I/O operation (potentially even asynchronous) without the VM having to do anything. – RecursiveExceptionException Aug 19 '16 at 14:14

2 Answers

13

To acquire the best performance when should I use direct ByteBuffers versus regular ByteBuffers when doing network I/O with Java NIO?

Direct buffers have a number of advantages

  • They avoid an extra copy of data passed between Java and native memory.
  • If they are re-used, only the pages actually used are turned into real memory. This means you can make them much larger than they need to be, and the excess only wastes virtual memory.
  • You can access multi-byte primitives in native byte order efficiently (basically one machine code instruction); see the sketch below.
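
As a rough sketch of the second and third points, assuming a made-up 64 MB size and no particular protocol, allocating one large, re-usable direct buffer in native byte order might look like this:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DirectBufferSetup {
    public static void main(String[] args) {
        // Over-allocate deliberately: pages that are never touched only cost virtual memory.
        // The 64 MB size is a made-up figure for illustration.
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024)
                                      .order(ByteOrder.nativeOrder()); // native-order multi-byte access

        // Re-use this one buffer across reads (clear() between uses) rather than allocating per call.
        System.out.println("direct=" + buffer.isDirect()
                + " order=" + buffer.order()
                + " capacity=" + buffer.capacity());
    }
}
```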

Should I read into a heap buffer and parse it from there, doing many get() (byte by byte) OR should I read it into a direct buffer and parse from the direct buffer?

If you are reading a byte at a time, you may not get much advantage. However, with a direct byte buffer you can read 2 or 4 bytes at a time and effectively parse multiple bytes at once.
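
For example, a minimal sketch of that idea, assuming an illustrative fixed record layout (int id, long timestamp, short flags) that is not from the question:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MultiByteParse {

    // Illustrative record layout (not from the question): int id, long timestamp, short flags.
    static void parse(ByteBuffer buf) {
        buf.flip();                          // switch from "being filled" to "ready to read"
        while (buf.remaining() >= 14) {      // 4 + 8 + 2 bytes per record
            int id = buf.getInt();           // multi-byte reads instead of byte-by-byte get()
            long timestamp = buf.getLong();
            short flags = buf.getShort();
            System.out.println("id=" + id + " ts=" + timestamp + " flags=" + flags);
        }
        buf.compact();                       // keep any partial record for the next read
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
        // In real use a channel.read(buf) would fill the buffer; fake one record here.
        buf.putInt(42).putLong(System.nanoTime()).putShort((short) 1);
        parse(buf);
    }
}
```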

[real time] [selectors]

If you are parsing real-time data, I would avoid using selectors. I have found that using blocking NIO or busy-waiting NIO can give you the lowest-latency performance (assuming you have a relatively small number of connections, e.g. up to 20).
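
A minimal sketch of that busy-waiting style, assuming a single non-blocking connection to a placeholder host and port (the shape of the polling loop is the point, not the details):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.SocketChannel;
import java.util.List;

public class BusyWaitReader {
    public static void main(String[] args) throws IOException {
        // Placeholder endpoint; replace with the real host/port.
        SocketChannel channel = SocketChannel.open(new InetSocketAddress("localhost", 9000));
        channel.configureBlocking(false);          // non-blocking so read() returns immediately

        List<SocketChannel> channels = List.of(channel);   // the same loop works for a small set
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024).order(ByteOrder.nativeOrder());

        // Busy-wait: poll each channel directly instead of going through a Selector.
        while (true) {
            for (SocketChannel ch : channels) {
                int len = ch.read(buffer);
                if (len > 0) {
                    buffer.flip();
                    // ... parse the buffer here ...
                    buffer.clear();
                } else if (len < 0) {
                    return;                        // peer closed the connection
                }
                // len == 0: nothing available yet, keep spinning
            }
        }
    }
}
```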

EDIT: Here is a high-performance library, relatively easy to use, that uses blocking NIO and that you can draw on: https://github.com/OpenHFT/Chronicle-Wire/tree/ea/src/main/java/net/openhft/chronicle/wire/channel

Johannes Kuhn
  • 14,778
  • 4
  • 49
  • 73
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • What do you mean by "busy waiting NIO"? You have to at least perform a selectNow inside an infinite (busy-spinning) loop, right? I even tried to read from the channels without doing a select but eventually data stops coming. – chrisapotek Aug 30 '12 at 15:46
  • 4
    You can do `while((len = socketChannel.read(byteBuffer)) < 1);` or poll an array of socketChannels. You don't need a Selector. ;) – Peter Lawrey Aug 30 '12 at 16:06
  • I am trying to do exactly this, but interestingly enough my channels STOP receiving data after some time. I have to select again. :( Ohhhh, I am checking isReadable()... let me try again... – chrisapotek Aug 30 '12 at 16:16
  • Holy cow !!! 2 microseconds gain !!!! You are the best, Peter! One day I will work for you. :) – chrisapotek Aug 30 '12 at 16:35
  • 1
    The gain is more than that because you are not giving up the CPU. That means the cache for your code, branch prediction and data is still warm. i.e. it will also run up to 2-3x faster. – Peter Lawrey Aug 30 '12 at 16:37
  • Have you tested that with UDP? It works with SocketChannels but not with DatagramChannels. channel.receive just returns null somehow. :( – chrisapotek Aug 30 '12 at 16:47
  • The Javadoc says `Returns: The datagram's source address, or null if this channel is in non-blocking mode and no datagram was immediately available` So I imagine that is the reason. ;) – Peter Lawrey Aug 30 '12 at 16:56
  • How do you use blocking NIO when you are polling an array of channels (datagram/socket, etc.)? Or are you suggesting one thread per channel alongside blocking NIO? – experiment unit 1998X May 14 '23 at 13:21
  • If you are polling an array of channels, won't there be some time wasted polling channels that are not ready? Would this wasted time eventually cause packets to be lost if there is high throughput (i.e. financial data streaming)? – experiment unit 1998X May 14 '23 at 16:13
  • 1
    @experimentunit1998X If you are using many blocking NIO connections it is probably better to use a Selector; however, that limit is around 10 to 100. Selector is the most scalable solution, but it has a few issues, such that in my case I rarely use it as I generally have very few connections. Here is a library I wrote using blocking NIO with a thread per connection; note: I assume fewer than 100 connections. https://github.com/OpenHFT/Chronicle-Wire/tree/ea/src/main/java/net/openhft/chronicle/wire/channel – Peter Lawrey May 16 '23 at 05:39
  • Let me summarize what you suggested: 1) _selector that multiplexes connections_ (_10-100 connections per selector_) 2) _thread per connection with blocking NIO_ 3) _polling an array of channels_ (I assume with non-blocking NIO). For a small number of connections, up to 100(?), I assume that option 2 might be better, but when trying to scale, option 1 would be the one to go for? And if CPU usage is not a problem, busy-spin on option 3? – experiment unit 1998X May 16 '23 at 07:17
  • 1
    @experimentunit1998X I would say the other way around. Use selectors for large numbers of connections and a thread pool for a more scalable solution e.g. 100+; however, if you want to keep things simple, use blocking NIO and a thread per connection. – Peter Lawrey May 16 '23 at 07:44
  • 1
    alright. I think i understand now. Very much appreciated, Peter :) – experiment unit 1998X May 16 '23 at 07:52
4

A direct buffer is best when you are just copying the data, say from a socket to a file or vice versa, as the data doesn't have to traverse the JNI/Java boundary; it just stays in JNI land. If you are planning to look at the data yourself, there's no point in a direct buffer.
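
For instance, a minimal sketch of that copy-only case (the buffer size and destination file are placeholders), where the bytes never leave native memory for a Java array:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopy {

    // Copy everything from 'in' (e.g. a SocketChannel) to 'out' through one direct buffer.
    static void copy(ReadableByteChannel in, FileChannel out) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);   // size is a placeholder
        while (in.read(buffer) != -1) {
            buffer.flip();                 // expose the bytes that were just read
            while (buffer.hasRemaining()) {
                out.write(buffer);         // native memory -> kernel; no Java array in between
            }
            buffer.clear();                // reuse the same buffer for the next read
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in source: stdin wrapped as a channel; in the answer's scenario it would be a socket.
        try (ReadableByteChannel in = Channels.newChannel(System.in);
             FileChannel out = FileChannel.open(Path.of("copy.bin"),
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            copy(in, out);
        }
    }
}
```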

user207421
  • 305,947
  • 44
  • 307
  • 483
  • I heard something like: if you copy byte by byte, one is better than the other. Eventually you have to look at the data, yes; that was my point and my doubt. What do you think? – JohnPristine Aug 30 '12 at 00:23
  • @JohnPristine Copying bytes one by one from a direct buffer would certainly be painful, but then copying anything to or from a direct buffer is painful; that's why I said to avoid it unless you are just copying between channels, where you can use the same buffer and never have to get the data out at all. – user207421 Aug 30 '12 at 05:08
  • 2
    Whether JNI is used or not is implementation specific. AFAIK Android uses native JNI calls and the direct ByteBuffer is slower. HotSpot and OpenJDK treat it as an intrinsic so no JNI is involved making direct ByteBuffers faster. – Peter Lawrey Aug 30 '12 at 07:41
  • @PeterLawrey I keep reading this from you but I don't see how it is possible to meet the contract of 'direct' buffers without them existing in the JNI space rather than the Java space. – user207421 Aug 30 '12 at 09:55