
To allocate() or to allocateDirect(), that is the question.

For some years now I've stuck to the thought that, since DirectByteBuffers are a direct memory mapping at the OS level, they would perform quicker with get/put calls than HeapByteBuffers. I was never really interested in finding out the exact details of the situation until now. I want to know which of the two types of ByteBuffer is faster, and under what conditions.
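For concreteness, here is a minimal sketch of the two allocation calls being compared (the buffer size is arbitrary):

```java
import java.nio.ByteBuffer;

public class AllocationDemo {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the JVM heap.
        ByteBuffer heap = ByteBuffer.allocate(1024);
        // Direct buffer: backed by native memory outside the heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);

        System.out.println(heap.isDirect());    // false
        System.out.println(direct.isDirect());  // true
        System.out.println(heap.hasArray());    // true: heap buffers expose a backing array
        System.out.println(direct.hasArray());  // false: no accessible backing array
    }
}
```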

ROMANIA_engineer
  • To give a specific answer, you need to say specifically what you are doing with them. If one were always faster than the other, why would there be two variants? Perhaps you can expand on why you are now "really interested in finding out the exact details". BTW: Have you read the code, esp. for DirectByteBuffer? – Peter Lawrey Apr 15 '11 at 07:20
  • They will be used to read from and write to `SocketChannel`s that are configured for non-blocking. So regarding what @bmargulies said, `DirectByteBuffer`s will perform faster for the channels. –  Apr 16 '11 at 01:49
  • @Gnarly At least the current version of my answer says that channels are expected to benefit. – bmargulies Oct 13 '11 at 12:59

4 Answers


Ron Hitchens in his excellent book Java NIO seems to offer what I thought could be a good answer to your question:

Operating systems perform I/O operations on memory areas. These memory areas, as far as the operating system is concerned, are contiguous sequences of bytes. It's no surprise then that only byte buffers are eligible to participate in I/O operations. Also recall that the operating system will directly access the address space of the process, in this case the JVM process, to transfer the data. This means that memory areas that are targets of I/O operations must be contiguous sequences of bytes. In the JVM, an array of bytes may not be stored contiguously in memory, or the Garbage Collector could move it at any time. Arrays are objects in Java, and the way data is stored inside that object could vary from one JVM implementation to another.

For this reason, the notion of a direct buffer was introduced. Direct buffers are intended for interaction with channels and native I/O routines. They make a best effort to store the byte elements in a memory area that a channel can use for direct, or raw, access by using native code to tell the operating system to drain or fill the memory area directly.

Direct byte buffers are usually the best choice for I/O operations. By design, they support the most efficient I/O mechanism available to the JVM. Nondirect byte buffers can be passed to channels, but doing so may incur a performance penalty. It's usually not possible for a nondirect buffer to be the target of a native I/O operation. If you pass a nondirect ByteBuffer object to a channel for write, the channel may implicitly do the following on each call:

  1. Create a temporary direct ByteBuffer object.
  2. Copy the content of the nondirect buffer to the temporary buffer.
  3. Perform the low-level I/O operation using the temporary buffer.
  4. The temporary buffer object goes out of scope and is eventually garbage collected.

This can potentially result in buffer copying and object churn on every I/O, which are exactly the sorts of things we'd like to avoid. However, depending on the implementation, things may not be this bad. The runtime will likely cache and reuse direct buffers or perform other clever tricks to boost throughput. If you're simply creating a buffer for one-time use, the difference is not significant. On the other hand, if you will be using the buffer repeatedly in a high-performance scenario, you're better off allocating direct buffers and reusing them.

Direct buffers are optimal for I/O, but they may be more expensive to create than nondirect byte buffers. The memory used by direct buffers is allocated by calling through to native, operating system-specific code, bypassing the standard JVM heap. Setting up and tearing down direct buffers could be significantly more expensive than heap-resident buffers, depending on the host operating system and JVM implementation. The memory-storage areas of direct buffers are not subject to garbage collection because they are outside the standard JVM heap.

The performance tradeoffs of using direct versus nondirect buffers can vary widely by JVM, operating system, and code design. By allocating memory outside the heap, you may subject your application to additional forces of which the JVM is unaware. When bringing additional moving parts into play, make sure that you're achieving the desired effect. I recommend the old software maxim: first make it work, then make it fast. Don't worry too much about optimization up front; concentrate first on correctness. The JVM implementation may be able to perform buffer caching or other optimizations that will give you the performance you need without a lot of unnecessary effort on your part.
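To illustrate the reuse advice in the quote above, here is a minimal sketch of allocating one direct buffer up front and reusing it for every write, so the channel never has to fall back to a temporary copy (file name, buffer size, and record contents are arbitrary):

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReuseDirectBuffer {
    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("reuse", ".bin");
        // One direct buffer, allocated once and reused for every write.
        ByteBuffer buf = ByteBuffer.allocateDirect(8192);
        try (FileChannel ch = FileChannel.open(out, StandardOpenOption.WRITE)) {
            for (int i = 0; i < 3; i++) {
                buf.clear();
                buf.put(("record " + i + "\n").getBytes(StandardCharsets.US_ASCII));
                buf.flip();
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
        System.out.println(Files.size(out)); // 27: three 9-byte records
        Files.delete(out);
    }
}
```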

Edwin Dalorzo
  • I don't like that quote because it contains too much guessing. Also, the JVM certainly does not need to allocate a direct ByteBuffer when doing IO for a non direct ByteBuffer: it's sufficient to malloc a sequence of bytes on the heap, do the IO, copy from the bytes to the ByteBuffer and release the bytes. Those areas could even be cached. But it is totally unnecessary to allocate a Java object for this. Real answers will only be obtained from measuring. Last time I did measurements there was no significant difference. I would have to redo tests to come up with all the specific details. – Robert Klemme Oct 17 '11 at 12:16
  • 4
    It is questionable if a book that describes NIO (and native operations) can have certainties in it. After all, different JVMs and operating systems manage things differently, so the author cannot be blamed for being unable to guarantee certain behavior. – Martin Tuskevicius Feb 05 '13 at 22:42
  • @RobertKlemme, +1, we all hate the guesswork. However, it may be impossible to measure performance for all major OSes, since there are just too many of them. [Another post](http://goo.gl/X67Ot8) attempted that, but we can see [many many problems](http://goo.gl/K5dU9G) with its benchmark, starting with "the results fluctuate widely depending on the OS". Also, what if there's a black sheep that does horrible stuff like buffer copying on every I/O? Then because of that sheep, we may be forced to avoid writing code we would otherwise use, *just* to avoid these worst-case scenarios. – Pacerier Aug 18 '14 at 07:05
  • @RobertKlemme I agree. There is far too much guesswork here. The JVM is vanishingly unlikely to allocate byte arrays sparsely, for example. – user207421 Sep 11 '15 at 23:30
  • @Edwin Dalorzo: Why do we need such byte buffers in the real world? Are they invented as a hack to share memory between processes? Say, for example, the JVM runs in one process, and another process, which runs at the network or data-link layer, is responsible for transmitting the data: are these byte buffers allocated to share memory between those processes? Please correct me if I'm wrong. – Tom Taylor Jul 24 '18 at 19:18

There is no reason to expect direct buffers to be faster for access inside the jvm. Their advantage comes when you pass them to native code -- such as, the code behind channels of all kinds.
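As a minimal sketch of that hand-off to native code, the example below pushes a direct buffer through a `Pipe`; the `Pipe` is used only to keep the demo self-contained, and any channel type (socket, file) is handled the same way:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;

public class DirectWithChannel {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();

        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        buf.put("hello".getBytes(StandardCharsets.US_ASCII));
        buf.flip();
        // The native write can read straight out of the buffer's memory,
        // with no intermediate copy into or out of a Java byte[].
        pipe.sink().write(buf);

        ByteBuffer in = ByteBuffer.allocateDirect(64);
        pipe.source().read(in); // native read fills the buffer's memory directly
        in.flip();
        byte[] received = new byte[in.remaining()];
        in.get(received);
        System.out.println(new String(received, StandardCharsets.US_ASCII)); // hello

        pipe.sink().close();
        pipe.source().close();
    }
}
```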

bmargulies
  • Indeed. Like when you need to do IO in Scala/Java and call embedded Python / native libs with large in-memory data for algorithmic processing, or feed data directly to a GPU in TensorFlow. – SemanticBeeng Jun 02 '18 at 07:29

since DirectByteBuffers are a direct memory mapping at OS level

They aren't. They are just normal application process memory, but not subject to relocation during Java GC which simplifies things inside the JNI layer considerably. What you describe applies to MappedByteBuffer.

that it would perform quicker with get/put calls

The conclusion doesn't follow from the premise; the premise is false; and the conclusion is also false. They are faster once you get inside the JNI layer, and if you are reading and writing from the same DirectByteBuffer they are much faster, because the data never has to cross the JNI boundary at all.
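A sketch of that read-and-write-the-same-DirectByteBuffer case: copying from one channel to another through a single reused direct buffer. The stream-backed channels in `main` are only there to keep the demo runnable; with real `FileChannel`s or `SocketChannel`s the bytes stay on the native side throughout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

public class ChannelCopy {
    // Copy src to dst through one reused direct buffer. With real file or
    // socket channels the data goes native-read -> native-write and never
    // lands in a Java byte[].
    static long copy(ReadableByteChannel src, WritableByteChannel dst) throws Exception {
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024);
        long total = 0;
        while (src.read(buf) != -1) {
            buf.flip();
            while (buf.hasRemaining()) {
                total += dst.write(buf);
            }
            buf.clear();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[100_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(Channels.newChannel(new ByteArrayInputStream(data)),
                           Channels.newChannel(out));
        System.out.println(copied); // 100000
    }
}
```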

user207421
  • 9
    This is a good and important point: on the path of IO you have to cross the Java - JNI border at *some* point. Direct and non direct byte buffers only move the border: with a direct buffer all put operations from Java land have to cross, while with a non direct buffer all IO operations have to cross. What is faster depends on the application. – Robert Klemme Oct 17 '11 at 12:22
  • 1
    @RobertKlemme Your summary is incorrect. With all buffers, any data coming to and from Java has to cross the JNI boundary. The point of direct buffers is that if you are just copying the data from one channel to another, e.g. uploading a file, you don't have to get it into Java at all, which is much faster. – user207421 Mar 06 '12 at 05:48
  • where exactly is my summary incorrect? And what "summary" to begin with? I was explicitly talking about "put operations from Java land". If you only copy data around between Channels (i.e. never have to deal with the data in Java land) that's a different story of course. – Robert Klemme Mar 21 '12 at 12:51
  • @RobertKlemme Your statement that 'with a direct buffer [only] all put operations from Java land have to cross' is incorrect. Both gets and puts have to cross. – user207421 Jun 16 '14 at 09:54
  • 1
    EJP, you're apparently still missing the intended distinction @RobertKlemme was making by choosing to use the words "put operations" in one phrase and using the words "IO operations" in the contrasted phrase of the sentence. In the latter phrase, his intention was to refer to operations between the buffer and an OS-provided device of some kind. – naki Jul 17 '17 at 23:32

Best to do your own measurements. Quick answer seems to be that sending from an allocateDirect() buffer takes 25% to 75% less time than the allocate() variant (tested as copying a file to /dev/null), depending on size, but that the allocation itself can be significantly slower (even by a factor of 100x).
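A rough sketch of such a measurement is below. It has no warmup or statistical rigor, and it writes to a temp file where the original measurement copied to /dev/null, so treat any numbers it prints as indicative only:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RoughTiming {
    // Time repeated full-buffer writes of buf to target.
    static long timeWrites(ByteBuffer buf, Path target, int rounds) throws Exception {
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(target, StandardOpenOption.WRITE)) {
            for (int i = 0; i < rounds; i++) {
                buf.clear(); // position 0, limit = capacity; contents don't matter here
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        int size = 64 * 1024, rounds = 200;
        Path sink = Files.createTempFile("sink", ".bin");
        long heapNs   = timeWrites(ByteBuffer.allocate(size), sink, rounds);
        long directNs = timeWrites(ByteBuffer.allocateDirect(size), sink, rounds);
        System.out.printf("heap: %d us, direct: %d us%n", heapNs / 1000, directNs / 1000);
        Files.delete(sink);
    }
}
```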

Sources:

Raph Levien
  • Thanks. I would accept your answer but I'm looking for some more specific details regarding the differences in performance. –  Apr 15 '11 at 00:36