
As mentioned in the Java docs of ByteBuffer, a direct buffer allocated by allocateDirect has higher allocation and deallocation costs.

A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers.

I wonder why the costs are higher. What needs to be done under the hood?

I can understand the higher costs of MappedByteBuffer, due to the cost of mmap. But a DirectByteBuffer is, in my opinion, just a block of memory in the JVM process, the same as the GC heap. Both of them live in the JVM process's memory.

I don't understand what the difference is between allocating or deallocating a block of memory inside versus outside the GC heap.

light
    *"... in my opinion, is only a block of memory in the JVM process, the same as java heap"* - Did you check whether your opinion is correct before asking? The OpenJDK source code is available for all to read. (If you want to understand how the JVM works under the hood, the best way is to read the source code. That is preferable to ... in effect ... asking someone else to read it for you.) – Stephen C Jul 31 '23 at 03:23

2 Answers


Depending on a ton of factors, it's more likely to result in an explicit malloc call than .allocate is, though it depends on the OS+arch, and it may well end up being an alias for .allocate.


Some context and detail in case that answer isn't particularly satisfying:

See this answer for some context.

Ordinarily, objects in Java (specifically byte arrays, which are the underlying data store for your basic HeapByteBuffer, made by .allocate) have the following properties (there's a small sketch after the list):

  • They are created and remain in the JVM heap memory.
  • The garbage collector will inspect them and keep track of them, and will move them around.
  • Arrays don't actually have to be contiguous. The JVM will try, but nothing in the spec says that the JVM must keep the bytes together. Again the garbage collector comes into play somewhat: if there are 2 large segments of free memory in the heap and you allocate a byte array somewhat larger than either slice, the GC would either have to move things around, throw an exception, or allow the array to be 'split' so that nothing has to move. I'm not sure any JVMs exist that really do this.
  • The memory is also part of the JVM's process; ordinarily, the entire heap (whether it contains live objects or not) is considered 'used memory' by the underlying OS.
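
A minimal sketch of the first two points, using nothing beyond the standard ByteBuffer API: a buffer made by .allocate is backed by an ordinary byte[] that lives on the GC heap and is tracked (and possibly moved) like any other object. The class name is just for illustration.

```java
import java.nio.ByteBuffer;

public class HeapBufferSketch {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        System.out.println(heap.isDirect());  // false
        System.out.println(heap.hasArray());  // true
        byte[] backing = heap.array();        // the actual byte[] on the GC heap;
                                              // the GC may move it like any other object
        System.out.println(backing.length);   // 16
    }
}
```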

With .allocateDirect, you break a bunch of those rules:

  • They are (probably) allocated separately; they exist outside of the heap (e.g. they would be in addition to -Xmx, the parameter that sets max heap size). See the sketch after this list.
  • Presumably this always causes an OS-level malloc call, to ask the OS for a contiguous chunk of RAM. malloc calls can take time.
  • The block really is contiguous because malloc made it.
  • The block is never moved around.
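
For contrast, here's the sketch referred to above: a direct buffer has no backing byte[] on the heap, and its off-heap budget can be capped with the -XX:MaxDirectMemorySize flag. Again, the class name is just for illustration.

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {
    public static void main(String[] args) {
        // Off-heap capacity is limited by -XX:MaxDirectMemorySize (roughly the max heap size by default)
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        System.out.println(direct.isDirect());  // true
        System.out.println(direct.hasArray());  // false - no byte[] on the GC heap
        // direct.array() would throw UnsupportedOperationException here
        direct.putInt(0, 42);                   // reads/writes go straight to the native block
        System.out.println(direct.getInt(0));   // 42
    }
}
```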

Note that there are some gaps in the theory here: if it's explicitly malloced, then it should be explicitly freed, and yet ByteBuffer neither has a deallocate() method nor is it (Auto)Closeable. Also, the javadoc reserves lots of rights; for example, .allocateDirect might do nothing different from .allocate - the JVM is free to use the heap or not. These direct OS-level interactions tend to do that: Java has to run on a wide array of OS+architecture combos. If some OS+arch combo doesn't have direct buffers, what then? Should .allocateDirect fail fast (throw an exception)? That would be sensible, but only if the spec locked down specifically what a direct ByteBuffer guarantees you, and therein lies the problem, because there is tons of variation across OS+arch combos in what it really means.

"Does not move around" is a requirement for various low-level I/O OS kernel calls (where you tell the kernel: Please just tell the network hardware in the system to directly copy incoming bytes straight into this memory block - that kind of low-level I/O. Not all OSes support it, and not all support it in the same way). A plain jane heap buffer simply can't use that; if the underlying OS does support it and so does the JVM, the JVM has to make its own direct buffer (outside of the heap / in a section cordonned off from the GC to ensure it does not move), and start a separate process to blit those bytes into your buffer, taking into account the GC system as that can move around. In contrast, if you ask e.g. a FileChannel object to copy bytes from a file to your direct buffer, it might be possible that the native impl backing FileChannel of your JVM will just tell the OS to tell the SSD to directly do so with no interaction from the OS/CPU whatsoever. Some hardware can do that. But only to 'fixed' memory locations.

Whether your JVM can actually do all that - no guarantees. But if it can, it can only do that if you make a direct buffer.

rzwitserloot

Allocating a non-direct ByteBuffer can be fully optimized by the garbage collector, and so it's potentially more efficient than allocating a direct ByteBuffer. In practice, the difference might not be that significant.

The bigger issue is deallocation. Both types of ByteBuffers can only be deallocated via garbage collection, but direct ByteBuffers typically require a more expensive collection of the old generation. Using short-lived direct ByteBuffers is therefore not advised, since it increases GC load.
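
To illustrate why short-lived direct buffers increase GC load, here's an anti-pattern sketch (not something to copy): the native memory behind each buffer is only released once the small DirectByteBuffer object itself has been collected, so allocating in a tight loop ties off-heap memory to GC timing.

```java
import java.nio.ByteBuffer;

public class ShortLivedDirectBuffers {
    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            // Each iteration reserves 1 MiB off-heap; that memory only comes back
            // once the DirectByteBuffer object has been collected, so the JVM may
            // force full GCs (or throw OutOfMemoryError) to stay within the
            // direct-memory limit.
            ByteBuffer tmp = ByteBuffer.allocateDirect(1 << 20);
            tmp.putLong(0, i); // pretend to use it briefly
        }
        // Better: allocate one direct buffer up front and reuse it across iterations.
    }
}
```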

Note that there is an unsupported way of explicitly deleting direct ByteBuffers, but this capability will go away at some point, some time after the java.lang.foreign API is fully released. MemorySegments should then be used instead if you need short-lived off-heap buffers.
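
As a sketch of that java.lang.foreign alternative (assuming a JDK where the API is final, i.e. 22 or later): an Arena gives you deterministic deallocation, which is exactly what a direct ByteBuffer lacks.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;

public class ArenaSketch {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(1024); // off-heap allocation
            ByteBuffer view = segment.asByteBuffer();     // ByteBuffer view over the same memory
            view.putInt(0, 42);
            System.out.println(view.getInt(0));           // 42
        } // the off-heap memory is freed here, deterministically, when the arena closes
    }
}
```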

boneill
  • Thanks for your answer! There is one thing I don't really understand. Could you please explain why “direct ByteBuffers require a more expensive collection of the old generation”? – light Aug 01 '23 at 06:37
  • GC activity is triggered by memory usage, but off-heap memory isn't explicitly known to the GC. Without special actions, direct ByteBuffers would cause out-of-memory errors. To compensate, a special method is called which can call System.gc(), which forces a full collection: https://github.com/openjdk/jdk/blob/ee3e0917b393b879a543060ace2537be84f20e82/src/java.base/share/classes/java/nio/Bits.java#L106 – boneill Aug 01 '23 at 13:38