
I wanted to compare the performance of a direct byte buffer (java.nio.ByteBuffer, off-heap) and a heap buffer (a plain array) for both reads and writes. My understanding was that a ByteBuffer, being off-heap, gets at least two benefits over a heap buffer: first, it won't be considered for GC, and second (I hope I got this right), the JVM won't use an intermediate/temporary buffer when reading from and writing to it. These advantages should make the off-heap buffer faster than the heap buffer. If that's correct, shouldn't my benchmark show that? It consistently shows the heap buffer as faster than the off-heap one.

import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 2, jvmArgs = {"-Xms2G", "-Xmx4G"})
@Warmup(iterations = 3)
@Measurement(iterations = 10)
public class BasicTest {

    @Param({"100000"})
    private int N;

    final int bufferSize = 10000;

    ByteBuffer byteBuffer = ByteBuffer.allocateDirect(8 * bufferSize);
    long buffer[] = new long[bufferSize];


    public static void main(String[] args) throws Exception {

        Options opt = new OptionsBuilder()
                .include(BasicTest.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();

    }


    @Benchmark
    public void offHeapBuffer(Blackhole blackhole) {

        IntStream.range(0, bufferSize).forEach(index -> {
            byteBuffer.putLong(index * 8, 500 * index);        // each long gets its own 8-byte slot
            blackhole.consume(byteBuffer.getLong(index * 8));  // getLong, not get(): read the same long back
        });

    }

    @Benchmark
    public void heapBuffer(Blackhole blackhole) {

        IntStream.range(0, bufferSize).forEach(index -> {
            buffer[index] = 500 * index;
            blackhole.consume(buffer[index]);
        });

    }
}

Run complete. Total time: 00:00:37

Benchmark                   (N)  Mode  Cnt  Score    Error  Units
BasicTest.heapBuffer     100000  avgt   10  0.039  ± 0.003  ms/op
BasicTest.offHeapBuffer  100000  avgt   10  0.050  ± 0.007  ms/op

Abidi
  • Hm, could well be that the absence of the intermediate/temporary buffer gives you a performance penalty. They didn't put it there to make everything slower, I'd guess. Just my personal 2 cents... – Curiosa Globunznik Nov 23 '19 at 14:55
  • Direct buffers work best when everything stays in the "native world". For instance, transferring bytes between two channels. If you pull the data into the "Java world" you lose a lot of the benefits. Might help: [When to use Array, Buffer or direct Buffer](https://stackoverflow.com/questions/18913001/); [ByteBuffer.allocate() vs. ByteBuffer.allocateDirect()](https://stackoverflow.com/questions/5670862/). – Slaw Nov 23 '19 at 14:59
  • Why would your benchmark show that a *direct* buffer is faster, when you don't do the operation where it *is* faster, e.g. read from / write to a file or socket? – Andreas Nov 23 '19 at 15:00
  • @curiosa The [javadoc](https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html) says: *"Given a **direct** byte buffer, the Java virtual machine will make a best effort to perform **native I/O** operations directly upon it. That is, it will attempt to **avoid copying** the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the **underlying operating system's** native I/O operations."* --- It is talking about reading/writing a file or socket. It wouldn't need to call the OS for plain memory access. – Andreas Nov 23 '19 at 15:08
  • @Abidi *"it won't be considered for GC"* Incorrect. Why do you believe that? And if it had been true, how would the memory ever be released? There is no method for you to control that. Just because the memory is outside the heap doesn't mean the actual deallocation of the memory is not performed by the garbage collector. – Andreas Nov 23 '19 at 15:16
  • I recommend https://codereview.stackexchange.com/. – FailingCoder Nov 23 '19 at 18:01

2 Answers


It won't be considered for GC

Of course it will be considered for GC.

It is the Garbage Collector that determines that the buffer is no longer in use, and then deallocates the memory.
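One way to see this is to allocate direct buffers in a loop without keeping any references: the native memory behind each unreachable buffer is released (the JVM even forces a collection when the direct-memory limit is reached), so the loop completes without running out of memory. A minimal sketch; the class name, buffer size, and iteration count below are arbitrary choices for illustration:

```java
import java.nio.ByteBuffer;

public class DirectGcDemo {
    // Churns through far more direct memory than could be live at once.
    // Each 1 MiB buffer becomes unreachable immediately, and the GC
    // (via an internal Cleaner) frees the native memory behind it.
    static boolean churnDirectMemory(int iterations) {
        for (int i = 0; i < iterations; i++) {
            ByteBuffer b = ByteBuffer.allocateDirect(1024 * 1024);
            b.putLong(0, i); // touch the buffer so the allocation is real
        }
        return true; // reaching here means no OutOfMemoryError was thrown
    }

    public static void main(String[] args) {
        System.out.println(churnDirectMemory(2_000)); // ~2 GiB churned in total
    }
}
```

If the GC did not reclaim direct buffers, this loop would have to fail once the cumulative allocations exceeded the direct-memory limit.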

Should I not expect my benchmark to show [that] off-heap buffer [is] faster than heap buffer?

Being off-heap doesn't make the buffer faster for memory access.

A direct buffer will be faster when Java exchanges the bytes in the buffer with the operating system. Since your code is not doing I/O, there is no performance benefit to using a direct buffer.

As the javadoc says:

Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations.
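For illustration, this is the kind of code where a direct buffer can pay off: shuttling bytes between files through a FileChannel, where the JVM can hand the buffer's native address straight to the OS read/write calls without an intermediate copy. A sketch only; the class name and the 64 KiB buffer size are arbitrary:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopy {
    // Copies a file using a direct buffer; the OS-level reads and writes
    // can operate on the buffer's memory directly.
    static void copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            while (in.read(buf) != -1) {
                buf.flip();                                // switch from filling to draining
                while (buf.hasRemaining()) out.write(buf); // drain completely
                buf.clear();                               // ready for the next read
            }
        }
    }
}
```

Swap `allocateDirect` for `allocate` here and the JVM has to copy through a temporary direct buffer on every channel operation; that copy is what the benchmark in the question never exercises.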

Andreas
  • The javadoc link you provided also says "The contents of direct buffers may reside outside of the normal garbage-collected heap, and so their impact upon the memory footprint of an application might not be obvious". Wouldn't you infer that contents of the buffer I created above won't be garbage collected? – Abidi Nov 23 '19 at 15:43
  • @Abidi No, I wouldn't. It says "**reside** outside of the **normal** garbage-collected heap", i.e. the normal memory pools. Which means it is considered part of the **special** *(substitute word of choice)* garbage-collected heap. --- The point of that statement is the "impact upon the memory footprint" part, e.g. the memory is outside the limits set by `-Xmx`, and it probably doesn't change the values returned by `runtime.maxMemory()` and `runtime.totalMemory()`. – Andreas Nov 23 '19 at 15:48
  • Interesting, so you are saying, a GC would collect memory allocated through Xmx, ByteBuffer.directAllocate() and Unsafe.allocateMemory() methods? It's just that, memory allocated through last two methods is not part of Xmx? – Abidi Nov 23 '19 at 15:53
  • @Abidi Forget about `Unsafe`. Do not use it. It is undocumented! --- But if you have to use it, why do you think it's called "unsafe"? Why do you think it has a `freeMemory()` method? Because the memory returned by `allocateMemory()` is not under GC control. --- [Quote](http://mishadoff.com/blog/java-magic-part-4-sun-dot-misc-dot-unsafe/): *"Memory allocated [using `allocateMemory()`] is not located in the heap and **not under GC management**, so take care of it using `Unsafe.freeMemory()`. It also does not perform any boundary checks, so any illegal access may cause JVM crash."* – Andreas Nov 23 '19 at 15:59
  • Got you. So GC would run for memory allocated through allocateDirect() method. Another fellow seems to be claiming the same how I understood Javadocs initially: https://www.javacodegeeks.com/2013/08/which-memory-is-faster-heap-or-bytebuffer-or-direct.html – Abidi Nov 23 '19 at 16:04
  • @Abidi Just because it's on the web doesn't mean it's true. – Andreas Nov 23 '19 at 16:07
  • @abidi The memory allocated behind the `DirectByteBuffer` is not part of the heap. That means that the GC will not scan it or move it around the young/old generations. That's part of what makes it more efficient. (Note that the `DirectByteBuffer` object itself will be scanned and moved around like any other object.) As part of its "finalization", the `DirectByteBuffer` calls `freeMemory` to deallocate that memory. – Sotirios Delimanolis Nov 23 '19 at 16:10
  • I don't like my use of _efficient_ in that statement, but close enough. – Sotirios Delimanolis Nov 23 '19 at 16:13
  • @SotiriosDelimanolis Andreas mentioned, the contents of buffer allocated via allocateDirect() will be garbage collected, you are saying they won't be moved around by the GC. Do you mean they will be garbage collected but not the way objects in heap allocated via Xmx are collected? I understand about DirectByteBuffer object, since it was created via new keyword. – Abidi Nov 23 '19 at 16:17
  • @Abidi Objects on the heap are allocated out of a memory pool. The way GC works is that it empties a pool when it runs, *moving* any still-in-use objects to another pool. To the Java code, that's hidden, you don't see it, but since memory is not in a fixed memory location, accessing it requires a level of indirection, which affects performance. A *direct* buffer is outside the pools, so it is not moved around. GC is still responsible for *releasing* it. – Andreas Nov 23 '19 at 16:25
  • @Abidi If you're really this interested in this, you should take some time to *learn* how Java manages memory, including how GC works. It's low-level internal stuff, so very advanced, but it is interesting if you can understand it. – Andreas Nov 23 '19 at 16:29
  • @Andreas What if an object in heap is referenced by an object in heap allocated via allocateDirect(), will GC honour that cross memory pool reference? – Abidi Nov 23 '19 at 16:29
  • @Andreas seems like it, any good links/papers on this you can refer plz? – Abidi Nov 23 '19 at 16:30
  • @Abidi Huh?!? `allocateDirect()` allocates a `ByteBuffer`, and a `ByteBuffer` cannot reference anything else, so that [comment](https://stackoverflow.com/questions/59008751/direct-java-nio-bytebuffer-vs-java-array-performance-test/59009132?noredirect=1#comment104267471_59009132) makes no sense. – Andreas Nov 23 '19 at 16:31
  • A pedantic correction to the above comments -- a direct `ByteBuffer`'s storage is allocated outside of the GC'd heap (not within some other special GC'd heap), and is not itself garbage collected. Instead, a cleanup task associates a phantom reference to the `ByteBuffer` with a runnable that frees the off-heap data, and when the `ByteBuffer` becomes unreachable it will deallocate that data. No magic going on here, no second GC heaps, just plain old Java with a dash of off-heap memory management via `sun.misc.Unsafe`. The `ByteBuffer` instance itself is on the normal heap. – Score_Under Aug 15 '22 at 16:02

In JDK 9, both heap and direct buffers use sun.misc.Unsafe for raw memory access. There is zero performance difference between the two for plain reads and writes, other than that heap buffers allocate faster. There used to be a big penalty for writing multi-byte primitives to heap buffers, but that is gone now.

When reading from or writing to a channel, the heap buffer is slower because all the data must first be copied to a thread-local direct buffer before being copied into your heap buffer.

Both objects can be garbage collected; the difference is that a DirectByteBuffer uses less of the JVM heap, whereas a HeapByteBuffer stores all of its memory on the JVM heap. The garbage-collection process for a DirectByteBuffer is more complicated than for a HeapByteBuffer.
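To see the API parity this answer describes, the same absolute putLong/getLong loop can be run against both buffer kinds; the class name and sizes below are illustrative only:

```java
import java.nio.ByteBuffer;

public class BufferParity {
    // Writes n longs at distinct 8-byte offsets and sums them back.
    // The same absolute putLong/getLong calls work on heap and direct
    // buffers alike; on modern JDKs both boil down to plain memory access.
    static long sum(ByteBuffer buf, int longs) {
        for (int i = 0; i < longs; i++) buf.putLong(i * 8, 500L * i);
        long sum = 0;
        for (int i = 0; i < longs; i++) sum += buf.getLong(i * 8);
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000;
        long heap = sum(ByteBuffer.allocate(8 * n), n);
        long direct = sum(ByteBuffer.allocateDirect(8 * n), n);
        System.out.println(heap == direct); // identical results either way
    }
}
```

Any timing gap between the two variants comes from allocation cost and JIT details, not from one kind of buffer having a fundamentally faster memory path.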

Johnny V