
BACKGROUND

Assume I have a direct ByteBuffer:

ByteBuffer directBuffer = ByteBuffer.allocateDirect(1024);

and assume I am passing the buffer to an AsynchronousSocketChannel to read chunks of data off that socket up to X bytes at a time (1024 in the example here).

The transfer time off the socket into the direct ByteBuffer is fantastic because it is all occurring in native OS memory space; I haven't passed through the JVM "blood-brain" barrier yet...

QUESTION

Assuming my job is to scan through all the bytes read back in from the direct byte buffer, what is the fastest way for me to do this?

I originally asked "... utilizing sun.misc.Unsafe" but maybe that is the wrong assumption.

POSSIBLE APPROACHES

I currently see three approaches and the one I am most curious about is #3:

  1. (DEFAULT) Use ByteBuffer's bulk-get to pull bytes directly from native OS space into an internal byte[1024] construct.
  2. (UNSAFE) Use Unsafe's getByte ops to pull the values directly out of the ByteBuffer skipping all the bounds-checking of ByteBuffer's standard get ops. Peter Lawrey's answer here seemed to suggest that those raw native methods in Unsafe can even be optimized out by the JIT compiler ("intrinsics") to single machine instructions leading to even more fantastic access time. (===UPDATE=== interesting, it looks like the underlying DirectByteBuffer class does exactly this with the get/put ops for those interested.)
  3. (BANANAS) In some crime-against-humanity sort of way, using Unsafe, can I copy the memory region of the direct ByteBuffer to the same memory address my byte[1024] exists at inside of the VM, and just start accessing the array using standard int indexes? (This makes the assumption that the "copyMemory" operation can potentially do something fantastically optimized at the OS level.)

It does occur to me that assuming the copyMemory operation does exactly what it advertises, even in the more-optimal OS space, that the #2 approach above is probably still the most optimized since I am not creating duplicates of the buffer before beginning to process it.
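To make the comparison concrete, here is a minimal sketch of approach #1 (bulk-get into a byte[]) next to scanning the buffer in place with plain absolute get calls. It deliberately avoids Unsafe and uses only the public ByteBuffer API. The class name ScanApproaches, the method names, and the sample fill that stands in for a completed socket read are all made up for illustration.

```java
import java.nio.ByteBuffer;

public class ScanApproaches {

    // Approach #1: bulk-copy the readable bytes into a heap array, then scan the array.
    public static long sumViaCopy(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate();      // own position/limit, same underlying memory
        byte[] copy = new byte[view.remaining()];
        view.get(copy);                         // one bulk transfer out of native memory
        long sum = 0;
        for (byte b : copy) {
            sum += b;
        }
        return sum;
    }

    // Scan in place with absolute gets; no intermediate byte[] is allocated.
    public static long sumInPlace(ByteBuffer buf) {
        long sum = 0;
        int limit = buf.limit();                // hoist the bound out of the loop
        for (int i = buf.position(); i < limit; i++) {
            sum += buf.get(i);                  // absolute get: does not move position
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer directBuffer = ByteBuffer.allocateDirect(1024);
        // Hypothetical stand-in for a completed socket read: fill with sample bytes.
        for (int i = 0; i < 100; i++) {
            directBuffer.put((byte) i);
        }
        directBuffer.flip();                    // switch the buffer from writing to reading
        System.out.println(sumViaCopy(directBuffer) == sumInPlace(directBuffer));
    }
}
```

Both methods leave the caller's position and limit untouched, so either can be swapped in when timing the two approaches against each other.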

This IS different than the "can I use Unsafe to iterate over a byte[] faster?" question as I am not even planning on pulling the bytes into a byte[] internally if it isn't necessary.

Thanks for the time; just curious if anyone (Peter?) has gotten nuts with Unsafe to do something like this.

Riyad Kalla

1 Answer


ByteBuffer methods are extremely fast because they are intrinsics: the VM maps them to very low-level instructions. Compare these two approaches:

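    // N = buffer size in bytes, M = number of passes (values not given here)
    long sum = 0;  // declaration not shown in the original; long is assumed

    // Heap array baseline
    byte[] bytes = new byte[N];
    for (int m = 0; m < M; m++)
        for (int i = 0; i < bytes.length; i++)
            sum += bytes[i];

    // Direct buffer, absolute get (position never moves, so remaining() stays N)
    ByteBuffer bb = ByteBuffer.allocateDirect(N);
    for (int m = 0; m < M; m++)
        for (int i = 0; i < bb.remaining(); i++)
            sum += bb.get(i);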

On my machine, the difference is 0.67ns vs 0.81ns (per loop).

I'm a little surprised that ByteBuffer is not as fast as byte[]. But I think you should definitely NOT copy it to a byte[] and then access it.
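For anyone who wants to repeat the comparison, a rough timing harness could look like the sketch below. The class name ScanTiming, the values of N and M, and the warm-up count are assumptions, not the setup used for the numbers above. The sums are printed so the JIT cannot discard the loops as dead code. A dedicated harness such as JMH would give more trustworthy numbers than hand-rolled System.nanoTime() timing.

```java
import java.nio.ByteBuffer;

public class ScanTiming {
    static final int N = 1024;      // assumed buffer size
    static final int M = 100_000;   // assumed number of passes

    public static long sumArray(byte[] bytes, int passes) {
        long sum = 0;
        for (int m = 0; m < passes; m++)
            for (int i = 0; i < bytes.length; i++)
                sum += bytes[i];
        return sum;
    }

    public static long sumBuffer(ByteBuffer bb, int passes) {
        long sum = 0;
        int n = bb.remaining();     // read once; absolute gets never move position
        for (int m = 0; m < passes; m++)
            for (int i = 0; i < n; i++)
                sum += bb.get(i);
        return sum;
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[N];
        ByteBuffer bb = ByteBuffer.allocateDirect(N);
        for (int i = 0; i < N; i++) {   // non-zero data so the sum is not trivially 0
            bytes[i] = (byte) i;
            bb.put(i, (byte) i);
        }

        long sink = 0;                  // warm-up so both loops get JIT-compiled first
        for (int w = 0; w < 5; w++) {
            sink += sumArray(bytes, M / 10);
            sink += sumBuffer(bb, M / 10);
        }

        long t0 = System.nanoTime();
        long a = sumArray(bytes, M);
        long t1 = System.nanoTime();
        long b = sumBuffer(bb, M);
        long t2 = System.nanoTime();

        double total = (double) N * M;  // bytes visited per measurement
        System.out.printf("byte[]:     %.3f ns/byte%n", (t1 - t0) / total);
        System.out.printf("ByteBuffer: %.3f ns/byte%n", (t2 - t1) / total);
        System.out.println("sums: " + a + " " + b + " (warm-up sink " + sink + ")");
    }
}
```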

ZhongYu
  • Didn't know about the "intrinsic" property of the ByteBuffer methods; do you mean the 'native' methods on the DirectBuffer class specifically? In that case, that would do exactly what #2 would do in my post above so that is great news. – Riyad Kalla Aug 14 '13 at 02:22
  • @RiyadKalla intrinsic != native. Intrinsic methods are "hardcoded" in the JVM. – assylias Aug 14 '13 at 07:17
  • @assylias I understand; I was (incorrectly) referring to what I figured were 'native' methods on the DirectByteBuffer impl class (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/DirectByteBuffer.java) but now that I look at it, I see that there are no 'native' methods, it simply leverages Unsafe to do those ops. I misspoke, thank you for the catch. – Riyad Kalla Aug 14 '13 at 17:13
  • It is Unsafe that is treated as an intrinsic, rather than ByteBuffer et al. See for example https://github.com/airlift/slice for an alternate and really fast implementation of buffers in Java – juancn Nov 18 '13 at 20:39
  • Benchmark nitpicking: Did you do something useful (like printing out) the value of `sum`? Because else the JIT will optimize it away. I think 0.67ns and 0.81ns per loop is very fast, maybe it really is that fast, but I'm a little bit curious about it. – skiwi May 23 '14 at 18:03
  • @skiwi sure, I printed `sum` out. (though a *really* smart vm would figure out that it's `0` anyway:) – ZhongYu May 28 '14 at 20:55