BUSTED How to speed up a byte[] lookup to be faster using sun.misc.Unsafe?

Question

I am experimenting with Unsafe to iterate over memory instead of iterating over the values in a byte[]. A memory block is allocated using unsafe. The memory is sufficient to hold 65536 byte values.

I AM TRYING THIS:

char aChar = some character

if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)){
 // do something
}

INSTEAD OF:

char aChar = some character

if ((byte) 0 == ( lookup[aChar] & mask )){
 // do something
}

I thought Unsafe could access the memory faster than using a regular array access with the index check it does for each index...

It was only wishful thinking that the jvm would have a special op (unsafe) that would somehow make regular array access and iteration faster. The jvm, it seems to me, works fine with normal byte[] iterations and does them, fast as can be, using normal, unadulterated, vanilla java code.

@millimoose hits the proverbial 'nail on the head'

"Unsafe might be useful for a lot of things, but this level of microoptimisation isn't one of them. – millimoose"

Using Unsafe is faster in a very strict limited set of circumstances:

(64bit jvm only) faster for a single 65535 byte[] lookup done exactly once for each test. In this case UnsafeLookup_8B on 64_bit jvm is 24% faster. If the test repeats itself so that each test is done twice, the normal method is now 30% faster than unsafe. In pure interpreted mode on a cold jvm, the Unsafe is faster by far --- but only the first time and only for a small array size. ON a 32 bit standard Oracle JVM 7.x, the normal is three times faster than using unsafe.

Using Unsafe (in my tests) is slower:

slower on both Oracle java 64 bit and 32 bit virtual machines
slower regardless of OS and machine architecture (32 and 64 bit)
slower even if serverjvm option is invoked
Unsafe is slower from 9% or more ( 1_GB array and UnsafeLookup_8B(fastest one) in code below on 32 bit jvm (64bit was even slower??))
Unsafe is slower from 234% or more ( 1_MB array and UnsafeLookup_1B (fastest one) in code below on a 64 bit jvm.

Is there some reason for this?**

When I run the code yellowB posted below (checks a 1GB byte[]), the normal is also still the fastest:

C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"
initialize data...
initialize data done!

use normalLookup()...
Not found '0'
time : 1967737 us.

use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.

use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.

Flat profile of 26.35 secs (2018 total ticks): main

  Interpreted + native   Method
  0.0%     1  +     0    test.StackOverflow.main
  0.0%     1  +     0    Total interpreted

     Compiled + native   Method
 67.8%  1369  +     0    test.StackOverflow.main
 11.7%   236  +     0    test.StackOverflow.unsafeLookup_8B
 11.2%   227  +     0    test.StackOverflow.unsafeLookup_1B
  9.1%   184  +     0    test.StackOverflow.normalLookup
 99.9%  2016  +     0    Total compiled

         Stub + native   Method
  0.0%     0  +     1    sun.misc.Unsafe.getLong
  0.0%     0  +     1    Total stub


Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 26.39 seconds:
100.0%  2023             Received ticks


C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)

CPU is: Intel Core 2 Duo E4600 @ 2.4GHZ 4.00GB (3.25GB usable) OS: Windows 7 (32)

Running the test on an 4 Core AMD64 with Windows 7_64, 32 bit java:

initialize data...
initialize data done!

use normalLookup()...
Not found '0'
time : 1631142 us.

use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.

use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.

Running the test on an 4 Core AMD64 with Windows 7_64, 64 bit java:

use normalLookup()...
Not found '0'
time : 655146 us.

use unsafeLookup_1B()...
Not found '0'
time : 904783 us.

use unsafeLookup_8B()...
Not found '0'
time : 764427 us.

Flat profile of 6.34 secs (13 total ticks): main

  Interpreted + native   Method
 23.1%     3  +     0    java.io.PrintStream.println
 23.1%     3  +     0    test.StackOverflow.unsafeLookup_8B
 15.4%     2  +     0    test.StackOverflow.main
  7.7%     1  +     0    java.io.DataInputStream.<init>
 69.2%     9  +     0    Total interpreted

     Compiled + native   Method
  7.7%     0  +     1    test.StackOverflow.unsafeLookup_1B
  7.7%     0  +     1    test.StackOverflow.main
  7.7%     0  +     1    test.StackOverflow.normalLookup
  7.7%     0  +     1    test.StackOverflow.unsafeLookup_8B
 30.8%     0  +     4    Total compiled


Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%     1             Blocked (of total)


Global summary of 6.35 seconds:
100.0%    14             Received ticks
 42.9%     6             Compilation

could you try using Xprof flag and paste the output here ( java -Xprof SampleClass > output.txt) — jknair, Sep 01 '12 at 10:18
You need to supply the code for your benchmark. It's entirely possible that you haven't done sufficient warmup for the JVM to JIT the Unsafe version. However, looking at the code supplied, I don't think that there will be any speed up. The unsafe will only provide an advantage if you are reading types wider than a byte, e.g. you want to read a single long rather than 8 individual bytes and apply adds and shifts to get the desired value. — Michael Barker, Sep 01 '12 at 18:00
You're expecting a method call to be faster than a simple array lookup. I'd expect any method call to be much slower than an array lookup unless it's inlined, so you might not be triggering the inline. — millimoose, Sep 20 '12 at 23:45
With the 64 bit os and jvm, use unsafeLookup_8B() one method call to unsafe replaces 8 array lookups. Yet it is still slower?? And the array being inspected has 1GB of bytes, not enough to trigger inlining? — The Coordinator, Sep 20 '12 at 23:55
@MichaelBarker Well, hypothethically, you could expect to get some tiny performance improvement from `Unsafe` not doing bounds checking even in the single-byte case. — millimoose, Sep 21 '12 at 00:09
@Wilf If I understand your test cases right, you're still comparing moving data between parts of memory to other ways of moving data between parts of memory. That doesn't seem to be what the class is meant to optimise away in the first place. Do note that all the `Unsafe` methods are declared as `native`. There might be some overhead to invoking native methods involved that the JVM can't optimise away. If you're willing to use native code in the first place to optimise things like this, it's probably better to avoid crossing the native-JVM boundary as little as necessary. — millimoose, Sep 21 '12 at 00:14
@millimoose After all the rage about using unsafe as a speed improvement, I thought the avoidance of bounds checking would make array access faster. Not so according to tests. And after all that code to invoke Unsafe, a waste of code and programming time! Better to KISS (keep it simple stupid) as they say! So, when would Unsafe offer a speed improvement. There is a Java serializer thatis fast and uses unsafe instead of Field reflection. I could see that being possibly the only case, where reflection is avoided for use of unsafe ... maybe? — The Coordinator, Sep 21 '12 at 00:53
@Wilf From searching around, I'm getting the impression that the answer is "not very often". Unsafe might be useful for a lot of things, but this level of microoptimisation isn't one of them. — millimoose, Sep 21 '12 at 01:15
@Wilf It also probably crashes the JVM a lot faster than an unhandled exception ;) — millimoose, Sep 21 '12 at 01:16
@Wilf Honestly I'm not sure what the point in doing this sort of microoptimisation would be. If you care about speed of a chunk of code THAT much, you should probably just rewrite the whole critical section in C. (Although I suppose that would be more difficult to deploy.) — millimoose, Sep 21 '12 at 01:18

yellowB · Answer 1 · 2012-09-19T16:10:41.797

I think the two functions you posted are basically the same because they read only 1 byte and then convert it to int and do the futher comparing.

Reading 4-Byte int or 8-Byte long every time is much more effective.I wrote two function to do the same thing:compare the content of two byte[] to see if they are the same:

function 1:

public static boolean hadoopEquals(byte[] b1, byte[] b2)
  {
    if(b1 == b2)
    {
      return true;
    }
    if(b1.length != b2.length)
    {
      return false;
    }
    // Bring WritableComparator code local

    for(int i = 0;i < b1.length; ++i)
    {
     int a = (b1[i] & 0xff);
     int b = (b2[i] & 0xff);
     if (a != b) 
     {
       return false;
     }
    }
    return true;
  }

function 2:

public static boolean goodEquals(byte[] b1,byte[] b2)
  {   
    if(b1 == b2)
    {
      return true;
    }
    if(b1.length != b2.length)
    {
      return false;
    }
    int baseOffset = UnSafe.arrayBaseOffset(byte[].class);

    int numLongs = (int)Math.ceil(b1.length / 8.0);

    for(int i = 0;i < numLongs; ++i)
    {
      long currentOffset = baseOffset + (i * 8);
      long l1 = UnSafe.getLong(b1, currentOffset);
      long l2 = UnSafe.getLong(b2, currentOffset);
      if(0L != (l1 ^ l2))
      {
        return false;
      }
    }
    return true;    
  }

I ran these two functions on my laptop(corei7 2630QM , 8GB DDR3 , 64bit win 7 , 64bit Hotspot JVM),and compare two 400MB byte[],result is below:

function 1: ~670ms

function 2: ~80ms

2 is much more faster.

So my suggestion is to read 8-byte every time and use the XOR operator(^):

long l1 = UnSafe.getLong(byteArray, offset);  //8 byte
if(0L == l1 ^ 0xFF)  //if the lowest byte == 0?
/* do something */
if(0L == l1 ^ 0xFF00)  //if the 2nd lowest byte == 0?
/* do something */
/* go on... */

============================================================================

Hi Wilf, I use your code to make a test class as below,this class compare the speed among 3 functions in looking up the 1st 0 in a byte array:

package test;

import java.lang.reflect.Field;

import sun.misc.Unsafe;

/**
 * Test the speed in looking up the 1st 0 in a byte array
 * Set -Xms the same as -Xms to avoid Heap reallocation
 * 
 * @author yellowb
 *
 */
public class StackOverflow
{
    public static Unsafe UnSafe;

    public static Unsafe getUnsafe() throws SecurityException,
            NoSuchFieldException, IllegalArgumentException,
            IllegalAccessException
    {
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);
        return unsafe;
    }

    /**
     * use 'byte[index]' form to read 1 byte every time
     * @param buf
     */
    public static void normalLookup(byte[] buf)
    {
        for (int i = 0; i < buf.length; ++i)
        {
            if ((byte) 0 == buf[i])
            {
                System.out.println("The 1st '0' is at position : " + i);
                return;
            }
        }
        System.out.println("Not found '0'");
    }

    /**
     * use Unsafe.getByte to read 1 byte every time directly from the memory
     * @param buf
     */
    public static void unsafeLookup_1B(byte[] buf)
    {
        int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
        for (int i = 0; i < buf.length; ++i)
        {
            byte b = UnSafe.getByte(buf, (long) (baseOffset + i));
            if (0 == ((int) b & 0xFF))
            {
                System.out.println("The 1st '0' is at position : " + i);
                return;
            }

        }
        System.out.println("Not found '0'");
    }

    /**
     * use Unsafe.getLong to read 8 byte every time directly from the memory
     * @param buf
     */
    public static void unsafeLookup_8B(byte[] buf)
    {
        int baseOffset = UnSafe.arrayBaseOffset(byte[].class);

        //The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop
        int numLongs = buf.length / 8;
        long currentOffset = 0L;
        for (int i = 0; i < numLongs; ++i)
        {
            currentOffset = baseOffset + (i * 8);  //the step is 8 bytes
            long l = UnSafe.getLong(buf, currentOffset);
            //Compare each byte(in the 8-Byte long) to 0
            //PS:x86 cpu is little-endian mode
            if (0L == (l & 0xFF))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8));
                return;
            }
            if (0L == (l & 0xFF00L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 1));
                return;
            }
            if (0L == (l & 0xFF0000L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 2));
                return;
            }
            if (0L == (l & 0xFF000000L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 3));
                return;
            }
            if (0L == (l & 0xFF00000000L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 4));
                return;
            }
            if (0L == (l & 0xFF0000000000L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 5));
                return;
            }
            if (0L == (l & 0xFF000000000000L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 6));
                return;
            }
            if (0L == (l & 0xFF00000000000000L))
            {
                System.out.println("The 1st '0' is at position : " + (i * 8 + 7));
                return;
            }
        }

        //If some rest bytes exists
        int rest = buf.length % 8;
        if(0 != rest)
        {
            currentOffset = currentOffset + 8;
            //Because the length of rest bytes < 8,we have to read them one by one
            for(; currentOffset < (baseOffset + buf.length); ++currentOffset)
            {
                byte b = UnSafe.getByte(buf, (long)currentOffset);
                if (0 == ((int) b & 0xFF))
                {
                    System.out.println("The 1st '0' is at position : " + (currentOffset - baseOffset));
                    return;
                }
            }
        }
        System.out.println("Not found '0'");
    }

    public static void main(String[] args) throws SecurityException,
            NoSuchFieldException, IllegalArgumentException,
            IllegalAccessException
    {
        UnSafe = getUnsafe();

        int len = 1024 * 1024 * 1024;  //1G
        long startTime = 0L;
        long endTime = 0L;

        System.out.println("initialize data...");
        byte[] byteArray1 = new byte[len];
        for (int i = 0; i < len; ++i)
        {
            byteArray1[i] = (byte) (i % 128 + 1);  //No byte will equal to 0
        }
        //If you want to set one byte to 0,uncomment the below statement
//      byteArray1[2500] = (byte)0;
        System.out.println("initialize data done!");

        System.out.println("use normalLookup()...");
        startTime = System.nanoTime();
        normalLookup(byteArray1);
        endTime = System.nanoTime();
        System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");

        System.out.println("use unsafeLookup_1B()...");
        startTime = System.nanoTime();
        unsafeLookup_1B(byteArray1);
        endTime = System.nanoTime();
        System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");

        System.out.println("use unsafeLookup_8B()...");
        startTime = System.nanoTime();
        unsafeLookup_8B(byteArray1);
        endTime = System.nanoTime();
        System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
    }
}

And the output is:

initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1271781 us.
use unsafeLookup_1B()...
Not found '0'
time : 716898 us.
use unsafeLookup_8B()...
Not found '0'
time : 591689 us.

the result shows that even reading 1 byte every time by Unsafe.getByte() is much faster than iterating the byte[] regularly.And reading 8-byte long is the fastest.

Very interesting. I will see what my benchmark shows. Thanks for the sample code and explanation. — The Coordinator, Sep 19 '12 at 10:58
I ran the test on a 32bit win7 + jRockit + intel Pentium E5300 + 4G,and result is:normalLookup() = 4628773 us,unsafeLookup_1B() = 5653817 us,unsafeLookup_8B() = 1638572 us. I seems unsafeLookup_1B() is slower than normalLookup() on 32bit old CPU? — yellowB, Sep 20 '12 at 00:56
I re-ran and repeated each test 50 times. (that makes 50 runs through each method). Surprise, the normal method is now the fastest. normalLookup = 29,492,009 us, unsafeLookup_1B = 43,727,754 us, unsafeLookup_8B = 37,804,323 us. So, it would seem that at some point the jvm does optimizations to the normal method and that runs the fastest. — The Coordinator, Sep 20 '12 at 04:54
It looks like, according to my tests, with a lot of array sizes and various iterations, that the normal method is still the fastest. — The Coordinator, Sep 20 '12 at 08:21
Could you pls list your hardware & software details? (cpu/memory/os...).In my situation,the unsafeLookup_8B is the fastest,both in 32bit mode and 64bit mode — yellowB, Sep 20 '12 at 09:46
CPU: Intel Core 2 Duo E4600 @ 2.4GHZ 4.00GB (3.25GB usable) OS: Windows 7 (32) JVM: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing) — The Coordinator, Sep 20 '12 at 13:44

score 1 · Answer 2 · answered Sep 21 '12 at 02:01

1

I thought Unsafe could access the memory faster than using a regular array access with the index check it does for each index...

One possible reason why the range checking might not be a factor is the JIT compiler's optimizer. Since the array's size never changes, it may be possible for the optimizer to "hoist" all of the range checking and perform it once at the start of the loop.

By contrast, the JIT compiler might be unable to optimize (e.g. inline) the Unsafe.getByte() call. Or maybe the getByte method has a read barrier ...)

However this is speculation. The way to be sure is to get the JVM to dump out the JIT-compiled native code for the two cases, and compare them instruction by instruction.

answered Sep 21 '12 at 02:01

Stephen C

698,415
94
811
1,216

Maybe someone can do this? Is this possible on a regular jvm on Windows? – The Coordinator Sep 21 '12 at 02:17
1

Yes it is possible to do this on Windows. https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly – Stephen C Sep 21 '12 at 05:38
Where can I find a recent openJdk7 for windows X86 or 64? – The Coordinator Sep 21 '12 at 06:27
The Java 7 JDK from the Oracle site should work just fine. You should be able to find it from the Oracle front page. – Stephen C Sep 21 '12 at 07:33
You need to get hold of the Intel (probably) manual that describes the native code instruction set, then using that 1) identify where the relevant code sections, and 2) read and compare them, and 3) analyse the theoretical execution times. This is difficult ... especially if you've never had to read native code before. – Stephen C Sep 22 '12 at 01:29
I'll be having open heart surgery before I do that! – The Coordinator Sep 26 '12 at 05:24

score 0 · Answer 3 · answered Dec 25 '12 at 13:39

Unsafe methods may be marked as native but that does not mean they are necessarily JNI. Almost all the Unsafe methods are intrinsics (see a short post here: http://psy-lob-saw.blogspot.co.uk/2012/10/java-intrinsics-are-not-jni-calls.html) for the Sun JVM they will be converted to a single assembly instruction(in many cases), for other JVMs they may or may not be as good at dealing with intrinsics and may convert them to JNI calls or plain java calls. From what I know JRockit is tends to go the JNI way, so does the Android JVM.

BUSTED How to speed up a byte[] lookup to be faster using sun.misc.Unsafe?

3 Answers3

Linked

**BUSTED** How to speed up a byte[] lookup to be faster using sun.misc.Unsafe?

3 Answers3

Linked

BUSTED How to speed up a byte[] lookup to be faster using sun.misc.Unsafe?