11

I was recently playing with some benchmarks and found very interesting results that I can't explain right now. Here is the benchmark:

@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, batchSize = 1000)
@Measurement(iterations = 10, time = 1, batchSize = 1000)
public class ArrayCopy {

    @Param({"1","5","10","100", "1000"})
    private int size;
    private int[] ar;

    @Setup
    public void setup() {
        ar = new int[size];
        for (int i = 0; i < size; i++) {
            ar[i] = i;
        }
    }

    @Benchmark
    public int[] SystemArrayCopy() {
        final int length = size;
        int[] result = new int[length];
        System.arraycopy(ar, 0, result, 0, length);
        return result;
    }

    @Benchmark
    public int[] javaArrayCopy() {
        final int length = size;
        int[] result = new int[length];
        for (int i = 0; i < length; i++) {
            result[i] = ar[i];
        }
        return result;
    }

    @Benchmark
    public int[] arraysCopyOf() {
        final int length = size;
        return Arrays.copyOf(ar, length);
    }

}

Result:

Benchmark                  (size)   Mode  Cnt       Score      Error  Units
ArrayCopy.SystemArrayCopy       1  thrpt   10   52533.503 ± 2938.553  ops/s
ArrayCopy.SystemArrayCopy       5  thrpt   10   52518.875 ± 4973.229  ops/s
ArrayCopy.SystemArrayCopy      10  thrpt   10   53527.400 ± 4291.669  ops/s
ArrayCopy.SystemArrayCopy     100  thrpt   10   18948.334 ±  929.156  ops/s
ArrayCopy.SystemArrayCopy    1000  thrpt   10    2782.739 ±  184.484  ops/s
ArrayCopy.arraysCopyOf          1  thrpt   10  111665.763 ± 8928.007  ops/s
ArrayCopy.arraysCopyOf          5  thrpt   10   97358.978 ± 5457.597  ops/s
ArrayCopy.arraysCopyOf         10  thrpt   10   93523.975 ± 9282.989  ops/s
ArrayCopy.arraysCopyOf        100  thrpt   10   19716.960 ±  728.051  ops/s
ArrayCopy.arraysCopyOf       1000  thrpt   10    1897.061 ±  242.788  ops/s
ArrayCopy.javaArrayCopy         1  thrpt   10   58053.872 ± 4955.749  ops/s
ArrayCopy.javaArrayCopy         5  thrpt   10   49708.647 ± 3579.826  ops/s
ArrayCopy.javaArrayCopy        10  thrpt   10   48111.857 ± 4603.024  ops/s
ArrayCopy.javaArrayCopy       100  thrpt   10   18768.866 ±  445.238  ops/s
ArrayCopy.javaArrayCopy      1000  thrpt   10    2462.207 ±  126.549  ops/s

So there are two strange things here:

  • Arrays.copyOf is 2 times faster than System.arraycopy for small arrays (1,5,10 size). However, on a large array of size 1000 Arrays.copyOf becomes almost 2 times slower. I know that both methods are intrinsics, so I would expect the same performance. Where does this difference come from?
  • Manual copy for a 1-element array is faster than System.arraycopy. It's not clear to me why. Does anybody know?

VM version: JDK 1.8.0_131, VM 25.131-b11

Boann
  • 48,794
  • 16
  • 117
  • 146
Dmitriy Dumanskiy
  • 11,657
  • 9
  • 37
  • 57
  • 2
    Since `copyOf` internally uses `arraycopy`, your benchmark is the problem. – Andreas Jun 11 '17 at 18:42
  • 3
    @Andreas You are not right. `Arrays.copyOf` is JVM intrinsic. Once the method is JIT-compiled, Java code in `Arrays.java` is not executed at all. – apangin Jun 11 '17 at 19:54
  • @Andreas Do you see any particular problem with the benchmark? It wisely uses [JMH framework](http://openjdk.java.net/projects/code-tools/jmh/) to avoid common benchmarking pitfalls. – apangin Jun 11 '17 at 19:56
  • @apangin That's a distinction without a difference. The same applies to any Java code. That doesn't make any arbitrary method a 'JVM intrinsic'. – user207421 Jun 11 '17 at 23:38
  • 5
    @EJP OK, I'll reword. JIT compiler does not look at the bytecode of `Arrays.copyOf`, because it has internal knowledge of what this method should do. – apangin Jun 12 '17 at 00:27

1 Answers1

8

Your System.arraycopy benchmark is not semantically equivalent to Arrays.copyOf.

It will be if you replace

    System.arraycopy(ar, 0, result, 0, length);

with

    System.arraycopy(ar, 0, result, 0, Math.min(ar.length, length));

With this change, the performance of both benchmarks will also become similar.

Why is the first variant slower then?

  1. Without knowing how length relates to ar.length JVM needs to perform additional bounds check and be prepared to throw IndexOutOfBoundsException when length > ar.length.

  2. This also breaks the optimization to eliminate redundant zeroing. You know, every allocated array must be initialized with zeros. However, JIT can avoid zeroing if it sees that the array is filled right after creation. But -prof perfasm clearly shows that the original System.arraycopy benchmark spends a significant amount of time clearing the allocated array:

     0,84%    0x000000000365d35f: shr    $0x3,%rcx
     0,06%    0x000000000365d363: add    $0xfffffffffffffffe,%rcx
     0,69%    0x000000000365d367: xor    %rax,%rax
              0x000000000365d36a: shl    $0x3,%rcx
    21,02%    0x000000000365d36e: rep rex.W stos %al,%es:(%rdi)  ;*newarray
    

Manual copy appeared faster for small arrays, because unlike System.arraycopy it does not perform any runtime calls to VM functions.

hagrawal7777
  • 14,103
  • 5
  • 40
  • 70
apangin
  • 92,924
  • 10
  • 193
  • 247
  • 1
    Aren't `ar.length` and `length` the same? What effect does `Math.min(ar.length, length)` have? – Boann Jun 11 '17 at 21:17
  • @Boann They are the same, but JVM does not know this. `Math.min` lets the compiler know that `System.arraycopy` will never throw `IndexOutOfBoundsException`. – apangin Jun 11 '17 at 21:21
  • Please explain. The compiler doesn't know `System.arraycopy()` from a hole in the ground, other than its name, calling sequence, result, and `throws` clause (which in fact it doesn't have). `System.arraycopy()` just receives the result of `Math.min()` in its fifth argument, and does not know how it was derived. – user207421 Jun 11 '17 at 23:23
  • 5
    @EJP I'm not sure what compiler you mean, but HotSpot JIT compiler (actually, both of them - C1 and C2) definitely [knows](http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/68758c5ab0c1/src/share/vm/classfile/vmSymbols.hpp#l730) about `System.arraycopy()` and [translates](http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/68758c5ab0c1/src/share/vm/opto/library_call.cpp#l4779) the call to a graph of IR nodes. [Data-flow analysis](https://en.wikipedia.org/wiki/Data-flow_analysis) then helps to reduce this graph. – apangin Jun 12 '17 at 00:53
  • @apangin thanks! Do you know why `copyOf` becomes slower on larger arrays? – Dmitriy Dumanskiy Jun 12 '17 at 23:37
  • @DmitriyDumanskiy It does not. I cannot reproduce this effect. On all systems I've tested `arraysCopyOf` is consistently faster than `SystemArrayCopy`. – apangin Jun 13 '17 at 00:50
  • Interesting. I'll check again. – Dmitriy Dumanskiy Jun 13 '17 at 06:53
  • @apangin The 'compiler I mean' is the Java compiler. Obviously. The thing you confusingly now refer to as the 'HotSpot JIT compiler' uses 'JIT' terminology which disappeared nearly twenty years ago, at Java 1.3, to be *replaced* by the HotSpot *JVM.* – user207421 Jun 15 '17 at 11:15
  • 1
    @EJP Who made the term disappear? 'JIT' is still officially used at [HotSpot website](http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html), [bug tracker](https://bugs.openjdk.java.net/), [OpenJDK mailing lists](http://mail.openjdk.java.net) and in [public presentations](http://cr.openjdk.java.net/~vlivanov/talks/2015_JIT_Overview.pdf) by Oracle engineers. – apangin Jun 15 '17 at 11:57
  • @EJP JVM is much more than a compiler. JIT compiler is just one part of it. C1, C2 and Graal are particular examples of JIT compilers (also known as dynamic compilers) that work in HotSpot JVM. – apangin Jun 15 '17 at 12:01