35

It's a shame for me, but I did not know that:

You should use clone to copy arrays, because that's generally the fastest way to do it.

as Josh Bloch states in this blog: http://www.artima.com/intv/bloch13.html

I always used System.arraycopy(...). Both approaches are native, so probably without going deeper into the sources of libraries I can not figure out, why it is so.

My question is simple: why is it the fastest way? What is the difference with System.arraycopy? The difference is explained here, but it does not answer the question why Josh Bloch considers clone() as the fastest way.

bcsb1001
  • 2,834
  • 3
  • 24
  • 35
Andremoniy
  • 34,031
  • 20
  • 135
  • 241
  • 1
    15 years have passed since that post was written. i can see how maybe the clone() method would bypass a calloc() call and an array bounds check, but i would write a jmh test and see for myself – radai Sep 10 '17 at 21:46
  • May be @JoshuaBloch will come here and explain it :-) – Andremoniy Sep 10 '17 at 21:58
  • 1
    Both `clone` and `Arrays.copyOf` will be faster than `System.arraycopy`, *if* you're creating and filling a new array because the former 2 methods can avoid the implicit zero-initialization when creating an array with `new`. Although not specifically about this particular problem, [this blog post](https://shipilev.net/blog/2016/arrays-wisdom-ancients/) has a lot of related information. I'm pretty sure we have Q&As here on SO which cover this but I'm having trouble finding one. – Radiodef Sep 10 '17 at 22:02
  • well, until he does, here's the jdk7 source for clone (JVM_Clone) - http://hg.openjdk.java.net/jdk7/jdk7/hotspot/file/tip/src/share/vm/prims/jvm.cpp. the same source file also contains JVM_ArrayCopy. maybe someone more knowledgeable can spot the reason – radai Sep 10 '17 at 22:04
  • I found some answers about this problem - https://stackoverflow.com/questions/2589741/what-is-more-efficient-system-arraycopy-vs-arrays-copyof, https://stackoverflow.com/questions/7179251/is-there-any-reason-to-prefer-system-arraycopy-over-clone – egorlitvinenko Sep 21 '17 at 08:51
  • @egorlitvinenko this is not related: there `Arrays.copyOf` vs `System.arrayCopy` are discussed, while I am asking about array's `clone()` method – Andremoniy Sep 21 '17 at 08:53
  • @Andremoniy not only Arrays.copyOf. – egorlitvinenko Sep 21 '17 at 08:53
  • @egorlitvinenko I see there only one answer about `clone()` method which doesn't reveal why it is faster – Andremoniy Sep 21 '17 at 08:54
  • I don't want say that it is duplicate. I just note that there a lot of discussions on SO about it, I didn't copy-paste all search results from google. :) – egorlitvinenko Sep 21 '17 at 09:11

6 Answers6

31

I would like to make some points about why clone() is the fastest way to copy an array than System.arraycopy(..) or others:

1. clone() doesn't have to do the typechecking before copying a source array to the destination one as provided here. It just simple allocates new memory space and assigns the objects to it. On the other hand, System.arraycopy(..) checks for the type and then copies an array.

2. clone() also breaks the optimization to eliminate redundant zeroing. As you know, every allocated array in Java must be initialized with 0s or respective default values. However, JIT can avoid zeroing this array if it sees that the array is filled right after creation. That makes it definitely faster compared to changing the copy values with existing 0s or respective default values. While using System.arraycopy(..) spends significant amount of time clearing and copying the initialized array. To do so I have performed some of the benchmark tests.

@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, batchSize = 1000)
@Measurement(iterations = 10, time = 1, batchSize = 1000)
public class BenchmarkTests {

    @Param({"1000","100","10","5", "1"})
    private int size;
    private int[] original;

    @Setup
    public void setup() {
        original = new int[size];
        for (int i = 0; i < size; i++) {
            original[i] = i;
        }
    }

    @Benchmark
    public int[] SystemArrayCopy() {
        final int length = size;
        int[] destination = new int[length];
        System.arraycopy(original, 0, destination, 0, length);
        return destination;
    }


    @Benchmark
    public int[] arrayClone() {
        return original.clone();
    }

}

Output:

Benchmark                        (size)   Mode  Cnt       Score      Error  Units
ArrayCopy.SystemArrayCopy            1  thrpt   10   26324.251 ± 1532.265  ops/s
ArrayCopy.SystemArrayCopy            5  thrpt   10   26435.562 ± 2537.114  ops/s
ArrayCopy.SystemArrayCopy           10  thrpt   10   27262.200 ± 2145.334  ops/s
ArrayCopy.SystemArrayCopy          100  thrpt   10   10524.117 ±  474.325  ops/s
ArrayCopy.SystemArrayCopy         1000  thrpt   10     984.213 ±  121.934  ops/s
ArrayCopy.arrayClone                 1  thrpt   10   55832.672 ± 4521.112  ops/s
ArrayCopy.arrayClone                 5  thrpt   10   48174.496 ± 2728.928  ops/s
ArrayCopy.arrayClone                10  thrpt   10   46267.482 ± 4641.747  ops/s
ArrayCopy.arrayClone               100  thrpt   10   19837.480 ±  364.156  ops/s
ArrayCopy.arrayClone              1000  thrpt   10    1841.145 ±  110.322  ops/s

According to the outputs I am getting that clone is almost twice fast from System.arraycopy(..)

3. Also, using manual copying method like clone() results into faster ouput because it doesn't have to make any VM calls (unlike System.arraycopy()).

Procrastinator
  • 2,526
  • 30
  • 27
  • 36
  • 2
    Any proof links would be helpful. Without them these statements sounds just as conjectures. Also, any micro benchmark testings which will prove it also will be a proof. – Andremoniy Sep 15 '17 at 10:30
  • 3
    ArrayCopy.arrayClone=26324 **ops/s**, ArrayCopy.SystemArrayCopy=55832 **ops/s** - how this data can lead to conclusion that "clone is almost twice fast from System.arraycopy()"? – Oleg Estekhin Sep 17 '17 at 08:36
  • 1
    arrayCopy does not clear any memory. arrayCopy can be used to shift data in the same array forward or backward, in fact. I agree that is what your test actually do, but it is new int[] that zeros the array, not arrayCopy. – Sheepy Sep 17 '17 at 10:48
  • @OlegEstekhin. sorry that was a pasting mistake from an output. Now checked and edited the results. :) Thanks for pointing it out. – Procrastinator Sep 18 '17 at 06:51
  • 1
    That is one mightly mistake to make when claiming some performance results. Are you sure there no other mistakes in your benchmark? Have you tried a variant of `System.arraycopy` where you use `original.length` instead of accessing `size` field? Why have you used `batchSize` configuration parameter? Do you have an _explanation_ of the _data_ you got from the benchmark? – Oleg Estekhin Sep 18 '17 at 07:37
  • Yes, the variant with `System.arraycopy(original, 0, destination, 0, Math.min(original.length, length));`. Using this variant the result becomes similar to `arrayClone`. The reason behind the presented result is different because, Without knowing how length relates to `original.length` JVM needs to perform additional bounds check and be prepared to throw `IndexOutOfBoundsException` when `length > original.length`. – Procrastinator Sep 18 '17 at 07:52
  • It is not the reason of results in your benchmark, you don't measure fail cases. – egorlitvinenko Sep 18 '17 at 10:29
  • Also, you should read, that it is bad - do only one fork for benchmarks, results could be very different. – egorlitvinenko Sep 18 '17 at 10:30
  • 1
    Are you sure there's type checking needed at runtime with `arrayCopy`? I'm pretty sure the compiler can resolve which overloaded variant to call at compile time. Moreover, `clone()` requires a virtual dispatch, which is cheap but not free. The point about zeroing overhead is excellent and explains all the variation in the results. But `arrayCopy()` can do things `clone()` can't. E.g. if you are repeatedly moving data around in the same memory (say for example sorting raw binary data), then `clone()` doesn't help. – Gene Sep 19 '17 at 02:36
  • @Gene So far what I have developed and read about `arrayCopy()` it says so. Definitely compiler is able to overcome this variant call at runtime but in most cases it doesn't do automatically. Also, regarding the last two statements its also correct but here I just focused on `clone()` is faster than `arrayCopy()`. Definitely `clone()` doesn't provide a good memory efficient solution but a faster one. – Procrastinator Sep 19 '17 at 07:14
  • @procrastinator (but in most cases) Where did you read this? In my practice in all cases when we can use Object.clone, but use System.arraycopy, compiler can and do optimization. Provide sources or create representative benchmark, please. – egorlitvinenko Sep 19 '17 at 09:21
  • 1
    @Gene Certainly, see the [Javadoc](https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#arraycopy(java.lang.Object,%20int,%20java.lang.Object,%20int,%20int)). It is astonishing that so many people are prepared to make these counter-assertions without having even consulted it. – user207421 Sep 21 '17 at 03:01
  • @EJP you should note that if we see something in Java doc it does not mean, that this method really does this each calling time. Because HotSpot is adaptive to your application and generates intrinsic's implementation for particular scenarios. – egorlitvinenko Sep 21 '17 at 06:15
  • For me, it is astonishing that test with typical errors (don't understand what it measured https://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array and make test in one fork), and which don't have any explanation and evidence, besides javadoc, inspires such trust. For me it looks like creator don't good enough understand how does hotspot work. – egorlitvinenko Sep 21 '17 at 07:56
  • @EJP what do you think, is this benchmark test correct? – Andremoniy Sep 21 '17 at 08:12
  • @procrastinator I have received different results than your table. It shows me, that `System.arrayCopy` is faster on ~30% on copying array of 1000 elements, and twice slower when copying array containing only 1 element. – Andremoniy Sep 21 '17 at 08:27
  • Though for an array of size 10000 the results are roughly equal – Andremoniy Sep 21 '17 at 08:39
  • @Andremoniy. I am getting some slower results for `clone()` than `System.arrayCopy()` as well, in case when I have the array size of more than 5000 elements. But they are also like approx ~15%. And it becomes again same or little faster with size of elements more than 10000. – Procrastinator Sep 21 '17 at 08:40
  • @Andremoniy it is interesting, could you try other benchmarks too? What would be results (one PC better then one). – egorlitvinenko Sep 21 '17 at 08:44
  • @procrastinator why do you not answer on questions? Could you comment anything about sorted array, one fork, explanation of results (what of 1,2,3 impact on performance and how) etc.? I fill like you deliberately ignoring this questions, am I wrong? – egorlitvinenko Sep 21 '17 at 09:03
  • @procrastinator Could you add perfasm results, which you received? – egorlitvinenko Sep 21 '17 at 09:16
  • @egorlitvinenko here are the results that I am getting: `0,94% 0x000000000365d35f: shr $0x3,%rcx 0,06% 0x000000000365d363: add $0xfffffffffffffffe,%rcx 0,59% 0x000000000365d367: xor %rax,%rax 0x000000000365d36a: shl $0x3,%rcx 24,17% 0x000000000365d36e: rep rex.W stos %al,%es:(%rdi) ;*newarray` – Procrastinator Sep 21 '17 at 09:38
  • @procrastinator Where do you see deallocation process here? I see 24,17% `0x000000000365d36e: rep rex.W stos %al,%es:(%rdi) ;*newarray` of allocation process. And this is exactly what I tried to show in my benchmark. That allocation is reason why clone is faster, because simple newarray in java create and initialize objects. I also found question close to this on SO - https://stackoverflow.com/questions/32834869/system-arraycopy-with-constant-length. More over in those question you see explanation with L-caches, this is start to understand why is your benchmark is victim of branch prediction. – egorlitvinenko Sep 21 '17 at 10:11
  • @egorlitvinenko and which is what I mentioned in 2nd point of my answer. `clone()` does not require any (re)allocation to an array in advance but `System.arrayCopy()` does need it. And thanks for pointing out that branch prediction issue. I'll take a look into it and possibly change the results. – Procrastinator Sep 21 '17 at 10:18
  • 1
    Not -objects- elements. I saw your second point, it was clear for me. But I have questions about first and third points. I'm glad that you try to clarify your answer, thank you. @EJP just to notify about it. – egorlitvinenko Sep 21 '17 at 10:22
  • @egorlitvinenko looking at your previous comment I thought it was unclear about 2nd point. Good that its clear now. – Procrastinator Sep 21 '17 at 10:31
5

For one thing, clone() doesn't have to do the typechecking that System.arraycopy() does.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • If we just disable ReduceBulkZeroing option, there would be no difference between Object.clone and System.arraycopy. Both method checks is it call for primitive array or not and do copy in similar way. See my answer. If you have another proving benchmarks for your assumption, please, share. – egorlitvinenko Sep 18 '17 at 12:06
  • 1
    @egorlitvinenko There is no 'assumption' here, just the verifiable fact that `System.arraycopy()` is obliged by its contract to do type checking: for example if you try to copy an `Object[]` to a `String[]`. There is nothing in either the question or my answer about `ReduceBulkZeroing`, whatever that may have to do with it. – user207421 Sep 18 '17 at 12:34
  • How this verifiable fact could be verified? From my side I could say this is true only if we create wrapper method with Object[] argument, which would be passed to System.arraycopy and we will call this method with different source arrays. Otherwise it would be inlined without typecheking. I tried create a benchmark, which is used AtomicLong instead of long, disable ReduceBulkZeroing and see this result https://pastebin.com/ufxCZVaC. I don't see any impact from "typechecking". I'm sure (to see inlinings -XX:+PrintInlining), this is just because inlining, on runtime there are no typechecks. – egorlitvinenko Sep 18 '17 at 12:50
  • @egorlitvinenko You could start by reading the [Javadoc](https://docs.oracle.com/javase/7/docs/api/java/lang/System.html#arraycopy(java.lang.Object,%20int,%20java.lang.Object,%20int,%20int)), specifically the part about when `ArrayStoreException` is thrown. – user207421 Sep 21 '17 at 02:55
  • Do you can clone it? If you try to clone `Object[]` and assign it to `String[]` with type cast you will receive `ClassCastException`, even if all elements are Strings. Could you provide an example what do you mean? Let understand me right, of course, I read javadoc and more than once. I don't consider all cases of System.arraycopy, I consider only the same cases, when Object.clone could be used. – egorlitvinenko Sep 21 '17 at 06:21
  • 1
    Sigh. `Object[] obj = {new Object()}; String[] str = new String[obj.len]; System.arraycopy(obj, 0, str, 0, str.len);` This will throw `ArrayStoreException`, as it says in the Javadoc, which you have still evidently not read, which it cannot do unless it has some typechecking in it, which is what it says in my answer. At a minimum it has to check that the source and target arrays are of the same type, and if they aren't it has to typecheck every element being copied. – user207421 Sep 21 '17 at 07:45
  • Patience. :) So, what? (String[])obj.clone() you can't do at this case too. If you compare Object[].clone and System.arraycopy (String), you compare different things and different scenarious. If you write benchmark for this it would be typical benchmark error, because you try compare not comparable things. – egorlitvinenko Sep 21 '17 at 07:51
  • @egorlitvinenko Exactly my point. They are different, and `System.arraycopy()` has to do more typechecking, which is why it isn't superior. You can't write a benchmark for invalid code, but if you write it for valid code you will find that `clone()` is faster, as the OP has already done. You are arguing in circles, and with Joshua Bloch. – user207421 Jun 09 '21 at 01:31
5

I want to correct and complement previous answers.

  1. Object.clone uses unchecked System.arraycopy implementation for arrays;
  2. The main performance improvement of Object.clone, it is initialization of RAW memory directly. In the case of System.arraycopy it also tries to combine array initialization with copy operation, as we can see in source code, but it also does different additional checks for this, unlike Object.clone. If you just disable this feature (see below), then performance would be very closer (in particularly on my hardware).
  3. One more interesting thing is about Young vs Old Gen. In case when source array aligned and inside Old Gen, both methods have close performance.
  4. When we copy primitive arrays System.arraycopy always uses generate_unchecked_arraycopy.
  5. It depends from hardware/OS dependent implementations, so don't trust benchmarks and assumptions, check on you own.

Explanation

First of all clone method and System.arraycopy are intrinsics. Object.clone and System.arraycopy use generate_unchecked_arraycopy. And if we go deeper we could see that after that HotSpot select concrete implementation, dependent from OS, etc.

Longly. Let's see the code from Hotspot. First of all we will see that Object.clone (LibraryCallKit::inline_native_clone) uses generate_arraycopy, which used for System.arraycopy in case of -XX:-ReduceInitialCardMarks. Otherwise it does LibraryCallKit::copy_to_clone, which initialize new array in RAW memory (if -XX:+ReduceBulkZeroing, which enabled by default). In contrast System.arraycopy uses generate_arraycopy directly, try to check ReduceBulkZeroing (and many others cases) and eliminate array zeroing too, with mentioned additional checks and also it would make additional checks to make sure that all elements are initialized, unlike Object.clone. Finally, in best case both of them use generate_unchecked_arraycopy.

Below I show some benchmarks to see this effect on practice:

  1. First one is just simple benchmark, the only difference from previous answer, that arrays is not sorted; We see that arraycopy is slower (but not two times), results - https://pastebin.com/ny56Ag1z;
  2. Secondly, I add option -XX:-ReduceBulkZeroing and now I see that the performance of both methods is very closer. Results - https://pastebin.com/ZDAeQWwx;
  3. I also assume that we will have the difference between Old/Young, because of arrays alignment (it is a feature of Java GC, when we call GC, alignment of arrays is changed, it is easy to observe using JOL). I was surprised that performance become the same, generally, and downgrade for both methods. Results - https://pastebin.com/bTt5SJ8r. For whom who believes in concrete numbers, throughput of System.arraycopy is better then Object.clone.

First benchmark:

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CloneVsArraycopy {

    @Param({"10", "1000", "100000"})
    int size;

    int[] source;

    @Setup(Level.Invocation)
    public void setup() {
        source = create(size);
    }

    @Benchmark
    public int[] clone(CloneVsArraycopy cloneVsArraycopy) {
        return cloneVsArraycopy.source.clone();
    }

    @Benchmark
    public int[] arraycopy(CloneVsArraycopy cloneVsArraycopy) {
        int[] dest = new int[cloneVsArraycopy.size];
        System.arraycopy(cloneVsArraycopy.source, 0, dest, 0, dest.length);
        return dest;
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(CloneVsArraycopy.class.getSimpleName())
                .warmupIterations(20)
                .measurementIterations(20)
                .forks(20)
                .build()).run();
    }

    private static int[] create(int size) {
        int[] a = new int[size];
        for (int i = 0; i < a.length; i++) {
            a[i] = ThreadLocalRandom.current().nextInt();
        }
        return a;
    }

}

Running this test on my PC, I got this - https://pastebin.com/ny56Ag1z. The difference is not so big, but still exists.

The second benchmark I only add one setting -XX:-ReduceBulkZeroing and got this results https://pastebin.com/ZDAeQWwx. No we see that for Young Gen the difference is much less too.

In third benchmark I changed only setup method and enable ReduceBulkZeroing option back:

@Setup(Level.Invocation)
public void setup() {
    source = create(size);
    // try to move to old gen/align array
    for (int i = 0; i < 10; ++i) {
        System.gc();
    }
}

The difference is much less (maybe in error interval) - https://pastebin.com/bTt5SJ8r.

Disclaimer

It is also could be wrong. You should check on your own.

In addition

I think, it is interesting to look on benchmarks process:

# Benchmark: org.egorlitvinenko.arrays.CloneVsArraycopy.arraycopy
# Parameters: (size = 50000)

# Run progress: 0,00% complete, ETA 00:07:30
# Fork: 1 of 5
# Warmup Iteration   1: 8,870 ops/ms
# Warmup Iteration   2: 10,912 ops/ms
# Warmup Iteration   3: 16,417 ops/ms <- Hooray!
# Warmup Iteration   4: 17,924 ops/ms <- Hooray!
# Warmup Iteration   5: 17,321 ops/ms <- Hooray!
# Warmup Iteration   6: 16,628 ops/ms <- What!
# Warmup Iteration   7: 14,286 ops/ms <- No, stop, why!
# Warmup Iteration   8: 13,928 ops/ms <- Are you kidding me?
# Warmup Iteration   9: 13,337 ops/ms <- pff
# Warmup Iteration  10: 13,499 ops/ms
Iteration   1: 13,873 ops/ms
Iteration   2: 16,177 ops/ms
Iteration   3: 14,265 ops/ms
Iteration   4: 13,338 ops/ms
Iteration   5: 15,496 ops/ms

For Object.clone

# Benchmark: org.egorlitvinenko.arrays.CloneVsArraycopy.clone
# Parameters: (size = 50000)

# Run progress: 0,00% complete, ETA 00:03:45
# Fork: 1 of 5
# Warmup Iteration   1: 8,761 ops/ms
# Warmup Iteration   2: 12,673 ops/ms
# Warmup Iteration   3: 20,008 ops/ms
# Warmup Iteration   4: 20,340 ops/ms
# Warmup Iteration   5: 20,112 ops/ms
# Warmup Iteration   6: 20,061 ops/ms
# Warmup Iteration   7: 19,492 ops/ms
# Warmup Iteration   8: 18,862 ops/ms
# Warmup Iteration   9: 19,562 ops/ms
# Warmup Iteration  10: 18,786 ops/ms

We can observe perfomance downgrade here for System.arraycopy. I saw similar picture for Streams and there was a bug in compilers. I suppose it could be a bug in compilers too. Anyway, it is strange that after 3 warmup performance downgrades.

UPDATE

What is about typechecking

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CloneVsArraycopyObject {

    @Param({"100"})
    int size;

    AtomicLong[] source;

    @Setup(Level.Invocation)
    public void setup() {
        source = create(size);
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public AtomicLong[] clone(CloneVsArraycopyObject cloneVsArraycopy) {
        return cloneVsArraycopy.source.clone();
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public AtomicLong[] arraycopy(CloneVsArraycopyObject cloneVsArraycopy) {
        AtomicLong[] dest = new AtomicLong[cloneVsArraycopy.size];
        System.arraycopy(cloneVsArraycopy.source, 0, dest, 0, dest.length);
        return dest;
    }

    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(CloneVsArraycopyObject.class.getSimpleName())
                .jvmArgs("-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintInlining", "-XX:-ReduceBulkZeroing")
                .warmupIterations(10)
                .measurementIterations(5)
                .forks(5)
                .build())
                .run();
    }

    private static AtomicLong[] create(int size) {
        AtomicLong[] a = new AtomicLong[size];
        for (int i = 0; i < a.length; i++) {
            a[i] = new AtomicLong(ThreadLocalRandom.current().nextLong());
        }
        return a;
    }

}

Difference is not observed - https://pastebin.com/ufxCZVaC. I suppose an explanation is simple, as System.arraycopy is hot intrinsic in that case, the real implementation would be just inlined without any typecheking, etc.

Note

I agreed with Radiodef you could find interesting to read blog post, the author of this blog is the creator (or one of creators) of JMH.

egorlitvinenko
  • 2,736
  • 2
  • 16
  • 36
  • You are asssuming that the `arraycopy()` method mentioned in `ibraryCallKit::generate_arraycopy()` is `System.arraycopy()`, without proof, and without regard to any of the *caveats* expressed in the comments, which suggest almost conclusively that they aren't the same thing. – user207421 Sep 18 '17 at 10:31
  • 1
    Dear Egor, I am **not** a downvoter of your question, I believe that you did a great research about this problem. But it does not look really convincingly, especially these long source code pastes. I would rather focus on essential parts, but these all looks not like a result of the research, but the *story* about what you tried. It is not an answer. – Andremoniy Sep 18 '17 at 10:33
  • Yes, I assume, because there is comments and real implementation, which depends on OS/etc defined on lower level inside that method. I didn't see only on source code, I investigate hotpost branch, and confirmed this investigation with benchmarks. – egorlitvinenko Sep 18 '17 at 10:34
  • @EJP you could easily refute this, just try to find another use case. And if you try to look at Object.clone implementation you will see that it uses generate_unchecked_arraycopy - exactle the same method that System.arraycopy for some cases. – egorlitvinenko Sep 18 '17 at 10:36
  • @Andremoniy thank you for the feedback. Maybe I don't provide could explanation, and how I achieve this. – egorlitvinenko Sep 18 '17 at 10:38
  • As minimum you should certainly make code paste from the hotspot much more shorter, leaving only essential fragment of code (and providing the link to the full one if it is needed). – Andremoniy Sep 18 '17 at 10:39
  • @egorlitvinenko You can't use `System.arraycopy()` to copy an `Object[]` to a `String[]`, unless the source array only contains `Strings`: otherwise an `ArrayStoreException` is thrown; *ergo* it has to do type checking. – user207421 Sep 21 '17 at 03:05
3

As far as copy is concerned System.arrayCopy is fastest, then and now.

  • System.arrayCopy doesn't create a new array and can't be beaten in raw copy speed.
  • Arrays.copyOf simply creates an array and calls arrayCopy. Convenience.
  • Array.clone is highly efficient, but it need to flush copied data to all cpu cache.

If you can code in a way to reuse array with arrayCopy, go for it. Otherwise I personally recommend copyOf given the upward trend of cpu cores and because clone is considered old and problematic in general - the main point of the Josh Bloch blog that started this question.

Contrary to common thoughts, the actual copy loops (type checked or not) are not Java bytecode and is not optimizable by hotspot. The loops are coded in C++ and is low level jvm implementation.


Long answer:

This answer is based on and link to source code of OpenJDK 8, which as far as I know should be the same for Sun.

Array copy is perhaps more complicated than what most people think. On C code level, it can be split into three cases:

  1. Primitive arrays are copied directly with a copy loop.
  2. Object arrays of same class, or subclass to super class array, are also copied directly.
  3. Otherwise, between arrays of different classes, a type check is made on each element.

The absolute speed of copying an array will thus vary greatly depends on the array type. The relative speed of the three clone methods does not, however, since they all resolve to the same copy loop, an inlined C++ or assembly loop. Thus the different in speed is mainly caused by overheads and other factors.

  • System.arrayCopy is essentially type checks and length checks, then straight to the copy loop. In my own tests arrayCopy is always faster than the other two methods well beyond any margin of errors.

  • Arrays.copyOf simply calls System.arrayCopy - after creating a new array. Note that it does not call Array.clone. Contrary to Radiodef's comment, there is no indication that Java 8 will bypass the zero initialisation.

  • Array.clone is interesting. It directly calls heap allocation and copy loop, with minimal checks. Thus its array creation should be quicker than Arrays.copyOf, and its copy as fast as System.arrayCopy if not faster.

But in my tests Array.clone is slightly slower than copyOf.

I suspect that it's because of the memory barrier after the copy. Like a constructor, clone will make sure the copied data are visible to all threads - which neither System.arrayCopy nor Array.copyOf do. This means Array.clone need to spend time on waiting CPU cache to update.

If this is true, the result of Array.clone vs Arrays.copyOf depends on whether the cache flush of clone is faster than overheads of copyOf, and should be platform dependent.

Other than this, since cloning always result in an array of same type, all three methods ultimately use the same copy loop.

If you only want to copy, arrayCopy is always fastest, simply because it doesn't create a new array. For the rest, if the java mailing list is anything to go by, the choice between Arrays.copyOf and Array.clone should be largely a matter of taste.


My jmh test result and code below.

  • One way tests return copied array.
  • Two way tests overwrite copy source which force next clone to copy "new" data.
  • NoClone does not clone anything and is a yardstick to make sure higher is faster.

As stated, Clone and CopyOf is a close race and your mileage may vary.

/* # Run complete. Total time: 00:06:44

Benchmark                               Mode  Cnt          Score         Error  Units
MyBenchmark.ArrayCloneByteOneWay       thrpt   20    1048588.503 ±    2608.862  ops/s
MyBenchmark.ArrayCloneByteTwoWay       thrpt   20     523782.848 ±    1613.823  ops/s
MyBenchmark.ArrayCloneObjOneWay        thrpt   20     260903.006 ±    1311.827  ops/s
MyBenchmark.ArrayCloneObjTwoWay        thrpt   20     129448.639 ±    1179.122  ops/s
MyBenchmark.ArraysCopyOfByteOneWay     thrpt   20    1065995.804 ±    2197.919  ops/s
MyBenchmark.ArraysCopyOfByteTwoWay     thrpt   20     533025.610 ±    2831.955  ops/s
MyBenchmark.ArraysCopyOfObjOneWay      thrpt   20     266134.565 ±    1536.756  ops/s
MyBenchmark.ArraysCopyOfObjTwoWay      thrpt   20     130821.380 ±     274.325  ops/s
MyBenchmark.NoClone                    thrpt   20  308776528.157 ± 2546848.128  ops/s
MyBenchmark.SystemArrayCopyByteOneWay  thrpt   20    1232733.367 ±    8439.409  ops/s
MyBenchmark.SystemArrayCopyByteTwoWay  thrpt   20     859387.983 ±    1919.359  ops/s
MyBenchmark.SystemArrayCopyObjOneWay   thrpt   20     239532.442 ±     775.193  ops/s
MyBenchmark.SystemArrayCopyObjTwoWay   thrpt   20     167235.661 ±     503.141  ops/s
*/

import java.util.Arrays;
import java.util.Random;
import org.openjdk.jmh.annotations.*;

@Fork(2) @Warmup(iterations = 5, time = 1) @Measurement(iterations = 10, time = 1)
public class Q46230557 {
   private static final int ARRAY_SIZE = 8192;

   @State(Scope.Thread) public static class Data {
      public byte[] bytes = new byte[ ARRAY_SIZE ];
      public Object[] objs = new Object[ ARRAY_SIZE ];
      @Setup public void setup() {
         final Random RNG = new Random();
         RNG.nextBytes( bytes );
         for ( int i = 0 ; i < ARRAY_SIZE ; i++ )
            objs[i] = RNG.nextInt();
      }
   }

   @Benchmark public byte[] NoClone( final Data data ) {
      return data.bytes;
   }

   @Benchmark public byte[] SystemArrayCopyByteOneWay( final Data data ) {
      final byte[] dest = new byte[ ARRAY_SIZE ];
      System.arraycopy( data.bytes, 0, dest, 0, ARRAY_SIZE );
      return dest;
   }

   @Benchmark public byte[] SystemArrayCopyByteTwoWay( final Data data ) {
      final byte[] buf = new byte[ ARRAY_SIZE ];
      System.arraycopy( data.bytes, 0, buf, 0, ARRAY_SIZE );
      System.arraycopy( buf, 0, data.bytes, 0, ARRAY_SIZE );
      return data.bytes;
   }

   @Benchmark public byte[] ArraysCopyOfByteOneWay( final Data data ) {
      return Arrays.copyOf( data.bytes, ARRAY_SIZE );
   }

   @Benchmark public byte[] ArraysCopyOfByteTwoWay( final Data data ) {
      final byte[] buf = Arrays.copyOf( data.bytes, ARRAY_SIZE );
      return data.bytes = Arrays.copyOf( buf, ARRAY_SIZE );
   }

   @Benchmark public byte[] ArrayCloneByteOneWay( final Data data ) {
      return data.bytes.clone();
   }

   @Benchmark public byte[] ArrayCloneByteTwoWay( final Data data ) {
      final byte[] buf = data.bytes.clone();
      return data.bytes = buf.clone();
   }

   @Benchmark public Object[] SystemArrayCopyObjOneWay( final Data data ) {
      final Object[] dest = new Object[ ARRAY_SIZE ];
      System.arraycopy( data.objs, 0, dest, 0, ARRAY_SIZE );
      return dest;
   }

   @Benchmark public Object[] SystemArrayCopyObjTwoWay( final Data data ) {
      final Object[] buf = new Object[ ARRAY_SIZE ];
      System.arraycopy( data.objs, 0, buf, 0, ARRAY_SIZE );
      System.arraycopy( buf, 0, data.objs, 0, ARRAY_SIZE );
      return data.objs;
   }

   @Benchmark public Object[] ArraysCopyOfObjOneWay( final Data data ) {
      return Arrays.copyOf( data.objs, ARRAY_SIZE );
   }

   @Benchmark public Object[] ArraysCopyOfObjTwoWay( final Data data ) {
      final Object[] buf = Arrays.copyOf( data.objs, ARRAY_SIZE );
      return data.objs = Arrays.copyOf( buf, ARRAY_SIZE );
   }

   @Benchmark public Object[] ArrayCloneObjOneWay( final Data data ) {
      return data.objs.clone();
   }

   @Benchmark public Object[] ArrayCloneObjTwoWay( final Data data ) {
      final Object[] buf = data.objs.clone();
      return data.objs = buf.clone();
   }
}
Sheepy
  • 17,324
  • 4
  • 45
  • 69
  • 2
    I am sorry: you probably should read about how to properly write microbenchmark tests. Your teat is improper. Besides that you said nothing about *why* essentially 'clone()' would be slowest. At least you even did not read the link in my answer. – Andremoniy Sep 15 '17 at 04:52
  • @Andremoniy Sorry for the wait. My baby has slept and I've completed the answer. I haven't had time to setup JMH, and I can only hope my test code have avoided common pitfalls like the one linked to in your link above, among others. My focus is on evaluating JDK source code, though, so that's where I spent most of my time on. Hope you'd be happy with that part. – Sheepy Sep 15 '17 at 16:02
  • I will upvote your answer only after you write a proper microbenchmark test – Andremoniy Sep 17 '17 at 12:55
  • 1
    @Andremoniy Sorry it took a while, but I've replaced the test with jmh and added my test results. Conclusion remains the same. – Sheepy Sep 21 '17 at 05:08
0

The difference in performance comes from skipping the step where the array is zeroed out.

public static int[] copyUsingArraycopy(int[] original)
{
    // Memory is allocated and zeroed out
    int[] copy = new int[original.Length];
    // Memory is copied
    System.arraycopy(original, 0, copy, 0, original.length);
}

public static int[] copyUsingClone(int[] original)
{
    // Memory is allocated, but not zeroed out
    // Unitialized memory is then copied into
    return (int[])original.clone();
}

However, in cases where the performance of copying an array makes a significant difference it is generally better to employ double buffering.

int[] backBuffer = new int[BUFFER_SIZE];
int[] frontBuffer = new int[BUFFER_SIZE];

...

// Swap buffers
int[] temp = frontBuffer;
frontBuffer = backBuffer;
backBuffer = temp;
System.arraycopy(frontBuffer, 0, backBuffer, 0, BUFFER_SIZE);
Paul Smith
  • 166
  • 3
  • 13
0

Not really agree with procrastinator answer. I don't know with which jdk you have launch your jmh tests but i do not have the same results.

For me, System.arraycopy is quicker than Clone().

@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, batchSize = 1000)
@Measurement(iterations = 10, time = 1, batchSize = 1000)
public class ArrayCopyTest {

  @Param({"1000","100","10","5", "1"})
  private int size;
  private int[] original;
  private int[] dest;

  @Setup
  public void setup() {
    original = new int[size];
    for (int i = 0; i < size; i++) {
      original[i] = i;
    }
    dest = new int[size];
  }

  @Benchmark
  public int[] SystemArrayCopy() {
    final int length = size;
    int[] destination = new int[length];
    System.arraycopy(original, 0, destination, 0, length);
    return destination;
  }

  @Benchmark
  public int[] SystemArrayCopyCache() {
    System.arraycopy(original, 0, dest, 0, original.length);
    return dest;
  }

  @Benchmark
  public int[] arrayClone() {
    return original.clone();
  }

  public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
        .include(ArrayCopyTest.class.getSimpleName())
        .build();

    new Runner(opt).run();
  }
}

And the results obtain using a jdk8.1.0_121

Benchmark                               (size)   Mode  Cnt         Score        Error  Units
ops/s
ArrayCopyTest.SystemArrayCopy             1000  thrpt   10      1332,640 ±     79,860  ops/s
ArrayCopyTest.SystemArrayCopy              100  thrpt   10     11850,158 ±    617,639  ops/s
ArrayCopyTest.SystemArrayCopy               10  thrpt   10     50440,946 ±   1409,152  ops/s
ArrayCopyTest.SystemArrayCopy                5  thrpt   10     68791,250 ±   1538,610  ops/s
ArrayCopyTest.SystemArrayCopy                1  thrpt   10     95913,164 ±    671,765  ops/s
ArrayCopyTest.SystemArrayCopyCache        1000  thrpt   10     13514,812 ±    211,703  ops/s
ArrayCopyTest.SystemArrayCopyCache         100  thrpt   10     74976,673 ±   2026,528  ops/s
ArrayCopyTest.SystemArrayCopyCache          10  thrpt   10    108410,738 ±    576,100  ops/s
ArrayCopyTest.SystemArrayCopyCache           5  thrpt   10    118921,286 ±   1354,365  ops/s
ArrayCopyTest.SystemArrayCopyCache           1  thrpt   10    141092,949 ±   2872,961  ops/s
ArrayCopyTest.arrayClone                  1000  thrpt   10      1030,526 ±     40,950  ops/s
ArrayCopyTest.arrayClone                   100  thrpt   10      5233,746 ±    163,820  ops/s
ArrayCopyTest.arrayClone                    10  thrpt   10      8556,687 ±     77,213  ops/s
ArrayCopyTest.arrayClone                     5  thrpt   10      8895,238 ±    241,374  ops/s
ArrayCopyTest.arrayClone                     1  thrpt   10      9036,695 ±    243,890  ops/s

It's surprising but arraycopy is better than clone. I recommend using it and it is also really useful in big loop algorithms to reuse objects and avoid Garbage Collector pressure. With arraycopy you can reuse your array and avoid waallocation/deallocation time