Is Java's System.arraycopy() efficient for small arrays?

Question

Is Java's System.arraycopy() efficient for small arrays, or does the fact that it's a native method make it likely to be substantially less efficient than a simple loop and a function call?

Do native methods incur additional performance overhead for crossing some kind of Java-system bridge?

I think that built-in native code is not affected by JNI latencies — gd1, Dec 15 '11 at 21:48
@glowcoder, i wanna see a micro benchmark of yours, that's exactly a very hard one to get right — bestsss, Dec 19 '11 at 11:42
to answer: there is a certain benchmark that does tons of small (less than a cache line) copies , all JDK implementors know about it and hopefully target the case. — bestsss, Dec 19 '11 at 11:54
Java bug regarding this issue: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4809502 — kevinarpe, Dec 04 '15 at 14:15

score 28 · Accepted Answer · edited Dec 15 '11 at 21:52

28

Expanding a little on what Sid has written, it's very likely that System.arraycopy is just a JIT intrinsic; meaning that when code calls System.arraycopy, it will most probably be calling a JIT-specific implementation (once the JIT tags System.arraycopy as being "hot") that is not executed through the JNI interface, so it doesn't incur the normal overhead of native methods.

In general, executing native methods does have some overhead (going through the JNI interface, also some internal JVM operations cannot happen when native methods are being executed). But it's not because a method is marked as "native" that you're actually executing it using JNI. The JIT can do some crazy things.

Easiest way to check is, as has been suggested, writing a small benchmark, being careful with the normal caveats of Java microbenchmarks (warm up the code first, avoid code with no side-effects since the JIT just optimizes it as a no-op, etc).

edited Dec 15 '11 at 21:52

Bhesh Gurung

50,430
22
93
142

answered Dec 15 '11 at 21:46

vanza

9,715
2
31
34

@vanze, small benchmarks running entirely in L1 cache are just wrong. – bestsss Dec 19 '11 at 11:50
2

@bestsss: He may have meant small in terms of the amount of code, not necessarily small in terms of the numbers of arrays I'd do the copy on. – Gravity Dec 19 '11 at 20:09
@Gravity, most people almost always get the micro benchmarks wrong, esp Java ones. basically you need to know what kind of code the JIT emits to write a proper micro-benchmark. Usually the easiest way is looking into the generated assembler and compare the result vs. expected. – bestsss Dec 19 '11 at 21:49

score 24 · Answer 2 · answered Sep 11 '12 at 09:47

Here is my benchmark code:

public void test(int copySize, int copyCount, int testRep) {
    System.out.println("Copy size = " + copySize);
    System.out.println("Copy count = " + copyCount);
    System.out.println();
    for (int i = testRep; i > 0; --i) {
        copy(copySize, copyCount);
        loop(copySize, copyCount);
    }
    System.out.println();
}

public void copy(int copySize, int copyCount) {
    int[] src = newSrc(copySize + 1);
    int[] dst = new int[copySize + 1];
    long begin = System.nanoTime();
    for (int count = copyCount; count > 0; --count) {
        System.arraycopy(src, 1, dst, 0, copySize);
        dst[copySize] = src[copySize] + 1;
        System.arraycopy(dst, 0, src, 0, copySize);
        src[copySize] = dst[copySize];
    }
    long end = System.nanoTime();
    System.out.println("Arraycopy: " + (end - begin) / 1e9 + " s");
}

public void loop(int copySize, int copyCount) {
    int[] src = newSrc(copySize + 1);
    int[] dst = new int[copySize + 1];
    long begin = System.nanoTime();
    for (int count = copyCount; count > 0; --count) {
        for (int i = copySize - 1; i >= 0; --i) {
            dst[i] = src[i + 1];
        }
        dst[copySize] = src[copySize] + 1;
        for (int i = copySize - 1; i >= 0; --i) {
            src[i] = dst[i];
        }
        src[copySize] = dst[copySize];
    }
    long end = System.nanoTime();
    System.out.println("Man. loop: " + (end - begin) / 1e9 + " s");
}

public int[] newSrc(int arraySize) {
    int[] src = new int[arraySize];
    for (int i = arraySize - 1; i >= 0; --i) {
        src[i] = i;
    }
    return src;
}

From my tests, calling test() with copyCount = 10000000 (1e7) or greater allows the warm-up to be achieved during the first copy/loop call, so using testRep = 5 is enough; With copyCount = 1000000 (1e6) the warm-up need at least 2 or 3 iterations so testRep shall be increased in order to obtain usable results.

With my configuration (CPU Intel Core 2 Duo E8500 @ 3.16GHz, Java SE 1.6.0_35-b10 and Eclipse 3.7.2) it appears from the benchmark that:

When copySize = 24, System.arraycopy() and the manual loop take almost the same time (sometimes one is very slightly faster than the other, other times it’s the contrary),
When copySize < 24, the manual loop is faster than System.arraycopy() (slightly faster with copySize = 23, really faster with copySize < 5),
When copySize > 24, System.arraycopy() is faster than the manual loop (slightly faster with copySize = 25, the ratio loop-time/arraycopy-time increasing as copySize increases).

Note: I’m not English native speaker, please excuse all my grammar/vocabulary errors.

I got different results. It does not seem to matter even with small size arrays. Large arrays into the thousands, systemarraycopy is much faster. small arrays I see no diff.. — RickHigh, Dec 30 '13 at 04:56
I had different results as well. With an array size of 5000 System.arrayCopy was about 4 times faster and with an array size of only 5 it was still about 20% faster. It seemed to even out at a size of 2. So I don't think there is ever a reason not to use System.arrayCopy — nikdeapen, Nov 06 '15 at 20:16

irreputable · Answer 3 · 2011-12-16T04:11:42.220

19

This is a valid concern. For example, in java.nio.DirectByteBuffer.put(byte[]), the author tries to avoid a JNI copy for small number of elements

// These numbers represent the point at which we have empirically
// determined that the average cost of a JNI call exceeds the expense
// of an element by element copy.  These numbers may change over time.
static final int JNI_COPY_TO_ARRAY_THRESHOLD   = 6;
static final int JNI_COPY_FROM_ARRAY_THRESHOLD = 6;

For System.arraycopy(), we can examine how JDK uses it. For example, in ArrayList, System.arraycopy() is always used, never "element by element copy", regardless of length (even if it's 0). Since ArrayList is very performance conscious, we can derive that System.arraycopy() is the most effecient way of array copying regardless of length.

edited Dec 16 '11 at 04:11

answered Dec 16 '11 at 03:48

irreputable

44,725
9
65
93

11

I suppose part of the question is whether `System.arraycopy()` goes through JNI at all. As someone pointed out, the mere fact that it's declared `native` doesn't mean anything because the JVM is allowed to have all kinds of special optimizations. – Gravity Dec 16 '11 at 05:32
I did think exactly the same thought regarding the `ArrayList` class, though, even as I was asking this question: that `ArrayList`, supposedly being a very well-optimized class, wouldn't do anything that's needlessly expensive. Still, maybe the authors figured the performance of small `ArrayLists` didn't matter all that much (since it's the operations on big datasets that are most costly anyway), or maybe it's not uniformly efficient across all JVMs. I'm going to have to benchmark to be convinced. – Gravity Dec 16 '11 at 05:39
2

they will care about performance of *many* small array lists. so each small array list must perform well. – irreputable Dec 16 '11 at 19:46
3

@Gravity, System.arraycopy **is not JNI**, it's intrnsics – bestsss Dec 19 '11 at 11:44
DirectBuffer copy is a special case, it doesn't use System.arraycopy at all nor JNI, the documentation is just old. It's intrinsics, Unsafe.copymemory is basically memmove/memcpy but simple copy can be just as fast if the range checks are eliminated. – bestsss Dec 19 '11 at 11:49
@bestsss: System.arraycopy() also checks type safety when copying non-primitive arrays. – Gravity Dec 19 '11 at 20:13
@Gravity, in 99.9% it doesn't since it copies Object[]. All collections have Object[] as backing storage and the cases are when copying into non-Object[], e.g String[]. String[] to String[] is not checked either. The JIT is smart enough not to perform the checks unless needed, most of the time checks are cheap enough if the pointer (reference) accommodates for the class as well. – bestsss Dec 19 '11 at 20:39

score 7 · Answer 4 · answered Jul 11 '13 at 05:25

7

System.arraycopy use a memmove operation for moving words and assembly for moving other primitive types in C behind the scene. So it makes its best effort to move as much as efficient it can reach.

answered Jul 11 '13 at 05:25

lichenbo

1,019
11
13

score 7 · Answer 5 · answered Mar 14 '18 at 04:32

Instead of relying on speculation and possibly outdated information, I ran some benchmarks using caliper. In fact, Caliper comes with some examples, including a CopyArrayBenchmark that measures exactly this question! All you have to do is run

mvn exec:java -Dexec.mainClass=com.google.caliper.runner.CaliperMain -Dexec.args=examples.CopyArrayBenchmark

My results are based on Oracle's Java HotSpot(TM) 64-Bit Server VM, 1.8.0_31-b13, running on a mid-2010 MacBook Pro (macOS 10.11.6 with an Intel Arrandale i7, 8 GiB RAM). I don't believe that it's useful to post the raw timing data. Rather, I'll summarize the conclusions with the supporting visualizations.

In summary:

Writing a manual for loop to copy each element into a newly instantiated array is never advantageous, even for arrays as short as 5 elements.
Arrays.copyOf(array, array.length) and array.clone() are both consistently fast. These two techniques are nearly identical in performance; which one you choose is a matter of taste.
System.arraycopy(src, 0, dest, 0, src.length) is almost as fast as Arrays.copyOf(array, array.length) and array.clone(), but not quite consistently so. (See the case for 50000 ints.) Because of that, and the verbosity of the call, I would recommend System.arraycopy() if you need fine control over which elements get copied where.

Here are the timing plots:

Sid Malani · Answer 6 · 2011-12-15T21:46:11.777

5

Byte codes are executed natively anyways so it's likely that performance would be better than a loop.

So in case of a loop it would have to execute byte codes which will incur an overhead. While array copy should be straight memcopy.

edited Dec 15 '11 at 21:46

answered Dec 15 '11 at 21:38

Sid Malani

2,078
1
13
13

2

There is an additional cost to crossing from Java to native code. – Andy Thomas Dec 15 '11 at 21:44
@AndyThomas-Cramer, there is no native code at all in terms of context switch to JNI. And side note: JIT is smart enough to optimize array copy via loop to the same code as `System.arraycopy` – bestsss Dec 19 '11 at 11:56
"JIT is smart enough to optimize array copy via loop to the same code as System.arraycopy". -- Is it? In theory that makes a lot of sense, but I haven't heard of such a thing. Can you point to a reference? – Gravity Dec 19 '11 at 20:14
All byte codes are run as machine code eventually anyway. This can be either interpreted or compiled by a JIT. So on that basis system.array copy is probably likely to be converted to memcpy. While a loop would probably be more expensive on that basis. – Sid Malani Dec 20 '11 at 20:07
@SidMalani: there's a huge difference between compiling the code to a JNI call and compiling it to a call to memcpy (in fact memmove), and while the second options gets you faster code, a human needs to add a special case to the JVM for each intrinsic. – Blaisorblade Feb 16 '14 at 14:31

score -1 · Answer 7 · answered Dec 15 '11 at 21:42

-1

Native functions should be faster than JVM functions, since there is no VM overhead. However for a lot of(>1000) very small(len<10) arrays it might be slower.

answered Dec 15 '11 at 21:42

laci37

510
4
17

Can you please ellaborate this ? Why would it be slower for a lot of small arrays ? – Costi Ciudatu Dec 15 '11 at 21:46
I'm guessing because of the function call overhead? Unless even native functions that are JIT intrinsic can be inlined... – Gravity Dec 15 '11 at 22:49
Yes the call overhead might be larger on native functions. As it adds up, it might outweigh the time saved by eliminating the JVM overhead – laci37 Dec 16 '11 at 13:14

Is Java's System.arraycopy() efficient for small arrays?

7 Answers7

Linked