93

I was surprised to see in the Java source that System.arraycopy is a native method.

Of course the reason is that it's faster. But what native tricks is the code able to employ that make it faster?

Why not just loop over the original array and copy each pointer to the new array - surely this isn't that slow and cumbersome?

polygenelubricants
James B

5 Answers

87

In native code, it can be done with a single memcpy / memmove, as opposed to n distinct copy operations. The difference in performance is substantial.
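As a rough, purely illustrative sketch of the difference (not the actual native implementation): the hand-written loop performs one load and one store per element, while System.arraycopy hands the whole range to native code in a single call.

int[] src = new int[1000000];
int[] dst = new int[src.length];

// Per-element copy: n separate loads and stores at the Java level.
for (int i = 0; i < src.length; i++) {
    dst[i] = src[i];
}

// Bulk copy: the native side can move the whole range at once,
// conceptually like a single memcpy/memmove over the backing storage.
System.arraycopy(src, 0, dst, 0, src.length);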

Péter Török
  • @Peter, so within native code you can fettle with the Java memory model? (I've never had any cause to do any native malarkey) – James B May 05 '10 at 10:07
  • @James B, I am not an expert on this, but in native code you surely are not constrained with the Java memory model or anything - you can fiddle with the raw bits any way you like (at your own risk, of course). – Péter Török May 05 '10 at 10:15
  • 9
    Actually, only some subcases of `arraycopy` could be implemented using `memcpy` / `memmove`. Others require a runtime type check for each element copied. – Stephen C May 05 '10 at 10:48
  • 1
    @Stephen C, interesting - why is that? – Péter Török May 05 '10 at 12:44
  • 3
    @Péter Török - consider copying from an `Object[]` populated with `String` objects to a `String[]`. See last paragraph of http://java.sun.com/javase/6/docs/api/java/lang/System.html#arraycopy(java.lang.Object,%20int,%20java.lang.Object,%20int,%20int) – Stephen C May 05 '10 at 12:59
  • 3
    Peter, Object[] and byte[] + char[] are the most often copied ones, none of them requires an explicit type check. The compiler is smart enough NOT to check unless needed and virtually in 99.9% of the case it's not. Funny part is the small sized copies (less than a cache line) are quite dominant, so "memcpy" for small sized stuff being fast is truly important. – bestsss Dec 15 '11 at 02:58
  • So does it mean arraycopy runs in O(1) and not O(length)? – jainilvachhani Jan 22 '19 at 16:27
  • 1
    @jainilvachhani both `memcpy` and `memmove` are O(n); however, because of e.g. SIMD optimizations they are a few times faster, so you may say they are O(n/x), where x depends on the optimizations used in these functions – ufoq Dec 16 '20 at 13:38
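To make Stephen C's point above concrete, here is a small, purely illustrative sketch of a case where arraycopy cannot be a plain memcpy: copying from an Object[] into a String[] requires a per-element type check, and an incompatible element makes System.arraycopy throw ArrayStoreException.

Object[] src = { "a", "b", Integer.valueOf(42) }; // the last element is not a String
String[] dst = new String[src.length];

try {
    // The runtime must check each element against the destination type String[].
    System.arraycopy(src, 0, dst, 0, src.length);
} catch (ArrayStoreException e) {
    // Thrown when the Integer is reached; elements before it have already been copied.
    System.out.println("copy aborted: " + e);
}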
16

It can't be written in Java. Native code is able to ignore or elide the difference between arrays of Object and arrays of primitives. Java can't do that, at least not efficiently.

And it can't be written with a single memcpy(), because of the semantics required by overlapping arrays.
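A small, purely illustrative example of the overlapping case: when the source and destination ranges overlap within the same array, arraycopy is specified to behave as if the source range were first copied to a temporary array, i.e. memmove-like rather than memcpy-like behaviour.

int[] a = { 1, 2, 3, 4, 5, 6, 7, 8 };

// Shift elements 0..5 two positions to the right within the same array.
// The source range [0, 6) and the destination range [2, 8) overlap.
System.arraycopy(a, 0, a, 2, 6);

// Prints [1, 2, 1, 2, 3, 4, 5, 6]; a naive forward copy loop (or a raw memcpy)
// could instead read elements it had already overwritten.
System.out.println(java.util.Arrays.toString(a));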

user207421
  • 5
    Fine, so `memmove` then. Although I don't think it makes much difference in context of this question. – Péter Török May 05 '10 at 10:19
  • Not memmove() either, see @Stephen C's comments on another answer. – user207421 Nov 06 '11 at 00:23
  • Saw that already, since that happened to be my own answer ;-) But thanks anyway. – Péter Török Nov 06 '11 at 20:34
  • @EJP what do you mean by *overlapping* arrays? – Geek Mar 18 '13 at 20:20
  • 1
    @Geek Arrays that overlap. If the source and target arrays are the same and only the offsets are different, the behaviour is carefully specified, and memcpy() does not comply. – user207421 Sep 01 '13 at 00:32
  • 1
    It *can't* be written in Java? Couldn't one write one generic method to handle subclasses of Object, and then one for each of the primitive types? – Michael Dorst Oct 03 '13 at 00:16
  • @anthropomorphic Do read what I actually wrote. 'At least not efficiently.' Such an implementation would need to make decisions about datatypes and then branch to one of the pieces of code you describe. – user207421 Oct 02 '15 at 23:21
  • @MichaelDorst Generics were introduced in JDK 1.5; the System.arraycopy() method was introduced in either 1.0 or 1.1. – AwesomeHunter Nov 29 '20 at 14:08
12

It is, of course, implementation dependent.

HotSpot will treat it as an "intrinsic" and insert code at the call site. That is machine code, not slow old C code. This also means the problems with the signature of the method largely go away.

A simple copy loop is simple enough that obvious optimisations can be applied to it. For instance loop unrolling. Exactly what happens is again implementation dependent.
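For illustration only, the sketch below writes out by hand the kind of unrolling meant here; HotSpot applies such transformations to the generated machine code rather than at the source level, so the real result will look different.

int[] src = new int[100000];
int[] dst = new int[src.length];
int n = src.length;

// Plain copy loop: one loop-condition check per element copied.
for (int i = 0; i < n; i++) {
    dst[i] = src[i];
}

// Hand-unrolled equivalent: four copies per loop-condition check,
// plus a small remainder loop. Purely illustrative of what a JIT may do.
int j = 0;
for (; j + 4 <= n; j += 4) {
    dst[j]     = src[j];
    dst[j + 1] = src[j + 1];
    dst[j + 2] = src[j + 2];
    dst[j + 3] = src[j + 3];
}
for (; j < n; j++) {
    dst[j] = src[j];
}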

Tom Hawtin - tackline
  • 3
    this is a very decent answer :), esp. the mention of the intrinsics. w/o them simple iteration might be faster since it's usually unrolled anyway by the JIT – bestsss Dec 15 '11 at 02:59
4

There are a few reasons:

  1. The JIT is unlikely to generate low-level code as efficient as manually written C code. Using low-level C can enable a lot of optimizations that are close to impossible for a generic JIT compiler.

    See this article for some tricks and speed comparisons of hand-written C implementations (memcpy, but the principle is the same): Optimizing Memcpy improves speed

  2. The C version is pretty much independent of the type and size of the array members. It is not possible to do the same in Java since there is no way to get at the array contents as a raw block of memory (e.g. via a pointer).

Hrvoje Prgeša
  • 1
    Java code can get optimised. In fact, what actually happens is that machine code is generated which is more efficient than the C. – Tom Hawtin - tackline May 05 '10 at 10:19
  • 1
    I agree that sometimes JITed code will be better locally optimised since it knows which processor it is running on. However, since it is "just in time" it will never be able to use all those non-local optimisations that take longer to execute. Also, it will never be able to match hand-crafted C code (which could also take the processor into account and partially negate the JIT advantages, either by compiling for a specific processor or by some kind of runtime check). – Hrvoje Prgeša May 05 '10 at 10:56
  • 1
    I think that the Sun JIT compiler team would dispute many of those points. For instance, I believe that HotSpot does global optimization to remove unnecessary method dispatching, and there's no reason why a JIT cannot generate processor specific code. Then there is the point that a JIT compiler can do branch optimization based on the execution behavior of the current application run. – Stephen C May 05 '10 at 11:12
  • @Stephen C - excellent point about the branch optimisations, although you could also perform static performance profiling with C/C++ compilers to achieve a similar effect. I also think that HotSpot has 2 modes of operation - desktop applications will not use all of the available optimizations to achieve a reasonable startup time, while server applications will be optimized more aggressively. All in all, you get some advantages, but you also lose some. – Hrvoje Prgeša May 05 '10 at 13:43
  • 1
    System.arrayCopy is not implemented using C, which sort of invalidates this answer – Nitsan Wakart Oct 02 '15 at 21:27
  • @Nitsan Wakart: In theory the JIT can always redirect a method invocation to a specially prepared one, bypassing the default Java one. Not sure if this is/was ever used... – Hrvoje Prgeša Oct 05 '15 at 16:55
  • @HrvojePrgeša In practice OpenJDK and others have a platform-dependent generated method which is written in assembly (or rather generated directly as assembly). There's no default implementation; this is a runtime method, so it is expected to be implemented by the running JVM. – Nitsan Wakart Oct 06 '15 at 11:46
4

In my own tests, System.arraycopy() for copying multi-dimensional arrays is 10 to 20 times faster than nested for loops:

float[][] foo = mLoadMillionsOfPoints(); // result is a float[1200000][9]
float[][] fooCpy = new float[foo.length][foo[0].length];
long lTime = System.currentTimeMillis();
System.arraycopy(foo, 0, fooCpy, 0, foo.length);
System.out.println("native duration: " + (System.currentTimeMillis() - lTime) + " ms");
lTime = System.currentTimeMillis();

for (int i = 0; i < foo.length; i++)
{
    for (int j = 0; j < foo[0].length; j++)
    {
        fooCpy[i][j] = foo[i][j];
    }
}
System.out.println("System.arraycopy() duration: " + (System.currentTimeMillis() - lTime) + " ms");
for (int i = 0; i < foo.length; i++)
{
    for (int j = 0; j < foo[0].length; j++)
    {
        if (fooCpy[i][j] != foo[i][j])
        {
            System.err.println("ERROR at " + i + ", " + j);
        }
    }
}

This prints:

System.arraycopy() duration: 1 ms
loop duration: 16 ms
jumar
  • 11
    Even though this question is old, just for the record: This is NOT a fair benchmark (let alone the question if such a benchmark would make sense in the first place). `System.arraycopy` does a shallow copy (only the *references* to the inner `float[]`s are copied), whereas your nested `for`-loops performs a deep copy (`float` by `float`). A change to `fooCpy[i][j]` will be reflected in `foo` using `System.arraycopy`, but won't be using the nested `for`-loops. – misberner Jun 27 '13 at 03:48
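Building on the comment above, here is a sketch (still a crude benchmark with no JIT warm-up, so treat the numbers as rough) of a comparison in which both sides do the same shallow, one-dimensional work:

float[] src = new float[1200000 * 9];
float[] dst = new float[src.length];

long t = System.currentTimeMillis();
System.arraycopy(src, 0, dst, 0, src.length);
System.out.println("arraycopy duration: " + (System.currentTimeMillis() - t) + " ms");

t = System.currentTimeMillis();
for (int i = 0; i < src.length; i++) {
    dst[i] = src[i];
}
System.out.println("loop duration: " + (System.currentTimeMillis() - t) + " ms");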