11

I know I can simply iterate from start to end and clear those cells but I was wondering if it was possible in any faster way (perhaps using JNI-ed System.arrayCopy)?

pathikrit
  • 32,469
  • 37
  • 142
  • 221
  • What do you mean by faster (to code?) see [Arrays.fill](https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#fill(boolean[],%20boolean)) – Scary Wombat Feb 09 '17 at 06:53
  • [`Arrays.fill`](https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html#fill-java.lang.Object:A-int-int-java.lang.Object-) – 4castle Feb 09 '17 at 06:53
  • If SPEED is what your after consider ByteBuffer data = ByteBuffer.allocateDirect(frame.getData().length); See http://stackoverflow.com/questions/41825305/how-to-debug-segv-accerr – Jon Goodwin Feb 09 '17 at 07:11
  • @JonGoodwin It [also says](https://developer.android.com/training/articles/perf-jni.html#faq_sharing): " Depending on how direct byte buffer access is implemented, accessing the data from managed code can be very slow. " – Patrick Parker Feb 09 '17 at 07:17
  • @ PatrickParker Yes that was my point don't use "managed code" (java heap GC managed) use the native heap. The JVM has a big say in how it works, you have to suck it and see if it works. – Jon Goodwin Feb 09 '17 at 07:30
  • 1
    Using `Unsafe` probably. – Kayaman Feb 09 '17 at 08:08
  • @ScaryWombat: No I mean fast in speed in general for large arrays. I have choices to use a simple loop (or Array.fill util), or use System.arrayCopy to map in from a pre-allocated null array or use something else (Unsafe or ByteBuffers). – pathikrit Feb 10 '17 at 15:09
  • If filling an array with `null`s is the most performance relevant operation of your application, you should rethink your software design… – Holger Feb 15 '17 at 18:04
  • @Holder: Not sure what you mean. Here is the reason I asked this question - I was tasked with profiling a common data structure we use internally (a resizable circular buffer similar to ArrayDeque but supports random indexing). A lot of the usage of this data structure are removals of range from the middle. I do not have control of these usages. These large removals of range require setting of null to certain contagious regions of the underlying array. I was wondering if I could speed it up by using some native methods (like System.arrayCopy) instead of a vanilla for-loop. – pathikrit Feb 15 '17 at 18:46
  • I am 95% sure that if you correctly interpret the profiling results, you will find out that the problem is not that filling a certain amount of memory with zeroes is the bottleneck of your application, but the implementation and/or usage of your home-brew data structure as such or maybe something completely different. Creating a local optimum in a value stream outside the actual bottleneck (constraint) will increase throughput by exactly zero. Find the bottleneck and work on it.Try to think Java, not C. Sorry for my French. – kriegaex Feb 15 '17 at 22:28
  • @kriegaex: You are quite right, that this is not the bottleneck of my application but the slowest part of this data structure I am writing. – pathikrit Feb 16 '17 at 16:19
  • I understand that you are ambitious enough to optimise your local piece of work, but remember this simple truth: If your code is not the bottleneck (BN), then if the BN is upstream, you will just wait faster and longer for the next piece of incoming work. If the BN is downstream, you will just create a longer queue of work for the BN. Overall throughput win in both cases is 0%. Good luck anyway. – kriegaex Feb 17 '17 at 01:39
  • @kriegaex: Talking about BN is off-topic for this question IMO. If we find that doing arrayCopy is >3x faster than a for-loop then why should not I do that? Just saying that since this not the bottleneck in code and does not deserve to be optimized seems non-sensical to me. Also, irrespective of my own particular use-case, it is still a question worth knowing the answer to. – pathikrit Feb 17 '17 at 14:54
  • I agree that it is a bit off-topic, but you indirectly triggered my comments by your remark: "I was tasked with profiling a common data structure we use internally..." For me it means somebody in your organisation thinks it will speed up the overall application, why else would he assign you that task? What I meant to say is that your low-level optimisation costs money but does not save any as long as you do not optimise a bottleneck. Maybe your precious work is invested better elsewhere. The question is still interesting technically, but eventually developers implement business requirements. – kriegaex Feb 17 '17 at 17:02

1 Answers1

17

If I got it right, you need to nullify an array, or a sub-range of an array containing references to objects to make them eligible for GC. And you have a regular Java array, which stores data on-heap.

Answering your question, System.arrayCopy is the fastest way to null a sub-range of an array. It is worse memory-wise than Arrays.fill though, since you would have to allocate twice as much memory to hold references at worst case for an array of nulls you can copy from. Though if you need to fully null an array, even faster would be just to create a new empty array (e.g. new Object[desiredLength]) and replace the one you want to nullify with it.

Unsafe, DirectByteBuffer, DirectLongBuffer implementations doesn't provide any performance gain in a naive straight-forward implementation (i.e. if you just replace the Array with DirectByteBuffer or Unsafe). They are slower then bulk System.arrayCopy as well. Since those implementations have nothing to do with Java Array, they're out of scope of your question anyway.

Here's my JMH benchmark (full benchmark code available via gist) snippet for those including unsafe.setMemory case as per @apangin comment; and including ByteBuffer.put(long[] src, int srcOffset, int longCount) as per @jan-chaefer; and an equivalent of Arrays.fill loop as per @scott-carey to check if Arrays.fill could be an intrinsic in JDK 8.

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void arrayFill() {
    Arrays.fill(objectHolderForFill, null);
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void arrayFillManualLoop() {
    for (int i = 0, len = objectHolderForFill.length; i < len; i++) {
        objectHolderForLoop[i] = null;
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void arrayCopy() {
    System.arraycopy(nullsArray, 0, objectHolderForArrayCopy, 0,
                              objectHolderForArrayCopy.length);
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directByteBufferManualLoop() {
    while (referenceHolderByteBuffer.hasRemaining()) {
        referenceHolderByteBuffer.putLong(0);
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directByteBufferBatch() {
    referenceHolderByteBuffer.put(nullBytes, 0, nullBytes.length);
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directLongBufferManualLoop() {
    while (referenceHolderLongBuffer.hasRemaining()) {
        referenceHolderLongBuffer.put(0L);
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void directLongBufferBatch() {
    referenceHolderLongBuffer.put(nullLongs, 0, nullLongs.length);
}


@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void unsafeArrayManualLoop() {
    long addr = referenceHolderUnsafe;
    long pos = 0;
    for (int i = 0; i < size; i++) {
        unsafe.putLong(addr + pos, 0L);
        pos += 1 << 3;
    }
}

@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void unsafeArraySetMemory() {
    unsafe.setMemory(referenceHolderUnsafe, size*8, (byte) 0);
}

Here's what I got (Java 1.8, JMH 1.13, Core i3-6100U 2.30 GHz, Win10):

100 elements
Benchmark                                       Mode      Cnt   Score   Error    Units
ArrayNullFillBench.arrayCopy                   sample  5234029  39,518 ± 0,991   ns/op
ArrayNullFillBench.directByteBufferBatch       sample  6271334  43,646 ± 1,523   ns/op
ArrayNullFillBench.directLongBufferBatch       sample  4615974  45,252 ± 2,352   ns/op
ArrayNullFillBench.arrayFill                   sample  4745406  76,997 ± 3,547   ns/op
ArrayNullFillBench.arrayFillManualLoop         sample  5549216  78,677 ± 13,013  ns/op
ArrayNullFillBench.unsafeArrayManualLoop       sample  5980381  78,811 ± 2,870   ns/op
ArrayNullFillBench.unsafeArraySetMemory        sample  5985884  85,062 ± 2,096   ns/op
ArrayNullFillBench.directLongBufferManualLoop  sample  4697023  116,242 ±  2,579  ns/op <-- wow
ArrayNullFillBench.directByteBufferManualLoop  sample  7504629  208,440 ± 10,651  ns/op <-- wow

I skipped all** the loop implementations from further tests
** - except arrayFill and arrayFillManualLoop for scale

1000 elements
Benchmark                                 Mode      Cnt    Score   Error    Units
ArrayNullFillBench.arrayCopy              sample  6780681  184,516 ± 14,036  ns/op
ArrayNullFillBench.directLongBufferBatch  sample  4018778  293,325 ± 4,074   ns/op
ArrayNullFillBench.directByteBufferBatch  sample  4063969  313,171 ± 4,861   ns/op
ArrayNullFillBench.arrayFillManualLoop    sample  6270397  543,801 ± 20,325  ns/op
ArrayNullFillBench.arrayFill              sample  6590416  548,250 ± 13,475  ns/op

10000 elements
Benchmark                                 Mode      Cnt     Score   Error    Units
ArrayNullFillBench.arrayCopy              sample  2551851  2024,543 ± 12,533  ns/op
ArrayNullFillBench.directLongBufferBatch  sample  2958517  4469,210 ± 10,376  ns/op
ArrayNullFillBench.directByteBufferBatch  sample  2892258  4526,945 ± 33,443  ns/op
ArrayNullFillBench.arrayFill              sample  2578580  5532,063 ± 20,705  ns/op
ArrayNullFillBench.arrayFillManualLoop    sample  2562569  5550,195 ± 40,666  ns/op

P.S. Speaking of ByteBuffer and Unsafe - their main benefits in your case is that they store data off-heap, and you can implement your own memory deallocation alghorithm which would siut your data-structure better than regular GC. So you won't need to nullify them, and could compact memory as you please. Most likely the efforts won't worth much, since it would be much easier to get a less performant and more error-prone code then you have now.

bashnesnos
  • 816
  • 6
  • 16
  • Why using Unsafe in a manual loop, but not `unsafe.setMemory`? – apangin Feb 16 '17 at 18:13
  • Thanks @apangin, I've added this case as well. I'm not an Unsafe pro-user, just missed that possibility. I've expected better results than a manual loop though, do you think it's worth a separate stackoverflow question (including those strange results for ByteBuffer)? – bashnesnos Feb 17 '17 at 12:16
  • This might be a benchmarking issue, becase I did saw better results for unsafe.setMemory in my own measurements. I'll take a look later. – apangin Feb 17 '17 at 12:49
  • @apangin I've created a separate question, if that suits you better you could answer there http://stackoverflow.com/questions/42299184/why-direct-memory-array-is-slower-to-clear-than-a-usual-java-array. – bashnesnos Feb 17 '17 at 13:30
  • 1
    A missing test is the simple loop in java. loop over the array with a basic for or while loop and set the value to null. The JVM is rather good at optimizing that sort of loop. Arrays.fill may not be the same; the JVM might 'intrinsify' it. – Scott Carey Mar 29 '17 at 20:26
  • @ScottCarey it could be, though it's not in the list of intrinsics for OpenJDK 8 as per the http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/classfile/vmSymbols.hpp#l581 (the details are coming from this answer: http://stackoverflow.com/a/19892404/2176572). I've added this case thanks. It seems that it's quite the same as `Arrays.fill` on all the ranges - so it's an intrinsic after all. – bashnesnos Mar 30 '17 at 12:59
  • @bashnesnos I have seen several cases where a simple loop produces optimal assembly, sometimes moreso than System.arraycopy. It is highly dependend on what the JVM is able to infer however. For example, an equivalent to the arraycopy variant with a loop over both arrays -- I have seen that turn into a single REP MOVS assembly instruction for the loop in some cases, but not in others. – Scott Carey Aug 03 '17 at 06:41