11

In the "95% of performance is about clean representative models" talk by Martin Thompson, between minutes 17 and 21, the following code is presented:

public class Queue
{
    private final Object[] buffer;
    private final int capacity;

    // Rest of the code

}

At 20:16 he says:

You can get much better performance, so leaving things like capacity in there is the right thing to do.

I tried to come up with a code example in which capacity would be much faster than buffer.length, but I failed.

Martin says that problems arise in two scenarios:

  1. In a concurrent world. But the length field is also final (JLS 10.7), so I don't see how this could be a problem.
  2. When a cache misses. I tried calling capacity vs buffer.length one million times (with the queue having one million elements), but there was no significant difference. I used JMH for benchmarking.
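
For reference, a stripped-down sketch of the comparison I benchmarked (the real run used JMH; the class and method names here are illustrative):

```java
// Illustrative sketch of the benchmark idea: iterate once bounded by the
// capacity field, once by buffer.length. In my JMH runs the throughput
// of the two variants was not significantly different.
class CapacityVsLength {
    private final Object[] buffer;
    private final int capacity;

    CapacityVsLength(int size) {
        buffer = new Object[size];
        capacity = size;
    }

    int countNullsUsingCapacity() {
        int count = 0;
        for (int i = 0; i < capacity; i++) {      // bound read from the field
            if (buffer[i] == null) count++;
        }
        return count;
    }

    int countNullsUsingLength() {
        int count = 0;
        for (int i = 0; i < buffer.length; i++) { // bound read from the array header
            if (buffer[i] == null) count++;
        }
        return count;
    }
}
```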

Could you please provide a code example which demonstrates a case in which capacity is superior to buffer.length in terms of performance?

The more common case (frequently spotted in the real code), the better.

Please note that I'm completely setting aside aesthetics, clean code, potential for refactoring, etc. I'm asking only about performance.

Adam Stelmaszczyk
  • I thought capacity meant how many elements could be added to a collection before reallocation occurs. A collection can have a capacity of 5 but a size of 0 because no items are in it. As you add items to the collection the size will increase, but when the size exceeds the capacity, more memory will be allocated to hold the new item. – Shar1er80 May 31 '15 at 23:41
  • 2
    I would assume that `.length` is faster on a modern JVM; JITs might have special ways of optimizing arrays and accesses to them that make `.length` favorable over an external field. What might be especially important is that the JIT would, in a tight loop using the array, put the length of that array in a register. Could you try a benchmark where you use `capacity` and `buffer.length` in a tight loop that accesses the array? – nanofarad May 31 '15 at 23:41
  • @hexafraction [Tried](http://pastebin.com/1LbdS2tD) `for (int i = 0; i < queue.getCapacity(); i++) queue.buffer[i] = queue.getCapacity();`, no significant difference between the throughput score. – Adam Stelmaszczyk Jun 01 '15 at 00:00
  • In a typical application, this is premature optimization that doesn't matter. However I doubt he's talking about typical applications. The tight loop test is not a good representation of a realistic application with complex code flow that requires extreme performance. JVM could even reduce your tight loops to no-op if it were more aggressive. – ZhongYu Jun 01 '15 at 00:02
  • 1
    @mjpt777 can you explain why this supposedly is faster? – Has QUIT--Anony-Mousse Jun 01 '15 at 06:02

4 Answers

9

When you access an array normally, the JVM uses its length anyway to perform a bounds check. But when you access the array via sun.misc.Unsafe (like Martin does), you don't pay this implicit penalty.

The array's length field usually lies in the same cache line as its first elements, so you will get false sharing when multiple threads write to the first indices concurrently. Using a separate field for the buffer capacity breaks this false sharing.
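
To make the layout concrete, here is a small sketch (not part of the original answer) that reads the offset of element 0 via the same sun.misc.Unsafe used below. Everything before that offset is the array header, which contains length, so with 64-byte cache lines the header and the first few elements share a line. Exact offsets vary with the JVM and compressed-oops settings.

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

// Sketch: element 0 of an Object[] starts right after the array header,
// which holds the length field, so both sit within one 64-byte cache line.
class ArrayHeaderDemo {
    static long elementZeroOffset() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);
            return unsafe.arrayBaseOffset(Object[].class);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        long base = elementZeroOffset();  // typically 16 on 64-bit HotSpot
        System.out.println("header size (length lives below this offset): " + base);
        // Assuming 4-byte compressed references and a 64-byte cache line:
        System.out.println("elements sharing the header's cache line: " + (64 - base) / 4);
    }
}
```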

Here is a benchmark that shows how the capacity field makes array access substantially faster:

package bench;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import sun.misc.Unsafe;

import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicReferenceArray;

@State(Scope.Benchmark)
@Threads(4)
public class Queue {
    private static final Unsafe unsafe = getUnsafe();
    private static final long base = unsafe.arrayBaseOffset(Object[].class);
    private static final int scale = unsafe.arrayIndexScale(Object[].class);

    private AtomicReferenceArray<Object> atomic;
    private Object[] buffer;
    private int capacity;

    @Param({"0", "25"})
    private volatile int index;

    @Setup
    public void setup() {
        capacity = 32;
        buffer = new Object[capacity];
        atomic = new AtomicReferenceArray<>(capacity);
    }

    @Benchmark
    public void atomicArray() {
        atomic.set(index, "payload");
    }

    @Benchmark
    public void unsafeArrayLength() {
        int index = this.index;
        if (index < 0 || index >= buffer.length) {
            throw new ArrayIndexOutOfBoundsException();
        }
        unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
    }

    @Benchmark
    public void unsafeCapacityField() {
        int index = this.index;
        if (index < 0 || index >= capacity) {
            throw new ArrayIndexOutOfBoundsException();
        }
        unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
    }

    private static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (IllegalAccessException | NoSuchFieldException e) {
            throw new AssertionError("Should not happen");
        }
    }
}

Results:

Benchmark                  (index)   Mode  Cnt      Score      Error   Units
Queue.atomicArray                0  thrpt    5  41804,825 ±  928,882  ops/ms
Queue.atomicArray               25  thrpt    5  84713,201 ± 1067,911  ops/ms
Queue.unsafeArrayLength          0  thrpt    5  48656,296 ±  676,166  ops/ms
Queue.unsafeArrayLength         25  thrpt    5  88812,863 ± 1089,380  ops/ms
Queue.unsafeCapacityField        0  thrpt    5  88904,433 ±  360,936  ops/ms
Queue.unsafeCapacityField       25  thrpt    5  88633,490 ± 1426,329  ops/ms
apangin
  • 3
    It would be great, if you add an ordinary array access, not using `Unsafe`, as third variant, for comparison. – Holger Jun 01 '15 at 12:13
  • @Holger A good point. However, it would not be fair to measure a plain array access since it does not provide inter-thread visibility semantics. So I added the test case for `AtomicReferenceArray` access that works nearly the same as `unsafeArrayLength` or even slightly slower. – apangin Jun 02 '15 at 00:07
  • 1
    Why is index 0 so much slower? And why is `unsafeArrayLength` faster than `unsafeCapacityField` for index 25? – Has QUIT--Anony-Mousse Jun 04 '15 at 10:47
  • 1
    @Anony-Mousse Accessing index 0 is slower because it resides next to `length` field of the array within the size of the cache line (64 bytes), hence the contention on the `length` field due to false sharing: writing to index 0 invalidates not only this piece of memory, but the whole cache line including the `length`. – apangin Jun 04 '15 at 11:18
  • 1
    As to index 25, `unsafeArrayLength` is not faster. The score difference is basically within the measurement accuracy. – apangin Jun 04 '15 at 11:20
3

You shouldn't take Martin's words literally. When he said "Using array.length is an anti-pattern that is copied across projects", I think he was being a bit sly.

Using the capacity field indeed improves locality, pollutes caches less, and helps to avoid false sharing, but it requires writing really horrible source code that is very far from the "clean and simple" style Martin advertised in this talk.

The problem is that even if you don't write array.length in your source directly, the JVM still accesses the length (that is, the array header) on each array indexing array[i], to check bounds. The HotSpot JVM has trouble eliminating bounds checks even in "simple" looping cases, and I think it is not able to interpret an "external" check like if (i < capacity) return array[i]; as the bounds check, i. e. to tie the capacity field to the array size.
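
A minimal sketch of that situation (hypothetical code, not from the talk): even though the caller's guard uses capacity, HotSpot still emits its own bounds check for the array access, since it cannot prove capacity equals array.length.

```java
// Sketch: the explicit capacity guard does not remove the JVM's own
// bounds check on array[i], because the JIT cannot in general prove
// that capacity == array.length.
class BoundsCheckSketch {
    private final Object[] array;
    private final int capacity;

    BoundsCheckSketch(int size) {
        array = new Object[size];
        capacity = size;
    }

    Object get(int i) {
        if (i < 0 || i >= capacity) {    // "external" check on the field
            throw new IndexOutOfBoundsException();
        }
        return array[i];                 // implicit check against array.length remains
    }
}
```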

That is why, for the capacity pattern to make any sense, you need to access the array only via Unsafe! That, unfortunately, disables many bulk loop optimizations.

Look at Martin's "clean" queue implementation :)

I can also try to explain what was meant by the concurrency considerations around accessing the "final" array.length. My experiments show that even "read-read" concurrent cache line access introduces some sort of "false sharing" and slows things down. (I think the JVM engineers considered this when they made @sun.misc.Contended pad 128 bytes on both sides of contended fields; probably this ensures that both two-sided cache line prefetch and "read-read false sharing" won't affect performance.)

That is why, when queue consumers and producers access the capacity to wrap around the buffer, they are better off accessing different objects that contain the same (by value) capacity field and a reference to the same array. When that array is accessed via Unsafe, producers and consumers normally touch different areas of it, so nothing is falsely shared.
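
A hedged sketch of that arrangement (hypothetical names; real, heavily optimized versions live in JCTools): producer and consumer each keep their own copy of the capacity next to their own index, plus a reference to the shared array, so neither side reads a field sitting on a cache line the other side writes.

```java
// Hypothetical sketch: each side holds a private copy of capacity, so
// wrapping an index never touches the array header or the other side's
// fields.
final class SharedBuffer {
    final Object[] buffer;
    SharedBuffer(int capacity) { buffer = new Object[capacity]; }
}

final class ProducerView {
    final Object[] buffer;
    final int capacity;   // producer's own copy (same value as the consumer's)
    long head;            // producer-local position

    ProducerView(SharedBuffer shared) {
        buffer = shared.buffer;
        capacity = shared.buffer.length;  // read the array header once, at construction
    }

    int wrap(long index) { return (int) (index % capacity); }
}

final class ConsumerView {
    final Object[] buffer;
    final int capacity;   // consumer's own, independent copy
    long tail;            // consumer-local position

    ConsumerView(SharedBuffer shared) {
        buffer = shared.buffer;
        capacity = shared.buffer.length;
    }

    int wrap(long index) { return (int) (index % capacity); }
}
```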

IMO the anti-pattern now is trying to implement yet another queue, while the people behind https://github.com/JCTools/JCTools (including Martin, by the way) optimize this to death.

leventov
0

I'm no JVM expert and don't claim to understand its optimizations.

Have you considered looking at the bytecode to see which instructions are executed?

public class Queue {

    private final Object[] buffer;
    private final int capacity;

    public Queue(int size) {
        buffer = new Object[size];
        this.capacity = size;
    }

    public static void main(String... args) {
        Queue q = new Queue(10);
        int c = q.capacity;
        int l = q.buffer.length;
    }
}

This is the disassembled bytecode for the main method above.

public static void main(java.lang.String...);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
    Code:
      stack=3, locals=4, args_size=1
         0: new           #5                  // class Queue
         3: dup
         4: bipush        10
         6: invokespecial #6                  // Method "<init>":(I)V
         9: astore_1

        10: aload_1
        11: getfield      #4                  // Field capacity:I
        14: istore_2

        15: aload_1
        16: getfield      #3                  // Field buffer:[Ljava/lang/Object;
        19: arraylength

        20: istore_3
        21: return

We see that both use the getfield instruction; however, buffer.length additionally executes arraylength.

Looking at the jvm spec for arraylength

instructionIsTypeSafe(arraylength, Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :- 
    nth1OperandStackIs(1, StackFrame, ArrayType),
    arrayComponentType(ArrayType, _),
    validTypeTransition(Environment, [top], int, StackFrame, NextStackFrame),
    exceptionStackFrame(StackFrame, ExceptionStackFrame).

nth1OperandStackIs - checks that the incoming operand is a reference type and references an array. If the array reference is null, a NullPointerException is thrown

arrayComponentType - Check the type of elements. The component type of an array of X is X

validTypeTransition - Type checking rules

So, calling length on an array involves the additional instruction arraylength. I'd be very interested to know more about this question.

Si.
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Achrome Jun 01 '15 at 03:23
  • @Achrome This is not (and never was) a link-only answer. – user207421 Jun 01 '15 at 04:44
  • What do you mean by instrumenting the byte code? Could you please provide a code example? This is the only thing I'm asking in the question. I think your question should be a comment, it's not a definite answer. I looked at the byte code, but it didn't help me answering my question. We all know JLS 10.7, no need for a link, because I already mentioned it in the question. – Adam Stelmaszczyk Jun 01 '15 at 10:15
0

I doubt this will have any positive performance impact. It won't help with eliminating bounds checks in HotSpot, for example. Even worse: it may be faster on one JVM, but the next version may make it slower. Java keeps gaining additional optimizations, and array bounds checks are one thing they try hard to optimize away...

I believe this may be a leftover from rewriting real queue code into a simpler example. In a real queue, you need to keep track of the used capacity, and sometimes you want to allow an upper bound on the capacity (to block producers when consumers cannot keep up). If you had such code (with setCapacity/getCapacity and a non-final capacity) and simplified it by removing the resize logic and making the backing storage final, this is what you might end up with.
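
For illustration, a hypothetical sketch of what such pre-simplification code could look like: a bounded queue with a tunable capacity separate from its storage. Remove the tuning and fix the storage, and you are left with a final buffer plus a final capacity field, just like in the talk.

```java
import java.util.ArrayDeque;

// Hypothetical "real" queue before simplification: capacity is a runtime
// bound on the number of elements, independent of the storage's size.
class BoundedQueue<E> {
    private final ArrayDeque<E> storage = new ArrayDeque<>();
    private int capacity;   // non-final: can be retuned while running

    BoundedQueue(int capacity) { this.capacity = capacity; }

    int getCapacity() { return capacity; }

    void setCapacity(int capacity) { this.capacity = capacity; }

    // Returns false when full, signalling the producer to back off.
    boolean offer(E e) {
        if (storage.size() >= capacity) {
            return false;
        }
        storage.add(e);
        return true;
    }

    E poll() { return storage.poll(); }
}
```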

Has QUIT--Anony-Mousse