I have noticed many times that small, trivial, seemingly unrelated code changes can alter the performance characteristics of a piece of Java code, sometimes dramatically.
This happens with both JMH and hand-rolled benchmarks.
For example, in a class like this:
class Class<T> implements Interface {

    private final Type field;

    Class(ClassBuilder builder) {
        field = builder.getField();
    }

    @Override
    public void method() { /* ... */ }
}
I did this code change:
class Class<T> implements Interface {

    private static Class<?> instance;

    private final Type field;

    Class(ClassBuilder builder) {
        instance = this;
        field = builder.getField();
    }

    @Override
    public void method() { /* ... */ }
}
and performance changed dramatically.
This is just one example; I have noticed the same thing in other cases.
I cannot determine what causes it, and searching the web turned up nothing.
To me it looks completely uncontrollable. Maybe it has to do with how the compiled code is laid out in memory?
I do not think it is due to false sharing (see below).
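One way to sanity-check the false-sharing hypothesis is to dump the object layout with JOL. This is only an illustrative sketch (it assumes the org.openjdk.jol:jol-core dependency and refers to the SpinLock and AtomicBoolean classes shown below):

import org.openjdk.jol.info.ClassLayout;

public class LayoutCheck {
    public static void main(String[] args) {
        // Prints field offsets, padding and object size, which shows whether
        // the @Contended padding was actually applied around the state field.
        System.out.println(ClassLayout.parseClass(SpinLock.class).toPrintable());
        System.out.println(ClassLayout.parseClass(AtomicBoolean.class).toPrintable());
    }
}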
I'm developing a spinlock:
import jdk.internal.vm.annotation.Contended;

class SpinLock {

    // Requires --add-exports java.base/jdk.internal.vm.annotation=<module-name>
    // (if the project is not modular, <module-name> is ALL-UNNAMED).
    @Contended
    private final AtomicBoolean state = new AtomicBoolean();

    void lock() {
        while (state.getAcquireAndSetPlain(true)) {
            while (state.getPlain()) { // With x86 PAUSE we avoid an opaque load
                Thread.onSpinWait();
            }
        }
    }

    void unlock() {
        state.setRelease(false);
    }
}
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class AtomicBoolean {

    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(AtomicBoolean.class, "value", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private boolean value;

    public boolean getPlain() {
        return value;
    }

    public boolean getAcquireAndSetPlain(boolean value) {
        return (boolean) VALUE.getAndSetAcquire(this, value);
    }

    public void setRelease(boolean value) {
        VALUE.setRelease(this, value);
    }
}
My hand-rolled benchmark reported 171.26 ns ± 43%, and a JMH benchmark reported avgt 5  265.970 ± 27.712 ns/op.
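For reference, the hand-rolled benchmark is essentially a few threads hammering lock()/unlock() in a loop and averaging System.nanoTime() over the iterations. The sketch below is only an approximation of that harness (class name, thread count and iteration count are illustrative, warm-up omitted), not the exact code:

import java.util.concurrent.CyclicBarrier;

public class HandRolledSpinLockBenchmark {

    private static final int THREADS = 6;              // matches the JMH @Threads setting
    private static final int ITERATIONS = 10_000_000;  // illustrative

    public static void main(String[] args) throws Exception {
        SpinLock lock = new SpinLock();
        CyclicBarrier barrier = new CyclicBarrier(THREADS);
        Thread[] threads = new Thread[THREADS];
        long[] perThreadNanos = new long[THREADS];

        for (int t = 0; t < THREADS; t++) {
            int id = t;
            threads[t] = new Thread(() -> {
                try {
                    barrier.await(); // start all threads at roughly the same time
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                long start = System.nanoTime();
                for (int i = 0; i < ITERATIONS; i++) {
                    lock.lock();
                    lock.unlock();
                }
                perThreadNanos[id] = System.nanoTime() - start;
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }

        long total = 0;
        for (long nanos : perThreadNanos) {
            total += nanos;
        }
        System.out.printf("%.2f ns per lock/unlock pair%n", (double) total / THREADS / ITERATIONS);
    }
}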
When I change it like this:
import jdk.internal.vm.annotation.Contended;

class SpinLock {

    @Contended
    private final AtomicBoolean state = new AtomicBoolean();

    private final NoopBusyWaitStrategy busyWaitStrategy;

    SpinLock() {
        this(new NoopBusyWaitStrategy());
    }

    SpinLock(NoopBusyWaitStrategy busyWaitStrategy) {
        this.busyWaitStrategy = busyWaitStrategy;
    }

    void lock() {
        while (state.getAcquireAndSetPlain(true)) {
            busyWaitStrategy.reset(); // Will be inlined
            while (state.getPlain()) {
                Thread.onSpinWait();
                busyWaitStrategy.tick(); // Will be inlined
            }
        }
    }

    void unlock() {
        state.setRelease(false);
    }
}
class NoopBusyWaitStrategy {

    void reset() {}

    void tick() {}
}
My hand-rolled benchmark reported 184.24 ns ± 48%, and a JMH benchmark reported avgt 5  291.285 ± 20.860 ns/op.
Even though the two benchmarks report different absolute numbers, both show a slowdown after the change.
This is the JMH benchmark:
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

public class SpinLockBenchmark {

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        final SpinLock lock = new SpinLock();
    }

    @Benchmark
    @Fork(value = 1, warmups = 1, jvmArgsAppend = {
            "-Xms8g", "-Xmx8g", "-XX:+AlwaysPreTouch",
            "-XX:+UnlockExperimentalVMOptions", "-XX:+UseEpsilonGC",
            "-XX:-RestrictContended"})
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @Threads(6)
    public void run(BenchmarkState state) {
        state.lock.lock();
        state.lock.unlock();
    }
}
Do you have any ideas?
Does this also happen with languages that don't have a runtime?