I've had this question for quite a while now. I've read lots of resources trying to understand what is going on, but I still don't have a good understanding of why things are the way they are.
Simply put, I'm trying to test how a CAS would perform vs synchronized in contended and uncontended environments. I've put together this JMH test:
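import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class SandBox {

    // Lock object shared by all threads (Scope.Benchmark)
    Object lock = new Object();

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(SandBox.class.getSimpleName())
                                          .jvmArgs("-ea", "-Xms10g", "-Xmx10g")
                                          .shouldFailOnError(true)
                                          .build();
        new Runner(opt).run();
    }

    // Per-thread state: each thread gets its own number and AtomicLong
    @State(Scope.Thread)
    public static class Holder {

        private long number;
        private AtomicLong atomicLong;

        @Setup
        public void setUp() {
            number = ThreadLocalRandom.current().nextLong();
            atomicLong = new AtomicLong(number);
        }
    }

    // Per-thread data, shared lock
    @Fork(1)
    @Benchmark
    public long sync(Holder holder) {
        long n = holder.number;
        synchronized (lock) {
            n = n * 123;
        }
        return n;
    }

    // Per-thread AtomicLong
    @Fork(1)
    @Benchmark
    public AtomicLong cas(Holder holder) {
        AtomicLong al = holder.atomicLong;
        al.updateAndGet(x -> x * 123);
        return al;
    }

    // Shared state: one lock, one number and one AtomicLong for all threads
    private Object anotherLock = new Object();
    private long anotherNumber = ThreadLocalRandom.current().nextLong();
    private AtomicLong anotherAl = new AtomicLong(anotherNumber);

    @Fork(1)
    @Benchmark
    public long syncShared() {
        synchronized (anotherLock) {
            anotherNumber = anotherNumber * 123;
        }
        return anotherNumber;
    }

    @Fork(1)
    @Benchmark
    public AtomicLong casShared() {
        anotherAl.updateAndGet(x -> x * 123);
        return anotherAl;
    }

    // Same as syncShared, but with biased locking disabled
    @Fork(value = 1, jvmArgsAppend = "-XX:-UseBiasedLocking")
    @Benchmark
    public long syncSharedNonBiased() {
        synchronized (anotherLock) {
            anotherNumber = anotherNumber * 123;
        }
        return anotherNumber;
    }
}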
And the results:
Benchmark                                            Mode  Cnt     Score      Error  Units
spinLockVsSynchronized.SandBox.cas                   avgt    5   212.922 ±   18.011  ns/op
spinLockVsSynchronized.SandBox.casShared             avgt    5  4106.764 ± 1233.108  ns/op
spinLockVsSynchronized.SandBox.sync                  avgt    5  2869.664 ±  231.482  ns/op
spinLockVsSynchronized.SandBox.syncShared            avgt    5  2414.177 ±   85.022  ns/op
spinLockVsSynchronized.SandBox.syncSharedNonBiased   avgt    5  2696.102 ±  279.734  ns/op
In the non-shared case CAS is by far faster, which I would expect. But in the shared case things are the other way around, and this I can't understand. I don't think this is related to biased locking, as that would happen after a thread holds the lock for 5 seconds (AFAIK); this does not happen here, and the syncSharedNonBiased test is just proof of that.
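For reference, this is how I picture what updateAndGet does under the hood. It is only a sketch of my understanding, not the JDK source: updateAndGetSketch and CasLoopSketch are names I made up, and the loop is a hand-written equivalent built on compareAndSet. The point is that a failed CAS means the work is redone, which is where I would expect contention to hurt.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

class CasLoopSketch {

    // Hypothetical helper (not the JDK implementation): a hand-written
    // equivalent of AtomicLong.updateAndGet built on compareAndSet.
    static long updateAndGetSketch(AtomicLong al, LongUnaryOperator fn) {
        while (true) {
            long prev = al.get();              // read the current value
            long next = fn.applyAsLong(prev);  // compute the new value
            // Publish only if nobody changed the value in the meantime;
            // otherwise loop and redo the work with the fresh value.
            if (al.compareAndSet(prev, next)) {
                return next;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong shared = new AtomicLong(0);
        int threads = 4;
        int perThread = 100_000;

        // All threads hammer the same AtomicLong, like casShared does.
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    updateAndGetSketch(shared, x -> x + 1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }

        // No increment is lost, because a failed CAS is always retried.
        System.out.println(shared.get()); // prints 400000
    }
}
```

With a per-thread AtomicLong the compareAndSet should succeed on the first try every time, while with a shared one each failed attempt costs another round trip, which may be part of what casShared is measuring.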
I honestly hope it's just my tests that are wrong, and that someone with JMH expertise will come along and point out what is wrong with my set-up.