Why not to use synchronized ArrayList even in single thread cases?

Question

I have run the following code to measure the time and performance difference between adding elements to ArrayList vs synchronized version of it, and surprisingly haven't find any significant difference! And by significant I mean the difference doesn't give you any information on which you can prefer one to another!

And my question is that if performance wise they are almost the same then why are they even providing the non-synchronized version in the first place? And why not to use the synchronized version for both multi-thread and single-thread cases?

FYI, as can be seen I didn't warmed up JVM, and also I understand that garbage collection will most probably run in between iterations to free up old arrays, but I don't think that matters since we have it for both cases and also even if you run it for just one iteration you would get the same result!

    int size = 100_000_000;
    long totalTime = 0;
    for(int j=0; j<20; j++) {
        List<Integer> l1 = new ArrayList<>(size);
        //List<Integer> l1 = Collections.synchronizedList(new ArrayList<>(size));

        long t1 = System.nanoTime();

        IntStream.range(0, size).sequential().forEach(i -> l1.add(i));

        long t2 = System.nanoTime();
        totalTime += t2-t1;
    }
    System.out.println("time (ms):" + TimeUnit.NANOSECONDS.toMillis(totalTime/20));

The simple answer is performance, `synchronisation` has a cost, even the [JavaDocs (for `Vector`)](https://docs.oracle.com/javase/10/docs/api/java/util/Vector.html) suggest using `ArrayList` where thread safety is not an issue - *"Vector is synchronized. If a thread-safe implementation is not needed, it is recommended to use ArrayList in place of Vector."* — MadProgrammer, Jan 14 '22 at 22:59
You might also find [Why are synchronize expensive in Java?](https://stackoverflow.com/questions/1671089/why-are-synchronize-expensive-in-java) useful — MadProgrammer, Jan 14 '22 at 23:00
Part of the issue is that `Collections.synchronizedList` does not create a new synchronized list, it simply wraps the existing `ArrayList` functions with a mutex. Try using `Vector` instead. It might be a bit faster. However, it will probably end up being a little slower. — Locke, Jan 14 '22 at 23:01
The JVM also uses a system of biasing locks to a single thread. This lets it pretend a synchronized object is not synchronized when it is only accessed from a single thread. However, this comes with the downside of being slower than a regular lock if it later needs to be accessed by another thread. This optimization is likely why you don't see a large difference in performance. — Locke, Jan 14 '22 at 23:04
@Locke I am using preview version of JDK19 and seems that biased locking has been disabled since jdk15 (https://quarkus.io/blog/biased-locking-help/). The interesting thing is that if I use 2 threads it wouldn't degrade the performance at all! — al pal, Jan 15 '22 at 00:09
@Locke I even tried it with HashMap. And with multiple threads(tried up to 7 threads on my 10 core Mac), the performance gets not worse but better as opposed to non-synchronized single thread case! — al pal, Jan 15 '22 at 00:30
@alpal "_I have run the following code to measure the time and performance difference between adding elements to ArrayList vs synchronized version of it, and surprisingly haven't find any significant difference!_" this is a very arrogant statement and kind of laughable considering that this has been analyzed and benchmarked by probably thousands of software engineers and all of them concluded the opposite. And you ended your statement with an exclamation point! LOL — hfontanez, Jan 15 '22 at 08:13

rzwitserloot · Answer 1 · 2022-01-17T15:21:46.693

synchronization, when it is actually used, really does cost. However, hotspot is pretty decent at realizing that a mutex isn't actually doing anything useful and eliminating it. That's what you're seeing.

So, why isn't ArrayList just synchronized out of the box / why isn't the advice 'use Vector, not ArrayList'? Many separate reasons:

Most important take-home reason (the rest is just historic peculiarity): Because a synchronized list is mostly useless. See below.
Modern day JVMs are pretty good at eliminating synchronized where it doesn't do anything. Which is why you are having a hard time using simple timing code to see any difference. But that wasn't always the case. ArrayList was introduced in java 1.2. Vector (a synchronized arraylist with a different API) is older than that: 1.0. ArrayList was introduced for 2 separate reasons: Partly to clean up that API, and partly because 'synchronize it!' was slow. NOW it is no longer slow, but Java 1.2 is 23 years old. Rerun your code on java 1.2 if you can find it anywhere and report back to me :)
Everything about Vector is deprecated, obsolete, and non-idiomatic. Part of that is simply 'because'. 23 years ago, the advice 'use ArrayList, not Vector' was correct for a bunch of reasons. Including "Because it is faster" (even if that is no longer true today). Now the reason to use ArrayList and not Vector is mostly: "Because ArrayList is what everybody is familiar with, Vector is not, when in rome act like romans, don't rock the boat for no reason whatsoever". This shows up in all sorts of pragmatic ways: The name 'Vector' is now being reused in the java ecosystem for something completely different (accessing hardware registers that aren't exactly 64-bit, part of Project Panama), for example.

Why is a synchronized list mostly useless?

a non-synchronized ('thread safe') implementation breaks completely; the spec says: Anything can happen. A synchronized ('thread safe') implementation does not break completely; instead, you get 1 of a permutation of options, with no guarantees whatsoever about which ones are more or less likely. That's not really more useful than utter chaos, though! For example, if I write this code:

List a = new Vector<String>();
Thread x = new Thread(() -> a.add("Hello"));
Thread y = new Thread(() -> a.add("World"));
x.start();
y.start();
x.join();
y.join();
System.out.println(a);

Then it is legal for this app to print [Hello, World], but it is also legal for this app to print [World, Hello]. There is no way to know, and a VM is free to always return the one, or always return the other, or flip a coin, or make it depend on the phase of the moon. Vector is synchronized and this is still useless to me. Nobody wants to write algorithms that need to deal with a combinatory explosion of permutations!!

With ArrayList, however, which is not 'thread safe', it gets much much worse. There are way more permutations here. The JVM can do any of these without breaking spec:

[Hello, World]
[World, Hello]
[Hello]
[World]
[null, Hello]
[World, World]
[]
[WhatTheHeckReally]
pause, play the macarena over the speaker system, then crash.

Anything goes - the spec says the behaviour is unspecified. In practice, the first 4 are all entirely possible.

Avoiding this mess is good, but the permutations that the synchronized Vector offers is just.. less bad. But still bad, so who cares? You want this code to be 100% reliable: You want code to do the same thing every time (unless I want randomness, but then use java.util.Random which has specs that explicitly spell out how it is random. Threads are free to be non-random, so if you MUST have randomness, you can't use that either).

In order to make stuff reliable, the operation needs to be either done by the object itself (you call ONE method and that is the only interaction your thread does with it), or, you need external locks.

For example, if I want put '1' in a hashmap for a key that isn't htere yet, and increment the number if it is, this code DOES NOT WORK:

Map<String, Integer> myMap = Collections.synchronizedMap(new HashMap<>());

...

String k = ...;
if (myMap.containsKey(k)) myMap.put(k, myMap.get(k) + 1);
else myMap.put(k, 1);

Seems fine? Nope, broken:

Thread 1 calls myMap.containsKey and sees the answer is false.
Thread 1 so happens to get pre-empted and freezes right there, after the if, before the put.
Thread 2 runs, and increments for the same key. It, too, finds myMap,containsKey returning false. It therefore runs myMap.put(k, 1).
Thread 1 continues running, and runs.. myMap.put(k, 1)
The map now contains k = 1, even though incrementFor(k) was run twice. Your app is broken.

See? Synchronization? It was completely useless here. What you want is either a lock:

synchronized (something) {
  String k = ...;
  if (myMap.containsKey(k)) myMap.put(k, myMap.get(k) + 1);
  else myMap.put(k, 1);
}

and this is completely fine - no matter how had you try running incrementFor(k) simultaneously, it'll dutifully count every invocation, or, better yet, we ask the map to do it for us, to have a map that just has an increment function or similar. HashMap does not. I guess Collections.synchronizedList could return an object that has extra methods, but as the name suggest, that implementation then neccessarily uses locking, and there are more efficient ways to do it.

This task is better done with ConcurrentHashMap, and using the right method:

ConcurrentHashMap<String, Integer> myMap = new ConcurrentHashMap<>();

...

myMap.merge(k, 1, (a, b) -> a + b);

That does it in one call. (merge is the same as .put(k, 1) if k isn't in the map already, but if it is, it is the same as .put(k, RESULT) where RESULT is the result of running a + b where a is 'what was in the map' and 'b' is the value you are trying to add (So, 1, in this case).

A non-synchronized list can still mess up a single call, but if your 'job' involves more than one call, a synchronized one in the sense of e.g. Collections.synchronizedMap or j.u.Vector cannot safely do this.

And that's, in the end, why the advice is to not use synchronized stuff - even though it is probably not really a performance issue, there is almost no point in doing it. If you actually have concurrency needs it is unlikely that synchronizing the thing internally is going to help you, and in the case where it does, it's somewhat likely that some more specific type in the java.util.concurrent package can do it faster (because when concurrency IS happening, synchronized most definitely is not free at all).

The JMM guarantees that the JVM will not run into undefined behavior unlike e.g. C++. With C++ anything goes including the process crashing. in the JMM, when there is a data race, a read either needs to see the most recent write before it in happens-before order OR it needs to see a concurrent write (so one it is in data race with). So your statement about macarena is factually incorrect. For more information see https://download.oracle.com/otndocs/jcp/memory_model-1.0-pfd-spec-oth-JSpec/ and look for (happens-before) consistent. — pveentjer, Jan 15 '22 at 05:10
The 2 'exceptions' are accessing a long/double on a 32 bit machine that due to torn read/write, can lead to nonsense values. But this is still 'defined' behavior. — pveentjer, Jan 15 '22 at 05:12
@rzwitserloot Thanks for your answer. But the interesting fact is that even when I use the synchronized ArrayList with more than one thread, in which I think JVM will not apply it's optimization since it realizes that there are more than one threads involved, I don't see any decrease in performance! It looks to me that synchronization has a very minuscule cost if any unlike the common narrative! — al pal, Jan 15 '22 at 05:44
Please have a look at the answer I posted. It has a proper benchmark and it doesn't support your claims. — pveentjer, Jan 15 '22 at 05:46
_The JMM guarantees that the JVM will not run into undefined behavior_ - this is false. The spec states outright that if you e.g. call `.add` from multiple threads with no locks, the ArrayList impl does not explain what happens then. You don't get core dumps or invalid memory accesses, but that's not the only 'undefined behaviour' available. I guess java is 'undefined behaviour... but, the VM will not crash on you if you do this, nor will unrelated memory get corrupted'. I'm not sure there's much point in making this distinction. It's still undefined. — rzwitserloot, Jan 15 '22 at 06:16
Undefined behavior is a term that has very specific meaning in the realm of memory models. Although the behavior in Java certainly can be unexpected, it isn't undefined. — pveentjer, Jan 15 '22 at 07:54
First of all, the potential outcome of the first example (`Vector`) is incomplete, as `[]`, `[World]`, and `[Hello]` are also possible. You seem to assume that there was a guaranty that both threads completed at `println(a);` but there is none. I don’t think that it is useful to claim that things like printing a string not existing in the runtime (`"WhatTheHeckReally"`) or “play the macarena” were possible in the absence of synchronization. In practice, more than the first four bullets is possible, missing in list is printing `null` or getting an exception. So why focus on impossible results? — Holger, Jan 17 '22 at 13:26
@Holger - the intent was to show what happens when all is done. I'll update the snippets with some yield calls. The point of the exercise is: It is undefined behaviour. That's the lesson: When you do this, you're "off the map" and the code is therefore neccessarily buggy even if you can't manage to write a test case to actually make the code behave in an undesired fashion. Because of the evil coin: It will misbehave on you later. The mental model should be: "Whoops, unsynchronized concurrent access, anything can happen, this code is broken". — rzwitserloot, Jan 17 '22 at 14:04

pveentjer · Accepted Answer · 2022-01-15T08:22:57.870

If you get weird benchmark results, the first thing you need to do is to verify your benchmark. And your benchmark is flawed for quite a few reasons.

no proper warmup. It is not only the typical JIT warmup, but biased locking is disabled the first few seconds on JVM startup.
insufficient number of iterations
in theory the code could be optimized out due to dead code elimination

So I rewrote your benchmark using JMH: a micro benchmarking framework.

package com;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@OperationsPerInvocation(SyncArrayListBenchmark.OPERATIONS_PER_INVOCATION)
public class SyncArrayListBenchmark {

    public static final int OPERATIONS_PER_INVOCATION = 100_000_000;


    @Benchmark
    public int arrayList() {
        List<Integer> l1 = new ArrayList<>(OPERATIONS_PER_INVOCATION);

        IntStream.range(0, OPERATIONS_PER_INVOCATION).sequential().forEach(i -> l1.add(i));

        return l1.size();
    }

    @Benchmark
    public int synchronized_arrayList() {
        List<Integer> l1 = Collections.synchronizedList(new ArrayList<>(OPERATIONS_PER_INVOCATION));

        IntStream.range(0, OPERATIONS_PER_INVOCATION).sequential().forEach(i -> l1.add(i));

        return l1.size();
    }
}

The results of running with JDK 11:

Benchmark                                      Mode  Cnt  Score   Error  Units
SyncArrayListBenchmark.arrayList               avgt   25  4.986 ± 0.100  ns/op
SyncArrayListBenchmark.synchronized_arrayList  avgt   25  6.447 ± 0.104  ns/op

The results of running with JDK 17:

Benchmark                                      Mode  Cnt   Score   Error  Units
SyncArrayListBenchmark.arrayList               avgt   25   6.819 ± 0.300  ns/op
SyncArrayListBenchmark.synchronized_arrayList  avgt   25  10.374 ± 0.427  ns/op

Conclusion:

As you can see, the impact of synchronized ArrayList is significant.

With JDK 11, the average latency is 29% higher even though biased locking is used.

With JDK 17, the impact of synchronized ArrayList is even worse since on average latency the benchmark is 52% higher. With JDK 15, biased locking has been disabled by default and is about to be removed completely. So it is likely to be a contributing factor.

What is 'interesting' is that the synchronized version of JDK 11 is faster than the unsynchronized version of 17. I'm not sure what the cause is; perhaps related to GC changes.

I leave that as an exercise to the reader. JMH has some great profilers. The first thing I would do is to get rid of allocations and thereby excluding the garbage collector.

You are constructing the list within the method, potentially within the optimizer’s horizon. So it’s still possible that the optimizer recognizes that the list does not escape to other threads before the end of the method. It could also be subject to lock coercion which would have a big impact here, but much less in real life code where other things happen between the `add` calls. Further, it’s possible that the overhead of using a Stream operation dwarfes the overhead of the synchronization. Adding a baseline using an ordinary loop would be useful. — Holger, Jan 17 '22 at 13:06

Why not to use synchronized ArrayList even in single thread cases?

2 Answers2