synchronization, when it is actually used, really does cost. However, HotSpot is pretty decent at realizing that a mutex isn't actually doing anything useful and eliminating it. That's what you're seeing.
So, why isn't ArrayList just synchronized out of the box / why isn't the advice 'use Vector, not ArrayList'? Many separate reasons:
Most important take-home reason (the rest is just historic peculiarity): Because a synchronized list is mostly useless. See below.
Modern-day JVMs are pretty good at eliminating synchronized where it doesn't do anything, which is why you are having a hard time using simple timing code to see any difference. But that wasn't always the case. ArrayList was introduced in Java 1.2. Vector (a synchronized ArrayList with a different API) is older than that: 1.0. ArrayList was introduced for 2 separate reasons: partly to clean up that API, and partly because 'synchronize it!' was slow. NOW it is no longer slow, but Java 1.2 is 23 years old. Rerun your code on Java 1.2 if you can find it anywhere and report back to me :)
Everything about Vector is deprecated, obsolete, and non-idiomatic. Part of that is simply 'because'. 23 years ago, the advice 'use ArrayList, not Vector' was correct for a bunch of reasons, including "because it is faster" (even if that is no longer true today). Now the reason to use ArrayList and not Vector is mostly: "Because ArrayList is what everybody is familiar with and Vector is not; when in Rome, act like the Romans; don't rock the boat for no reason whatsoever". This shows up in all sorts of pragmatic ways. For example, the name 'Vector' is now being reused in the Java ecosystem for something completely different (the Vector API, which came out of Project Panama and is about accessing hardware registers that aren't exactly 64-bit).
Why is a synchronized list mostly useless?
When multiple threads use it at once, a non-synchronized ('not thread safe') implementation breaks completely; the spec says: Anything can happen. A synchronized ('thread safe') implementation does not break completely; instead, you get 1 of a set of permutations, with no guarantees whatsoever about which ones are more or less likely. That's not really more useful than utter chaos, though! For example, if I write this code:
List<String> a = new Vector<>();
Thread x = new Thread(() -> a.add("Hello"));
Thread y = new Thread(() -> a.add("World"));
x.start();
y.start();
x.join();
y.join();
System.out.println(a);
Then it is legal for this app to print [Hello, World], but it is also legal for this app to print [World, Hello]. There is no way to know, and a VM is free to always return the one, or always return the other, or flip a coin, or make it depend on the phase of the moon. Vector is synchronized and this is still useless to me. Nobody wants to write algorithms that need to deal with a combinatorial explosion of permutations!!
With ArrayList, however, which is not 'thread safe', it gets much much worse. There are way more permutations here. The JVM can do any of these without breaking spec:
- [Hello, World]
- [World, Hello]
- [Hello]
- [World]
- [null, Hello]
- [World, World]
- []
- [WhatTheHeckReally]
- pause, play the macarena over the speaker system, then crash.
Anything goes - the spec says the behaviour is unspecified. In practice, the first 4 are all entirely possible.
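To make the difference a bit more tangible, here is a minimal sketch of what Vector does promise and what it doesn't. The class name VectorOrderDemo and the helper fillFromTwoThreads are made up for this illustration; the loop counts are arbitrary.

```java
import java.util.List;
import java.util.Vector;

class VectorOrderDemo {
    // Hypothetical helper: two threads each append perThread items to target.
    static List<String> fillFromTwoThreads(List<String> target, int perThread) {
        Thread x = new Thread(() -> { for (int i = 0; i < perThread; i++) target.add("Hello"); });
        Thread y = new Thread(() -> { for (int i = 0; i < perThread; i++) target.add("World"); });
        x.start();
        y.start();
        try {
            x.join();
            y.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> v = fillFromTwoThreads(new Vector<>(), 10_000);
        // Each add is synchronized, so no add gets lost: the size is always 20000.
        System.out.println(v.size());
        // How the Hellos and Worlds interleave is up to the scheduler, so this
        // line may differ from run to run.
        System.out.println(v.subList(0, 5));
    }
}
```

Pass in a new ArrayList<>() instead and all bets are off: a lost element, a stray null, or an exception inside one of the threads are all fair game.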
Avoiding this mess is good, but the set of permutations that the synchronized Vector offers is just.. less bad. But still bad, so who cares? You want this code to be 100% reliable: You want code to do the same thing every time (unless you want randomness, but then use java.util.Random, which has specs that explicitly spell out how it is random. Threads are free to be non-random, so if you MUST have randomness, you can't rely on thread scheduling for that either).
In order to make stuff reliable, the operation needs to be either done by the object itself (you call ONE method and that is the only interaction your thread does with it), or, you need external locks.
For example, if I want to put '1' in a hashmap for a key that isn't there yet, and increment the number if it is, this code DOES NOT WORK:
Map<String, Integer> myMap = Collections.synchronizedMap(new HashMap<>());
...
String k = ...;
if (myMap.containsKey(k)) myMap.put(k, myMap.get(k) + 1);
else myMap.put(k, 1);
Seems fine? Nope, broken:
- Thread 1 calls myMap.containsKey and sees the answer is false.
- Thread 1 so happens to get pre-empted and freezes right there, after the if, before the put.
- Thread 2 runs, and increments for the same key. It, too, finds myMap.containsKey returning false. It therefore runs myMap.put(k, 1).
- Thread 1 continues running, and runs.. myMap.put(k, 1).
- The map now contains k = 1, even though incrementFor(k) was run twice. Your app is broken.
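You normally can't reproduce that interleaving on demand, but you can force it. The sketch below is one way to do so, assuming a CyclicBarrier placed between the check and the put (that barrier, the class name LostUpdateDemo and the fixed key "k" are additions for illustration, not part of the code above). Neither thread is allowed to write until both have finished containsKey, which is exactly the unlucky schedule from the list.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CyclicBarrier;

class LostUpdateDemo {
    // Runs the check-then-put code from two threads; returns the final count for "k".
    static int runForcedInterleaving() {
        Map<String, Integer> myMap = Collections.synchronizedMap(new HashMap<>());
        // Illustration-only aid: trips once both threads have arrived.
        CyclicBarrier bothHaveChecked = new CyclicBarrier(2);

        Runnable incrementFor = () -> {
            boolean present = myMap.containsKey("k");
            try {
                // Simulates the pre-emption: wait until the other thread has checked too.
                bothHaveChecked.await();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            if (present) myMap.put("k", myMap.get("k") + 1);
            else myMap.put("k", 1);
        };

        Thread t1 = new Thread(incrementFor);
        Thread t2 = new Thread(incrementFor);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return myMap.get("k");
    }

    public static void main(String[] args) {
        // Prints 1, even though the increment ran twice.
        System.out.println(runForcedInterleaving());
    }
}
```

Every individual call on the synchronized map was perfectly 'thread safe' here; the bug lives in the gap between the calls.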
See? Synchronization? It was completely useless here. What you want is either a lock:
synchronized (something) {
String k = ...;
if (myMap.containsKey(k)) myMap.put(k, myMap.get(k) + 1);
else myMap.put(k, 1);
}
and this is completely fine - no matter how hard you try running incrementFor(k) simultaneously, it'll dutifully count every invocation. Or, better yet, we ask the map to do it for us: we'd want a map that just has an increment function or similar. HashMap does not have one. I guess Collections.synchronizedMap could return an object that has extra methods, but as the name suggests, that implementation then necessarily uses locking, and there are more efficient ways to do it.
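As a rough sketch of the external-lock variant: the class below wraps a plain HashMap and guards every access with one private lock object. The names LockedCounter, get and hammer are invented for this example; only incrementFor mirrors the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

class LockedCounter {
    private final Object lock = new Object();
    // A plain HashMap is fine here because it is only ever touched while holding lock.
    private final Map<String, Integer> counts = new HashMap<>();

    void incrementFor(String k) {
        synchronized (lock) {
            // Check and put happen under one lock, so no other thread can get in between.
            if (counts.containsKey(k)) counts.put(k, counts.get(k) + 1);
            else counts.put(k, 1);
        }
    }

    int get(String k) {
        synchronized (lock) {
            return counts.getOrDefault(k, 0);
        }
    }

    // Hypothetical stress helper: many threads increment the same key.
    static int hammer(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.incrementFor("k");
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get("k");
    }

    public static void main(String[] args) {
        System.out.println(hammer(8, 10_000)); // prints 80000
    }
}
```

Note that the map itself no longer needs to be a synchronized one; the lock around the whole check-then-put is what makes it correct.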
This task is better done with ConcurrentHashMap, and using the right method:
ConcurrentHashMap<String, Integer> myMap = new ConcurrentHashMap<>();
...
myMap.merge(k, 1, (a, b) -> a + b);
That does it in one call. (merge is the same as .put(k, 1) if k isn't in the map already, but if it is, it is the same as .put(k, RESULT) where RESULT is the result of running a + b, where a is 'what was in the map' and b is the value you are trying to add, so 1 in this case.)
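If you want to convince yourself, something along these lines should do. It is only a sketch: the class name MergeCounter, the helper countWithMerge and the thread counts are made up, and Integer::sum is just a shorthand for (a, b) -> a + b.

```java
import java.util.concurrent.ConcurrentHashMap;

class MergeCounter {
    // Hypothetical stress helper: many threads bump the same key via merge.
    static int countWithMerge(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> myMap = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // One atomic call: insert 1 if absent, else store old + 1.
                    myMap.merge("k", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return myMap.getOrDefault("k", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(8, 10_000)); // prints 80000
    }
}
```

No explicit lock anywhere in your own code; the 'one method call, and the object does the whole job' rule is what keeps it reliable.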
A non-synchronized collection can mess up even a single call. A synchronized one, in the sense of e.g. Collections.synchronizedMap or j.u.Vector, gets each single call right, but if your 'job' involves more than one call, it still cannot do that job safely.
And that's, in the end, why the advice is to not use synchronized stuff - even though it is probably not really a performance issue, there is almost no point in doing it. If you actually have concurrency needs it is unlikely that synchronizing the thing internally is going to help you, and in the case where it does, it's somewhat likely that some more specific type in the java.util.concurrent package can do it faster (because when concurrency IS happening, synchronized most definitely is not free at all).