1

I am learning the Java Collections Framework (not the concurrent collections), and I came to know that some Collection implementations are thread-safe and some are not.

Most of the material I have read only states that xyz is thread-safe and abc isn't.

But what was the logic behind deciding whether to make a given collection type (e.g., List, Set, Queue, or even Map) thread-safe or not?

My question is about the "traditional" Collections Framework, not the concurrent collections.

Any inputs in understanding this would be of great help.

CuriousMind
  • Why did you tag `concurrenthashmap`, when you then explicitly exclude `ConcurrentHashMap` in the question? – Andreas Dec 17 '17 at 12:58
  • See this: https://stackoverflow.com/questions/6045648/which-all-java-collections-are-synchronized-and-not-synchronized – GuyKhmel Dec 17 '17 at 12:59
  • @Andreas: Please suggest which tags to include? I would do so. – CuriousMind Dec 17 '17 at 12:59
  • @GuyKhmel: It doesn't tell the logic based on which some were kept synchronised and some not. – CuriousMind Dec 17 '17 at 13:02
  • The *logic* is that "Legacy" classes, which pre-date the Collection Framework, are synchronized. The "Traditional" classes are not thread-safe, and the newer "Concurrent" classes are all thread-safe (hence the name). – Andreas Dec 17 '17 at 13:06

3 Answers

4

Thread safety carries an overhead (although in modern VMs the overhead is much lower than it was when the collections framework was designed). So collections aren't thread-safe unless that is specifically required. The exception is the JDK1.1 collections: when they were designed, the philosophy was more like "let's leave as little room for error as possible, at the cost of some performance".

The Java API evolved in several phases.

JDK1.1

In version 1.1 of Java, we had the data structures Vector and Hashtable. They are completely synchronized, providing a level of thread safety.

JDK1.2

In version 1.2 of Java, the collections framework was introduced. None of its basic collections are thread-safe (they don't synchronize any operations): ArrayList, LinkedList, HashMap, TreeMap and the Set implementations.

But you can obtain a synchronized version by calling Collections.synchronizedMap, Collections.synchronizedList, etc.
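As a rough illustration of the wrapper approach, here is a minimal sketch. The class name `SynchronizedWrapperDemo` and the helper `fillConcurrently` are made up for this example; only `Collections.synchronizedList` comes from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedWrapperDemo {

    // Hypothetical helper: each of `threads` threads appends `perThread`
    // elements to the shared list; returns the final size.
    static int fillConcurrently(List<Integer> list, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // Every call on the wrapper locks the wrapper object itself.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println(fillConcurrently(safe, 4, 10_000)); // prints 40000

        // Iteration is a compound operation, so callers must hold the
        // wrapper's lock themselves while traversing.
        synchronized (safe) {
            long zeros = safe.stream().filter(x -> x == 0).count();
            System.out.println(zeros); // prints 4
        }
    }
}
```

Passing a plain `ArrayList` to the same helper is not safe: elements may be lost or an exception may be thrown inside a worker thread, which is why it is not shown here.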

JDK1.5

In version 1.5 of Java, the java.util.concurrent framework was introduced. It contains specialized data structures for multi-threaded use. These provide a level of thread safety.
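To give a feel for these classes, the sketch below counts events from several threads with `ConcurrentHashMap`. The class name `ConcurrentCounterDemo`, the helper `countConcurrently` and the key `"hits"` are invented for illustration, and `merge` assumes Java 8 or later (it was not part of the original 1.5 API).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentCounterDemo {

    // Hypothetical helper: `threads` threads each increment the same key
    // `perThread` times; returns the final count stored under that key.
    static int countConcurrently(int threads, int perThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge() performs the read-modify-write atomically per key,
                    // so no external lock is needed.
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // prints 40000
    }
}
```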


Note that even with synchronized collections, it is possible to introduce race conditions; synchronization only means that you cannot corrupt the internal structure of the collection (all of its invariants will be maintained).

For example, suppose you have a two-step process where you first check that the collection doesn't contain some element and then insert that element. If you don't provide your own synchronization around these two steps, the element can be added twice if two threads do this at the same time.
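The two-step scenario can be sketched as follows. The class `CheckThenActDemo` and both helper methods are hypothetical names for this example. The fix relies on the documented behaviour that the wrapper returned by `Collections.synchronizedList` locks on itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckThenActDemo {

    // Racy: contains() and add() are each synchronized, but another thread
    // can run between the two calls and insert the same value.
    static void addIfAbsentRacy(List<String> list, String value) {
        if (!list.contains(value)) {
            list.add(value);
        }
    }

    // Safe for lists from Collections.synchronizedList: hold the wrapper's
    // own lock across both steps so they act as one atomic operation.
    static void addIfAbsentSafe(List<String> list, String value) {
        synchronized (list) {
            if (!list.contains(value)) {
                list.add(value);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        Runnable task = () -> addIfAbsentSafe(list, "x");
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(list.size()); // prints 1
    }
}
```

With `addIfAbsentRacy` in place of `addIfAbsentSafe`, the same program can occasionally end up with two copies of `"x"`, even though the list itself is never corrupted.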

Erwin Bolwidt
  • Thanks for your answer. So, for the scenarios where they made it thread safe, what were the compelling reasons? Could you please elaborate a bit more? Any authoritative reference material I can refer to? – CuriousMind Dec 17 '17 at 13:05
  • 1
    @CuriousMind it's not easy to dig up articles/blog posts from the time. Most of the documentation of the collections framework mixes the JDK1.2 stuff with the JDK1.5 stuff. I'll see if I can find. – Erwin Bolwidt Dec 17 '17 at 13:12
  • 1
    @CuriousMind Are you asking why the legacy (JDK1.1) classes were synchronized, or are you asking why they created the concurrent (JDK1.5) classes? As the answer says, the traditional (JDK1.2) classes are not thread-safe, but can be made so by wrapping them using `Collections.synchronizedXxx`. The concurrent classes were added because they learned that synchronized is bad for performance when there is high contention, so alternative non-synchronized thread-safe classes were created, each with their own pros/cons. – Andreas Dec 17 '17 at 13:12
  • 1
    @CuriousMind Hmm, so this 2016 presentation from Josh Bloch goes into the history. It doesn't specifically say why the collections weren't synchronized, but it does mention the design goal of "increased program speed", which was largely because the collections were no longer synchronized, and back then there was no fast path for uncontended synchronization like we have now. https://www.cs.cmu.edu/~charlie/courses/15-214/2016-fall/slides/15-collections%20design.pdf – Erwin Bolwidt Dec 17 '17 at 13:20
  • @Andreas: I am not asking specifically about legacy classes; my question is about what the compelling reasons were for the Sun/Oracle team to decide which collections to keep thread-safe and which not. – CuriousMind Dec 17 '17 at 15:22
  • 1
    @CuriousMind But the answer *is* specific to legacy classes: Legacy = synchronized. Traditional = not thread safe. They couldn't change the legacy classes, so there was no decision to make for those. Which means your question really is: "Why did they decide not to make the Collection Framework classes synchronized (excl. Concurrent)?" Answer: **Performance**. Making a class thread-safe costs performance, as they had learned from the legacy classes. – Andreas Dec 17 '17 at 15:31
1

As stated by others, concurrent collections have a runtime and potentially a memory overhead, hence the separation into thread-safe and unsafe collections.

Most data structures you can find in the single-threaded library have several thread-safe alternatives. One notable exception is List, which is probably because it's rare to need a concurrent list in applications.

For things like queues and stacks, you have plenty of choice, because it's common to have one producer and one or several consumers pushing to and pulling from a queue concurrently. To implement a cache, you may rely on a map, which is why concurrent maps are also well supported.

The fact that some data structures haven't really been mirrored in the thread-safe API is simply because they wouldn't typically be useful in a multi-threaded context.
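A minimal producer/consumer sketch with a blocking queue might look like the following. The class `ProducerConsumerDemo`, the helper `sumThroughQueue`, the capacity of 16 and the `-1` end marker are all choices made for this illustration, not anything prescribed by the JDK.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerDemo {

    // Hypothetical helper: one producer sends 1..items through a bounded
    // queue, one consumer adds them up; returns the consumer's total.
    static long sumThroughQueue(int items) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        long[] total = new long[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(-1); // end marker ("poison pill") for the consumer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty
                for (int v = queue.take(); v != -1; v = queue.take()) {
                    total[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // join makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sumThroughQueue(100)); // prints 5050
    }
}
```

The queue handles all the locking and waiting, so neither thread needs explicit synchronization of its own.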

Dici
-1

The reasons are most likely related to performance. Synchronization amongst multiple threads is an expensive operation, especially with a large collection of elements.

unjankify