684

HashSet is based on HashMap.

If we look at HashSet<E> implementation, everything is been managed under HashMap<E,Object>.

<E> is used as a key of HashMap.

And we know that HashMap is not thread safe. That is why we have ConcurrentHashMap in Java.

Based on this, I am confused that why we don't have a ConcurrentHashSet which should be based on the ConcurrentHashMap?

Is there anything else that I am missing? I need to use Set in a multi-threaded environment.

Also, If I want to create my own ConcurrentHashSet can I achieve it by just replacing the HashMap to ConcurrentHashMap and leaving the rest as is?

Kevin Panko
  • 8,356
  • 19
  • 50
  • 61
Talha Ahmed Khan
  • 15,043
  • 10
  • 42
  • 49
  • 2
    After looking at the API, if I were to guess I would say that it seems to come down to 2 factors, (1) avoiding having to create a class in Java API for every little bit of functionality needed (2) Providing convenience classes for more frequently used objects. I personally prefer LinkedHashMap and LinkedHashSet since they guarantee order is the same as insertion order, the only reason for using a set is to avoid duplication, often I still want to maintain insertion order. – Ali Aug 09 '11 at 07:31
  • 1
    @Ali, *I personally prefer LinkedHashMap and LinkedHashSet* you will go far :) – bestsss Aug 09 '11 at 20:54
  • 9
    A bit old question, but as it is the first result in Google, may be useful to know that ConcurrentSkipListSet already has the implementation of ConcurrentHashMap. See http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentSkipListSet.html – Igor Rodriguez Jun 13 '13 at 16:18
  • 1
    What I saw from Java source `ConcurrentSkipListSet` is built on `ConcurrentSkipListMap`, which implements `ConcurrentNavigableMap` and `ConcurrentMap`. – Talha Ahmed Khan Jun 14 '13 at 12:18
  • possible duplicate of [is Java HashSet thread-safe for read only?](http://stackoverflow.com/questions/5379794/is-java-hashset-thread-safe-for-read-only) – Nirro Nov 05 '14 at 18:52
  • Very good point of view. Perhaps the guys at java thought that always is good to associate things.. so they promote fiercely this idea!! – Victor Jun 28 '15 at 00:57
  • Now I need this feature too, and I have the same question as you have. I used `ConcurrentSkipListSet` for sorting as well, the bottom storage is a `map`, but unfortunately not a `Hash` one. I don't think this has an excuse to not support it, maybe comes in future edition. – Tiina Mar 19 '18 at 05:04

10 Answers10

728

There's no built in type for ConcurrentHashSet because you can always derive a set from a map. Since there are many types of maps, you use a method to produce a set from a given map (or map class).

Prior to Java 8, you produce a concurrent hash set backed by a concurrent hash map, by using Collections.newSetFromMap(map)

In Java 8 (pointed out by @Matt), you can get a concurrent hash set view via ConcurrentHashMap.newKeySet(). This is a bit simpler than the old newSetFromMap which required you to pass in an empty map object. But it is specific to ConcurrentHashMap.

Anyway, the Java designers could have created a new set interface every time a new map interface was created, but that pattern would be impossible to enforce when third parties create their own maps. It is better to have the static methods that derive new sets; that approach always works, even when you create your own map implementations.

Ray Toal
  • 86,166
  • 18
  • 182
  • 232
  • 5
    Am I right to say that if you create the set this way from `ConcurrentHashMap`, you lose the benefits you'd get from `ConcurrentHashMap` ? – Pacerier Nov 01 '11 at 20:47
  • 21
    There are no benefits to lose. `newSetFromMap`'s implementation is found starting on line 3841 in http://www.docjar.com/html/api/java/util/Collections.java.html. It's just a wrapper.... – Ray Toal Nov 01 '11 at 23:36
  • @zneak, isn't the `add` operation on a `Set` equivalent to what `putIfAbsent` does? The other operations in `ConcurrentMap` don't offer any functionality that doesn't exist in `Set`. `Set` already has a `remove` method and I don't see how a `ConcurrentMap`-like `replace` operation would make sense on a `Set` object. – Andrew McNamee Jul 19 '12 at 10:36
  • 6
    @Andrew, I think the motivation behind using a "ConcurrentSet" stems from not the API but rather the implementation - thread safety but without a universal lock - multiple *concurrent* reads for instance. – Ustaman Sangat Sep 13 '12 at 16:07
  • @UstamanSangat: hmmm, I think the comment I was responding to was deleted at some point. I vaguely remember it asking about why there wasn't a `ConcurrentSet` interface (and my argument was that it wouldn't provide any new methods; it'd only be a marker interface, like `RandomAccess`. Admittedly `Set` itself is basically a marker interface since it doesn't provide new methods; just new constraints). – Andrew McNamee Sep 14 '12 at 21:42
  • 1
    If your elements are `Comparable`, why not simply use [`ConcurrentSkipListSet`](http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentSkipListSet.html)? – user454322 Oct 02 '14 at 10:13
  • 5
    ConcurrentSkipList has lots of (size) overhead and the lookups are slower. – eckes Jun 10 '15 at 15:44
  • 2
    My complaint is that having to go this route, you lose the `HashSet(Collection extends E> c)` constructor, with its nice syntactic sugar for instance converting a List to a Set. Or am I wrong? – rogerdpack Sep 14 '15 at 21:17
  • 1
    @rogerdpack: Good point. I just ran into this now. `newSetFromMap` could do that work for us. – kevinarpe Dec 14 '15 at 04:58
  • 1
    One thing I found with the ConcurrentHashMap.newKeySet (and the Collections alternative route) is that it doesn't cater for null keys and will NPE on a remove call (which HashSet will not do). – Neil Stockton Oct 06 '16 at 08:12
  • 1
    The real problem when the JDK not providing a true ConcurrentSet interface is that you'll be missing some atomic operations on that Set, namely `addIfAbsent`, `computeIfAbsent`, etc. With JDK8 you can create those yourself because `ConcurrentHashMap.KeySetView` gives you access to the underlying ConcurrentMap. With JDK7 you're stuck and may have to resort to creating your own wrapper. – peterh Jan 14 '18 at 17:03
  • 8
    take care when using this approach, since some methods are not implemented correctly. Just follow the links: ``Collections.newSetFromMap`` creates a ``SetFromMap``. e.g. the ``SetFromMap.removeAll`` method delegates to the ``KeySetView.removeAll``, that inherits from ``ConcurrentHashMap$CollectionView.removeAll``. This method is highly inefficient in bulk removing elements. imagine ``removeAll(Collections.emptySet())`` traverses all elements in the ``Map`` without doing anything. Having a ``ConcurrentHashSet`` that is corretly implemented will be better in most cases. – benez Mar 01 '18 at 09:50
  • To follow up on @benez comment, HashSet gets it's removeAll from AbstractSet, which will at least iterate on the smaller of the two collections... so it's not _super_ terrible how the CollectionView works... especially if you look a bit closer at the perf characteristics of concurrent map... by always using it's own iterator, it can use Iterator.remove to detach an item, whereas the AbstractSet version might iterate the argument and then call .remove(), which causes a complete hash table lookup (versus Iterator.remove, which can avoid the expensive checks / graph transversals). – Ajax Aug 06 '18 at 23:08
  • Basically, if the set/map knows it is expensive to do a "cold remove()", it can make the (right) choice to avoid pathological performance when both collections are large, at the expense of (maybe) poorer performance when collections are small or empty. Especially empty, since you, the caller, can optimize by filtering out empty queries before you make them. – Ajax Aug 06 '18 at 23:09
  • 1
    @Ajax my comment relates to Java8 and before. they have adressed this with Java9+, so the method indeed makes these checks now. – benez Aug 08 '18 at 17:59
  • Ah, cool. many thanks for the update. My company hasn't made the leap to java9+ just yet, so always good to pick up tidbit for when we have to bite the bullet (soon). :-) – Ajax Aug 10 '18 at 15:42
  • 1
    Then why ConcurrentSkipListSet exists, when you can derive a set from a ConcurrentSkipListMap? – Nick Nick Sep 13 '18 at 09:57
  • 1
    That's a good question for the author, Doug Lea. He definitely decided it was a good idea to build ConcurrentSkipListSet by wrapping a ConcurrentSkipListMap -- see line 65 of http://fuseyism.com/classpath/doc/java/util/concurrent/ConcurrentSkipListSet-source.html. I suppose one could figure out why he did that by comparing the source code of the map and set (perhaps in this case there was a benefit to wrapping the map?) but it might be easier to ask Doug directly. I suspect it has something to do with skiplists being sorted sets as opposed to unrestricted sets that can use hashing. – Ray Toal Sep 14 '18 at 05:53
  • 1
    @NickNick that’s a question we could expand to the entire Collection API. `HashSet` is just a wrapper around `HashMap`, `LinkedHashSet` a wrapper around `LinkedHashMap`, `TreeSet` is just a wrapper around `TreeMap`… – Holger Mar 17 '22 at 10:33
125
Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
Paresh Mayani
  • 127,700
  • 71
  • 241
  • 295
Serge Mask
  • 1,331
  • 1
  • 9
  • 6
102

With Guava 15 you can also simply use:

Set s = Sets.newConcurrentHashSet();
kichik
  • 33,220
  • 7
  • 94
  • 114
  • 16
    This is always a nightmare. If you have a set or a map that does not indicate whether or not something is thread safe you find all kind of hazards and desasters happen in maintaince. I always would want a type that indicates thread safety for collections (or not). – Martin Kersten Dec 14 '15 at 09:01
  • 13
    The method description is literally "Creates a thread-safe set backed by a hash map" – kichik Dec 14 '15 at 16:33
  • 18
    As I said, there is a ConcurrentSet missing. ConcurrentHashMap comes along with a ConcurrentMap interface to indicate this. This is the very same reason I always add this ConcurrentSet interface as well. – Martin Kersten Dec 15 '15 at 09:59
83

Like Ray Toal mentioned it is as easy as:

Set<String> myConcurrentSet = ConcurrentHashMap.newKeySet();
Community
  • 1
  • 1
BullyWiiPlaza
  • 17,329
  • 10
  • 113
  • 185
  • 1
    This seems to require Java 8. Looking at implementation, this also seems to be just a wrapper of `ConcurrentHashMap`. – Mygod Oct 08 '18 at 05:10
22

It looks like Java provides a concurrent Set implementation with its ConcurrentSkipListSet. A SkipList Set is just a special kind of set implementation. It still implements the Serializable, Cloneable, Iterable, Collection, NavigableSet, Set, SortedSet interfaces. This might work for you if you only need the Set interface.

Mike Pone
  • 18,705
  • 13
  • 53
  • 68
  • 12
    Note that `ConcurrentSkipListSet`'s elements should be `Comparable` – user454322 Oct 02 '14 at 10:14
  • If you need to extend from a Set that is concurrent, this is the only solution here that will work. – ndm13 Dec 28 '16 at 06:17
  • ConcurrentSkipListMap adds unnecessary performance penalty of having tree as base data structure, instead of using HashTable, even when you do not need sorting/navigation functionality. – Ajeet Ganga Sep 09 '17 at 18:31
  • 6
    don't use ``ConcurrentSkipListSet`` unless you want a ``SortedSet``. A usual operation like add or remove should be O(1) for a ``HashSet``, but O(log(n)) for a ``SortedSet``. – benez Mar 01 '18 at 09:38
20

As pointed by this the best way to obtain a concurrency-able HashSet is by means of Collections.synchronizedSet()

Set s = Collections.synchronizedSet(new HashSet(...));

This worked for me and I haven't seen anybody really pointing to it.

EDIT This is less efficient than the currently aproved solution, as Eugene points out, since it just wraps your set into a synchronized decorator, while a ConcurrentHashMap actually implements low-level concurrency and it can back your Set just as fine. So thanks to Mr. Stepanenkov for making that clear.

http://docs.oracle.com/javase/8/docs/api/java/util/Collections.html#synchronizedSet-java.util.Set-

Nirro
  • 746
  • 4
  • 18
  • 26
    the `synchronizedSet` method just creates the [decorator](http://en.wikipedia.org/wiki/Decorator_pattern) under `Collection` to wrap methods that could be thread-safe by synchronization the whole collection. But `ConcurrentHashMap` is implemented using [non-blocking algorithms](https://en.wikipedia.org/wiki/Non-blocking_algorithm) and "low-level" synchronisations without any locks of the whole collection. So wrapers from `Collections.synchronized`... is worse in multi-threads environments for performance reasons. – Eugene Stepanenkov Jan 01 '15 at 19:53
14

You can use guava's Sets.newSetFromMap(map) to get one. Java 6 also has that method in java.util.Collections

Bozho
  • 588,226
  • 146
  • 1,060
  • 1,140
8
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;


public class ConcurrentHashSet<E> extends AbstractSet<E> implements Set<E>{
   private final ConcurrentMap<E, Object> theMap;

   private static final Object dummy = new Object();

   public ConcurrentHashSet(){
      theMap = new ConcurrentHashMap<E, Object>();
   }

   @Override
   public int size() {
      return theMap.size();
   }

   @Override
   public Iterator<E> iterator(){
      return theMap.keySet().iterator();
   }

   @Override
   public boolean isEmpty(){
      return theMap.isEmpty();
   }

   @Override
   public boolean add(final E o){
      return theMap.put(o, ConcurrentHashSet.dummy) == null;
   }

   @Override
   public boolean contains(final Object o){
      return theMap.containsKey(o);
   }

   @Override
   public void clear(){
      theMap.clear();
   }

   @Override
   public boolean remove(final Object o){
      return theMap.remove(o) == ConcurrentHashSet.dummy;
   }

   public boolean addIfAbsent(final E o){
      Object obj = theMap.putIfAbsent(o, ConcurrentHashSet.dummy);
      return obj == null;
   }
}
Ankush soni
  • 1,439
  • 1
  • 15
  • 30
MD. Mohiuddin Ahmed
  • 2,092
  • 2
  • 26
  • 35
  • 3
    I like the idea to use Boolean.TRUE instead of an dummy object. It is a little bit more elegant. Also using NULL is also possible since it would be available in the key set even if mapped to null. – Martin Kersten Dec 14 '15 at 09:05
  • 3
    @MartinKersten fyi, ConcurrentHashMap doesn't allow null values – Lauri Lehtinen Jan 13 '16 at 19:19
2

Why not use: CopyOnWriteArraySet from java.util.concurrent?

Shendor
  • 749
  • 2
  • 11
  • 18
  • 15
    Because CopyOnWriteArraySet copies the entire collection on any state mutation, which is not always wanted due to the performance impact. It's designed to work only in special cases. – boneash Jan 16 '17 at 14:59
  • Additionally `CopyOnWriteArraySet.contains()` has a run-time of `O(n)` (has to check ever entry) where as HashSet/HashMap has `O(1)`. – Robert Dec 01 '20 at 12:02
  • because this works only one way: if you're reading data owned/managed by another thread. if you need to pass data to that thread, you'd have to additionally update its atomic ref to updated array - but that makes little sense – mantrid Feb 23 '21 at 18:10
0

We have ConcurrentHashSet in io.vertx which can be used against java.util ConcurrentHashMap. https://www.javadoc.io/static/io.vertx/vertx-core/3.0.0/io/vertx/core/impl/ConcurrentHashSet.html

Sart
  • 9
  • 1
  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – jv-k Jan 18 '23 at 11:56