20

In Java, there is thread-safe version HashMap named ConcurrentHashMap and thread-safe version TreeMap named ConcurrentSkipListMap, but there is no ConcurrentHashSet for HashSet.

Instead, there are usually 4 ways to use thread-safe Set:

  1. Set<String> mySet = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
  2. Set<String> s = Collections.synchronizedSet(new HashSet<String>());
  3. ConcurrentSkipListSet<E>
  4. CopyOnWriteArraySet<E>

1 use keySet() of ConcurrentHashMap to achieve both Set and thread-safe.

2 use synchronized way, it seems this way is not recommended.

3 is based on ConcurrentSkipListMap and is widely used.

4 is based on CopyOnWriteArrayList, thus it shares the same basic properties of CopyOnWriteArrayList. Following is select from CopyOnWriteArraySet doc: http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CopyOnWriteArraySet.html

  • It is best suited for applications in which set sizes generally stay small, read-only operations vastly outnumber mutative operations, and you need to prevent interference among threads during traversal.
  • It is thread-safe.
  • Mutative operations (add, set, remove, etc.) are expensive since they usually entail copying the entire underlying array.
  • Iterators do not support the mutative remove operation.
  • Traversal via iterators is fast and cannot encounter interference from other threads.
  • Iterators rely on unchanging snapshots of the array at the time the iterators were constructed.

Since 1 and 3 are commonly used, why does CopyOnWriteArraySet exist? When is CopyOnWriteArraySet useful?

Added: CopyOnWriteArraySet is based on CopyOnWriteArrayList, and the contains operation in List data structure is O(n), while Set data structure is for high performance contains operation, could anybody explain this?

coderz
  • 4,847
  • 11
  • 47
  • 70
  • There is indeed no such native class in the JDK; you can, however, use `Collections.newSetFromMap(new ConcurrentHashMap<>())`. – fge Mar 25 '15 at 07:25
  • 1
    Another useful material: http://stackoverflow.com/questions/6720396/different-types-of-thread-safe-sets-in-java – coderz Mar 27 '15 at 16:53

3 Answers3

18

It is useful when you have a small set of element for a thread safe collection.

One example is a Set of listeners. You need to ensure uniqueness and iterate over them efficiently.

BTW CopyOnWriteArraySet has the lowest overhead on a per reference basis. It can be as little as 1/6 the size of the other collections. This is particularly useful if you have a lot of them.

while Set data structure is for high performance contains operation, could anybody explain this?

COWAS is more efficient in terms of memory and it's contains is faster for small collections than the alternatives. What is "high performance" depends on the use case.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • 2
    Thank you for your reply! Could you please explain why ``CopyOnWriteArraySet`` is so space-saving? – coderz Mar 25 '15 at 07:47
  • 1
    It might be smaller than a `HashSet`, but I can't imagine why it would be smaller than say, `ArrayList`, especially since they are saying they make thread-local snapshots. I bet it can't even beat `TreeMap` / `TreeSet`. – VoidStar Mar 25 '15 at 08:07
  • 1
    @coderz The CopyOnWriteArraySet wraps an array of references, this means it can use as little as 4 bytes per reference (even on 64-bit JVMs) However the other sets are built on Maps which in turn have Map.Entry per element. Map.Entry is about 24 bytes plus the reference(s) to this entry making it up to 32 bytes per element depending on the collection. – Peter Lawrey Mar 25 '15 at 08:19
  • 1
    @VoidStar ArrayList is the same size but it is not a) thread-safe, b) a Set. TreeMap is not a) thread-safe, b) very space efficient and uses around 24 bytes per entry. TreeSet is built on top of TreeMap is no more efficient. – Peter Lawrey Mar 25 '15 at 08:20
  • @VoidStar ArrayList doesn't support iteration while it might be updated, nor does wrapping it in a `Collections.synchronisedList()` i.e. it is slower and it still isn't thread safe. – Peter Lawrey Mar 25 '15 at 08:24
  • I had taken your reply to mean "less overhead than all other collections", but I guess you meant less than other thread-safe wrappers. Also, you can easily do the locking yourself, and is probably advisable if you have performance concerns. You could just lock your an `ArrayList` before editing it, which is a more advisable pattern for a big list of data. – VoidStar Mar 25 '15 at 09:02
  • @VoidStar correct, though with COWAS you don't need any locking. – Peter Lawrey Mar 25 '15 at 09:12
5

Copy-on-write structures are functionally immutable.

Java at one point had a very poor story for providing immutable views on writeable structures such as sets. For example, if you had a set member, and you returned it publicly, the caller could just turn around and edit it, and therefore be editing your object's internal state! But what else can you do, copy the entire thing before returning from any public function? That would be pointlessly slow.

This was the story earlier in Java history. They relied almost exclusively on immutable objects (string is an example). Collections were an exception to this pattern, and were therefore problematic from an encapsulation perspective. When CopyOnWriteArraySet was added, unmodifiableCollection and unmodifiableSet did not yet exist (although unmodifiableCollection has largely solved the problem, I still find it a more cumbersome solution than what other languages offer, especially when using custom data structures). So this explains probably the largest motivation for creating CopyOnWriteArraySet in the first place. You could return a CopyOnWriteArraySet without fear of somebody else modifying your object's internal state, and without wasting time making unnecessary copies.

Copy-On-Write was a fad several years ago, but it is a notoriously inefficient idea for multi-threaded programming and is less efficient than other models. From the documentation you've posted, they've sped up iterating over it by creating thread-local snapshots, which means they are spending memory to compensate. So it's a perfectly okay class to use as long as your data is small... because the memory snapshots won't add up to much wasted memory.

VoidStar
  • 5,241
  • 1
  • 31
  • 45
  • For the right use case CopyOnWriteArraySet is the most efficient. Used incorrectly, any collection can be considered inefficient. – Peter Lawrey Mar 25 '15 at 08:23
  • `CopyOnWriteArraySet` is a perfectly reasonable balance between easy and efficient for most developers, but COW is not the "most efficient" way. Keep in mind that it constructs a new snapshot each time you make a new iterator, even in the same thread. Generally COW in a multi-thread environment is only optimal in very special cases, and cannot match the efficiency as simply letting the developer decide when a copy is necessary. An `ArrayList` with manual locking combo avoids unnecessary copies, but still supports copies when they help. – VoidStar Mar 25 '15 at 09:32
  • A new Iterator doesn't create a snapshot, it uses the immutable array of references. This is why it must copy-on-write whenever you alter the contents of the array. – Peter Lawrey Mar 25 '15 at 09:36
  • That's not what it says here. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArraySet.html . Your idea could work if they had been using a read-write lock to block garbage collection on an older immutable view. But if you think about it, it has to be recursive-read, and recursive read-write locking is actually quite expensive (internally you need per-thread state, e.g. a list/map), so for a small array, it actually makes more sense to copy the whole thing on iterator construction as they've documented. – VoidStar Mar 25 '15 at 09:45
  • Thought about it more, atomic refcounting would work perfectly fine. Then, the only waste is in the case of the developer was going to use the iterator for a mutative operation anyway, and then you've only wasted an extra atomic increment/decrement. I might have done it that way, not sure why they opted for the per-iterator thing. – VoidStar Mar 25 '15 at 09:57
  • There is no per-Iterator thing, other than the Iterator object itself (and this can be placed on the stack instead of the heap) – Peter Lawrey Mar 25 '15 at 10:06
1

Test code:

Set<String> a = new CopyOnWriteArraySet<String>();
    for(int i=0;i<10;i++) {
        a.add("str" + i);
    }
    boolean flag = true;
    long t1 = System.currentTimeMillis();
    for(int i=0;i<200000;i++) {
        flag = a.contains("str" + i);
    }
    System.out.println(System.currentTimeMillis() - t1);

    Set<String> b = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    for(int i=0;i<10;i++) {
        b.add("str" + i);
    }
    t1 = System.currentTimeMillis();
    for(int i=0;i<200000;i++) {
        flag = b.contains("str" + i);
    }
    System.out.println(System.currentTimeMillis() - t1);

Shows that CopyOnWriteArraySet is slower than Collections.newSetFromMap. Since the test case is a very small Set with read only operation, CopyOnWriteArraySet seems not better.

  • 1
    The use cases for `CopyOnWriteArraySet` are different then use cases for `Collections.newSetFromMap`. Your assessment is entirely flawed. – John Vint Mar 27 '15 at 16:56
  • @JohnVint Can you speak more details? –  Mar 28 '15 at 02:17
  • Sure. A CopyOnWriteArrayList is very very good when you are doing 90+% reads. If you are writing a lot it is incredibly expensive and inefficient. COWAL is also really good when you want to safely iterate over the list with multiple threads without having to synchronize the entire collection. Collections.synchronizedList will have faster adds so maybe would be good for simple puts and removes, but not much else. – John Vint Mar 29 '15 at 23:11