5

Imagine having a main thread which creates a HashSet and starts a lot of worker threads passing HashSet to them.

Just like in code below:

void main() {
  final Set<String> set = new HashSet<>();
  final ExecutorService threadExecutor = 
  Executors.newFixedThreadPool(10);

  threadExecutor.submit(() -> doJob(set));
} 

void doJob(final Set<String> pSet) {
  // do some stuff
  final String x = ... // doesn't matter how we received the value.
  if (!pSet.contains(x)) {
    synchronized (pSet) {
      // double check to prevent multiple adds within different threads
      if (!pSet.contains(x)) {
        // do some exclusive work with x.
        pSet.add(x);
      }
    }
  }
  // do some stuff
}

I'm wondering is it thread-safe to synchronize only on add method? Is there any possible issues if contains is not synchronized?

My intuition telling me this is fine, after leaving synchronized block changes made to set should be visible to all threads, but JMM could be counter-intuitive sometimes.

P.S. I don't think it's a duplicate of How to lock multiple resources in java multithreading Even though answers to both could be similar, this question addresses more particular case.

Akirus
  • 108
  • 2
  • 11
  • 1
    In java you can create concurrent sets. See this answer for detail: https://stackoverflow.com/questions/6992608/why-there-is-no-concurrenthashset-against-concurrenthashmap – pandaadb Aug 05 '19 at 14:14
  • 5
    It's not safe. The changes made are visible to all threads at all times. You are only safe if you use a synchronized block. If you don't you could access internal structures that are part-way updated. – rghome Aug 05 '19 at 14:16
  • Possible duplicate of [How to lock multiple resources in java multithreading](https://stackoverflow.com/questions/53120361/how-to-lock-multiple-resources-in-java-multithreading) – markspace Aug 05 '19 at 14:24
  • See the duplicate link. In Java, both reads and writes must be synchronized. If you don't the JVM is free to optimize the reads however it wants, including not checking for an update from another thread. So basically all methods of your HashSet must be synchronized or your going to run into trouble. – markspace Aug 05 '19 at 14:25
  • @rghome, markspace, Thank you for your answers! "access internal structures that are part-way updated." Made it clear for me. – Akirus Aug 05 '19 at 14:29
  • Why not just use a [ConcurrentSkipListSet](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentSkipListSet.html)? – Andrew Henle Aug 05 '19 at 15:20

3 Answers3

3

I'm wondering is it thread-safe to synchronize only on the add method? Are there any possible issues if contains is not synchronized as well?

Short answers: No and Yes.

There are two ways of explaining this:

The intuitive explanation

Java synchronization (in its various forms) guards against a number of things, including:

  • Two threads updating shared state at the same time.
  • One thread trying to read state while another is updating it.
  • Threads seeing stale values because memory caches have not been written to main memory.

In your example, synchronizing on add is sufficient to ensure that two threads cannot update the HashSet simultaneously, and that both calls will be operating on the most recent HashSet state.

However, if contains is not synchronized as well, a contains call could happen simultaneously with an add call. This could lead to the contains call seeing an intermediate state of the HashSet, leading to an incorrect result, or worse. This can also happen if the calls are not simultaneous, due to changes not being flushed to main memory immediately and/or the reading thread not reading from main memory.

The Memory Model explanation

The JLS specifies the Java Memory Model which sets out the conditions that must be fulfilled by a multi-threaded application to guarantee that one thread sees the memory updates made by another. The model is expressed in mathematical language, and not easy to understand, but the gist is that visibility is guaranteed if and only if there is a chain of happens before relationships from the write to a subsequent read. If the write and read are in different threads, then synchronization between the threads is the primary source of these relationships. For example in

 // thread one
 synchronized (sharedLock) {
    sharedVariable = 42;
 }

 // thread two
 synchronized (sharedLock) {
     other = sharedVariable;
 }

Assuming that the thread one code is run before the thread two code, there is a happens before relationships between thread one releasing the lock and thread two acquiring it. With this and the "program order" relations, we can build a chain from the write of 42 to the assignment to other. This is sufficient to guarantee that other will be assigned 42 (or possibly a later value of the variable) and NOT any value in sharedVariable before 42 was written to it.

Without the synchronized block synchronizing on the same lock, the second thread could see a stale value of sharedVariable; i.e. some value written to it before 42 was assigned to it.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
2

That code is thread safe for the the synchronized (pSet) { } part :

if (!pSet.contains(x)) {
  synchronized (pSet) { 
  // Here you are sure to have the updated value of pSet    
  if (!pSet.contains(x)) {
    // do some exclusive work with x.
    pSet.add(x);
  }
}

because inside the synchronized statement on the pSet object :

  • one and only one thread may be in this block.
  • and inside it, pSet has also its updated state guaranteed by the happens-before relationship with the synchronized keyword.

So whatever the value returned by the first if (!pSet.contains(x)) statement for a waiting thread, when this waited thread will wake up and enter in the synchronized statement, it will set the last updated value of pSet. So even if the same element was added by a previous thread, the second if (!pSet.contains(x)) would return false.

But this code is not thread safe for the first statement if (!pSet.contains(x)) that could be executed during a writing on the Set.
As a rule of thumb, a collection not designed to be thread safe should not be used to perform concurrently writing and reading operations because the internal state of the collection could be in a in-progress/inconsistent state for a reading operation that would occur meanwhile a writing operation.
While some no thread safe collection implementations accept such a usage in the facts, that is not guarantee at all that it will always be true.
So you should use a thread safe Set implementation to guarantee the whole thing thread safe.
For example with :

Set<String> pSet = ConcurrentHashMap.newKeySet();

That uses under the hood a ConcurrentHashMap, so no lock for reading and a minimal lock for writing (only on the entry to modify and not the whole structure).

davidxxx
  • 125,838
  • 23
  • 214
  • 215
  • Thank you for your answer! So as i got it from another answers, this code is safe only if thread safe ```Set``` implementation is used, because in worst case with HashSet, outer ```contains``` (outside synchronized block) could find set in partial-updated state and crash, am i right? Or is this particular case doesn't require to use thread safe ```Set``` implementation? – Akirus Aug 05 '19 at 15:16
  • It's not merely a question of whether or not "you...have the updated value..." It's also a question of whether `contains(x)` could return `true` for some `x` that was _never_ in the set, or `false` for an object that _definitely is_ in the set, or whether an interaction between overlapping `contains(...)` and `add(...)` calls could throw an `Error`, or segfault and crash the JVM. The Javadoc for `HashSet` is quite explicit that _all_ accesses to any given instance must be `synchronized` in a multi-threaded environment. Rule #1 of multi-threading is, Trust and Obey the library documentation. – Solomon Slow Aug 05 '19 at 18:05
  • @Solomon Slow I agree completely that the best provision is to use a Set thread safe implementation because of the reading operation. That was that I recommend at the end. My point was to distinguish the two points 1) the double lock synchronization works in terms of what the OP adds in the Set thanks to the contains() with the monitor on the object 2) but in other hand the reading operation before could cause an issue for another reason : concurrent reading and writing on a no thread safe collection. PS : I reformulated to insist on the general provision about no thread safe collection. – davidxxx Aug 06 '19 at 06:43
1

No,

You don't know in what state the Hashset might be during add by another Thread. There might be fundamental changes ongoing, like splitting of buckets, so that contains may return false during the adding by another thread, even if the element would be there in a singlethreaded HashSet. In that case you would try to add an element a second time.

Even Worse Scenario: contains might get into an endless loop or throw an exception because of an temporary invalid state of the HashSet in the memory used by the two threads at the same time.

aschoerk
  • 3,333
  • 2
  • 15
  • 29