2

I found this code on Peter Lawry's blog here. He mentions that this class doesn't require any further synchronization.

I am trying to improve my knowledge of concurrency and how to avoid unnecessary synchronization so am trying to figure how to reason about this case from a java memory model point of view.

The reference to the string array is final and the string themselves are immutable but the references to the strings contained in the array are mutable

  • Isn't it at least theoretically possible that one thread could still see null after another thread has updated a value?
  • Or is it the case that we don't care if the string is interned more than once?
  • Or does the JVM provide some additional guarantee that I am missing?

    public class StringInterner {
       private final String[] interner;
       private final int mask;
    
       public StringInterner(int capacity) {
          int n = Maths.nextPower2(capacity, 128);
          interner = new String[n];
          mask = n - 1;
       }
    
       private static boolean isEqual(@Nullable CharSequence s, @NotNull CharSequence cs) {
          if (s == null)
             return false;
          if (s.length() != cs.length())
             return false;
          for (int i = 0; i < cs.length(); i++)
             if (s.charAt(i) != cs.charAt(i))
                return false;
          return true;
       }
    
       @NotNull
       public String intern(@NotNull CharSequence cs) {
          long hash = 0;
          for (int i = 0; i < cs.length(); i++)
             hash = 57 * hash + cs.charAt(i);
          int h = (int) Maths.hash(hash) & mask;
          String s = interner[h];
          if (isEqual(s, cs))
             return s;
          String s2 = cs.toString();
          return interner[h] = s2;
       }
    }
    
pirho
  • 11,565
  • 12
  • 43
  • 70
Shane
  • 2,271
  • 3
  • 27
  • 55

2 Answers2

2

In your case I think that we don't care that the String is interned twice. The memory model takes care that nothing evil happens when assigning array values. Here is a related question: java array thread-safety

From the concurrency point of view it is working without synchronisation because the data is safe. So the class is working correctly in concurrent access.

If you like to be strict in the case that intern should happen only once you need to have synchronisation but that has its price. It depends on your use case what correctness is for you. (as pointed by gudok: independent of concurrency still intern happens more than once because of hashing)

Sorontur
  • 500
  • 4
  • 15
  • Note: even explicit synchronization won't guarantee that intern doesn't happen multiple times. This is because multiple strings may hash to the same bucket. – gudok Oct 24 '17 at 09:31
  • @gudok: yes that is true. I was just looking from the concurrent point of view. i will add that to the answer – Sorontur Oct 24 '17 at 09:37
  • @Sorontur - thanks for your answer, so to confirm, am I right in reasoning that a thread could see an out of date value for a string (either null or a previously stored value) after another thread has interned a string in that memory location? – Shane 50 mins ago – Shane Oct 24 '17 at 10:55
  • 1
    @Shane: the memory model does not guarantee strict visibility. This means that even after the new reference is written, another thread can still see the old reference. that could happen because of registers, pipline or instruction optimisation in the CPU. I never observed this in practise, but it can happen in theory. – Sorontur Oct 24 '17 at 11:18
  • @Sorontur visibility issue is real and can occur. If you wish to read about it or practice it, i recommend first chapter of book, Seven Concurrency Models in Seven Weeks. There are some very good examples with detailed explanation –  Oct 24 '17 at 14:25
0

I agree with Sorontur comment about visibility. This code can produce unexpected results (it can be tricky to reproduce scenario). It seems intern method is not thread safe. Multiple threads can run in parallel on multiple cores, usually each core has its own cache. If one thread update any variable in intern, it will immediately update its cache but cache of other core will not update at same time, will take sometime, meanwhile other threads may use old values. So, to carter this situation you can use volatile variables but it will effect performance. Hence, multi-threading on shared memory model is a trade-off between performance and efficiency.

Note: I think unexpected behavior can be seen on concurrent threads, it is not specific to parallel execution