2

There have been many discussions on this topic, e.g. here:

What's the difference between ConcurrentHashMap and Collections.synchronizedMap(Map)?

But I haven't found an answer to my specific use-case.

In general, you cannot assume that a HashMap is thread-safe. If write to the same key from different threads at the same time, all hell could break loose. But what if I know that all my threads will have unique keys?

Is this code thread-safe or do I need to add blocking mechanism (or use concurrent map)?

Map<int, String> myMap = new HashMap<>();
for (int i = 1 ; i > 6 ; i++) {
    new Thread(() -> {
        myMap.put(i, Integer.toString(i));
    }).start();
}
Mor Blau
  • 420
  • 3
  • 15
  • 1
    Nothing is guaranteed. Unique keys doesn't really mean anything. – Sweeper Aug 09 '21 at 09:54
  • My question is more about the underlying mechanism. Are the different keys isolated or will altering one possibly affect the other action – Mor Blau Aug 09 '21 at 09:56
  • 1
    @MorBlau: think about it: of course there needs to be some shared infrastructure, or you wouldn't really have a `Map`. You'd be closer to a `Collection>` (and even that would have thread-safety issues. – Joachim Sauer Aug 09 '21 at 09:58
  • 1
    See [What implementation detail makes this code fail so easily?](https://stackoverflow.com/q/40927934/2711488) for an example that *does* use distinct keys… – Holger Aug 16 '21 at 12:21

2 Answers2

5

The answer is simple: HashMap makes absolutely no thread-safety guarantees at all.

In fact it's explicitly documented that it's not thread-safe:

If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

So accessing one from multiple threads without any kind of synchronization is a recipe for disaster.

I have seen cases where each thread uses a different key cause issue (like iterations happening at the same time resulting in infinite loops).

Just think of re-hashing: when the threshold is reached, the internal bucket-array needs to be resized. That's a somewhat lengthy operation (compared to a single put). During that time all manner of weird things can happen if another thread tries to put as well (and maybe even triggers a second re-hashing!).

Additionally, there's no reliable way for you to proof that your specific use case is safe, since all tests you could run could just "accidentally" work. In other words: you can never depend on this working, even if you thin k you covered it with unit tests.

And since not everyone is convinced, you can easily test it yourself with this code:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashMapDemonstration {

  public static void main(String[] args) throws InterruptedException {
    int threadCount = 10;
    int valuesPerThread = 1000;
    Map<Integer, Integer> map = new HashMap<>();
    List<Thread> threads = new ArrayList<>(threadCount);
    for (int i = 0; i < threadCount; i++) {
      Thread thread = new Thread(new MyUpdater(map, i*valuesPerThread, (i+1)*valuesPerThread - 1));
      thread.start();
      threads.add(thread);
    }
    for (Thread thread : threads) {
      thread.join();
    }
    System.out.printf("%d threads with %d values per thread with a %s produced %d entries, should be %d%n",
        threadCount, valuesPerThread, map.getClass().getName(), map.size(), threadCount * valuesPerThread);
  }
}

class MyUpdater implements Runnable {
  private final Map<Integer, Integer> map;
  private final int startValue;
  private final int endValue;

  MyUpdater(Map<Integer, Integer> map, int startValue, int endValue) {
    this.map = map;
    this.startValue = startValue;
    this.endValue = endValue;
    System.out.printf("Creating updater for values %d to %d%n", startValue, endValue);
  }

  @Override
  public void run() {
    for (int i = startValue; i<= endValue; i++) {
      map.put(i, i);
    }
  }
}

This is exactly the type of program OP mentioned: Each thread will only ever write to keys that no other thread ever touches. And still, the resulting Map will not contain all entries:

Creating updater for values 0 to 999
Creating updater for values 1000 to 1999
Creating updater for values 2000 to 2999
Creating updater for values 3000 to 3999
Creating updater for values 4000 to 4999
Creating updater for values 5000 to 5999
Creating updater for values 6000 to 6999
Creating updater for values 7000 to 7999
Creating updater for values 8000 to 8999
Creating updater for values 9000 to 9999
10 threads with 1000 values per thread with a java.util.HashMap produced 9968 entries, should be 10000

Note that the actual number of entries in the final Map will vary for each run. It even sometimes prints 10000 (because it's not thread-safe!).

Note that this failure mode (losing entries) is definitely not the only possible one: basically anything could happen.

Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614
0

I would like to specifically respond to the phrase.

But what if I know that all my threads will have unique keys?

You are making an assumption about the implementation of the map. The implementation is subject to change. If the implementation is documented not to be thread-safe, you must take into account the Java Memory Model (JMM) that guarantees almost nothing about visibility of memory between threads.

This is making a lot of assumptions and few guarantees. You should not rely on these assumptions, even if it happens to work on your machine, in a specific use-case, at a specific time.

In short: if an implementation that is not thread-safe is used in multiple threads, you MUST surround it with constructs that ensure thread-safety. Always.

However, just for the fun of it, let's describe what can go wrong in your particular case, where each thread only uses a unique key.

When adding or removing a key, even if unique, there are cases when a hash map needs to reorganise internally. The first one is in case of a hash-collision,1 in which a linked list of key-value entries must be updated. The second one is where the map decides to resize its internal entry table. That overhauls the internal structure including the mentioned linked lists.

Because of the JMM it is largely not guaranteed what another thread sees of the reorganisation. That means that behaviour is undefined if another threads happens to be in the middle of a get(key) when the reorganisation happens. If another thread is concurrently doing a put(key,value), you could end up with two threads trying to resize the map at the same time. Frankly, I do not even want to think what mayhem that can cause!


1 Multiple keys can have the same hash-code. Because the map has no limitless storage, the hash-code are often also wrapped around with the size of the internal table of entries, like (hashCode % sizeOfTable), which can lead to a situation where different hash-codes utilize the same "entry".

Emmef
  • 500
  • 2
  • 7