7

I was going through an interview question on JavaRevisited and I'm having difficulty understanding this question :

What’s wrong with using a HashMap in a multithreaded environment? When get() method go into an infinite loop?

In my opinion, it's not a problem to use HashMap inside a multi-threaded environment, as long as our application is not accessing/reading threads which are modifying the created HashMap, rather than simply accessing the HashMap.

So, as I see it, there's not a problem as long as in the application we are just accessing the HashMap in a multi-threaded environment.

Please let me know if my understanding is correct.

Erik
  • 503
  • 1
  • 7
  • 26
Pawan
  • 31,545
  • 102
  • 256
  • 434
  • I think you have a typo in your question. "as our application is creating threads which are modifying the created HashMap, rather than modifying the HashMap". I suspect the first "modifying" should be "accessing" or "reading". – Gray Jun 15 '12 at 12:51
  • Yes , thanks for pointing it out , our APplication doesn't create Threads explicitly . we just access the Data inside the HashMap . – Pawan Jun 15 '12 at 12:57

4 Answers4

24

What’s wrong using HashMap in multithreaded environment? When get() method go to infinite loop?

It is a bug to have multiple threads use a non-synchronized collection (really any mutable class) in an unprotected manner. Certain if each thread had their own HashMap instance then this is not an issue. It is a problem if multiple threads are adding to the same HashMap instance without it being synchronized. Even if just 1 thread is modifying a HashMap and other threads are reading from that same map without synchronization, you will run into problems.

If you need to use the same hash table object in multiple threads then you should consider using ConcurrentHashMap, wrapping each of the accesses to the HashMap in a synchronized {} block, or making use of the Collections.synchronizedMap(new HashMap<...>()) construct.

Chances are that the get() goes to an infinite loop because one of the threads has only a partially updated view of the HashMap in memory and there must be some sort of object reference loop. That's the peril of using an unsynchronized collection with multiple threads.

So in my understanding, it's not a problem as long as in the application we are just accessing the HashMap in a multi-threaded environment?

If by "accessing" you mean "reading", then this is true with qualifications. You must make sure:

  • All of the updates to the HashMap are completed before the threads are instantiated and the thread that creates the map also forks the threads
  • The threads are only using the HashMap in read-only mode – either get() or iteration without remove
  • There are no threads updating the map

If any of these conditions are not true then you will need to use a synchronized map instead.

Gray
  • 115,027
  • 24
  • 293
  • 354
  • I also have similar [question](http://stackoverflow.com/questions/41952171/modifying-hash-map-from-a-single-thread-and-reading-from-multiple-threads) where I am using HashMap in multithreaded environment but I figured out I can use ConcurrentHashMap but it looks like there might be a thread safety issue in my code. I wanted to see what is the best way by which I can fix those in my code? –  Feb 27 '17 at 17:08
3

This is a classical question. ArrayList and HashMap are not synchronized, while Vector and HashTable are. You should therefore use HashTable unless you are very careful defining mutexes yourself.

In other words, the methods in e.g. HashTable will ensure that no other thread is working with the HashTable at any given time. If you use a HashMap, you'd have to do that manually by ensuring that you synchronize on HashMap before you call the method.

Update: checkout @Gray's comment. It looks like wrapping HashMap with Collections.synchronizedMap(new HashMap()) is the way to go now.

EDIT: other posters have answered way better than I did. My answer, however, generated an interesting discussion on the use of the soon to be deprecated Vector, Stack, Hashtable and Dictionary classes, so I'm leaving the question here, as a head to the comments below. Thanks guys!

CubeJockey
  • 2,209
  • 8
  • 24
  • 31
Miquel
  • 15,405
  • 8
  • 54
  • 87
  • 10
    `HashTable` is all but deprecated and should not be used. If you need a concurrent HashMap then you should use ConcurrentHashMap or `Collections.synchronizedMap(new HashMap())`. – Gray Jun 15 '12 at 12:24
  • @Gray Oh, that's interesting, I had no idea. Do you have a reference to that? With HashTable not being marked as deprecated, this came as a surprise. Thanks! – Miquel Jun 15 '12 at 12:27
  • @Gray And since we are at it, is vector also deprecated? – Miquel Jun 15 '12 at 12:29
  • @Gary Or use ConcurrentHashMap which can be safely iterated without a CME. – Peter Lawrey Jun 15 '12 at 12:29
  • @Miquel Vector, Stack, Hashtable and Dictionary were replaced in Java 1.2 about 13 years ago. You should only use these when you have to. Interestingly an alternative to Properties was never provided even though you can `properties.put(1, 0.0)` which is unsafe. ;) – Peter Lawrey Jun 15 '12 at 12:31
  • Yes, use of Vector is also frowned upon, yes. – Gray Jun 15 '12 at 12:31
  • @PeterLawrey and Gray, thanks, that's good to know! Seriously though, I don't want to miss on the next one. Where do you pick up on such things being all but deprecated, if not on the javadoc? thx! – Miquel Jun 15 '12 at 12:33
  • 1
    They are "deprecated" since the introduction of the `Concurrent` package, which introduces thread-safe collections with a finer granularity than `Vector` or `HashTable` (which achieve thread-safety through a kind of `big-lock` on the whole data structure). – akappa Jun 15 '12 at 12:38
  • Here's a good discussion on the subject. The 2nd answer (with the higher score) is a good read: http://stackoverflow.com/questions/40471/differences-between-hashmap-and-hashtable – Gray Jun 15 '12 at 12:39
  • Brilliant. Thanks a lot! I changed my original answer instead or removing it because I really value your comments on this. – Miquel Jun 15 '12 at 12:41
3

We know that HashMap is a non-synchronized collection whereas its synchronized counter-part is HashTable. So, when you are accessing the collection in a multithreaded environment and all threads are accessing a single instance of collection, then it's safer to use HashTable for various obvious reasons e.g. to avoid dirty reads and to maintain data consistency. In the worst case, this multithreaded environment can result in an infinite loop as well.

Yes, it is true. HashMap.get() can cause an infinite loop. Let us see how??

If you look at the source code HashMap.get(Object key) method, it looks like this:

 public Object get(Object key) {
    Object k = maskNull(key);
    int hash = hash(k);
    int i = indexFor(hash, table.length);
    Entry e = table[i];
    while (true) {
        if (e == null)
            return e;
        if (e.hash == hash &amp;&amp; eq(k, e.key))
            return e.value;
        e = e.next;
    }
}

while(true){...} can always be a victim of an infinite loop at runtime in a multithreaded environment, IF, somehow e.next can point to itself. This will result in an infinite loop. But, how e.next will point to itself?

This can happen in void transfer(Entry[] newTable) method, which is invoked at the time the HashMap resizing is done.

    do {
        Entry next = e.next;
        int i = indexFor(e.hash, newCapacity);
        e.next = newTable[i];
    newTable[i] = e;
        e
= next;
} while (e != null);

This piece of code is prone to produce the above condition if resizing happens and at the same time, other threads tried to modify the map instance.

The only way to avoid this scenario is to use synchronization in code, or better, use the synchronized collection.

M. Deinum
  • 115,695
  • 22
  • 220
  • 224
1

I guess that they meant to access to a shared copy of HashMap. Shared mutable state.

Since it is not synchronized every thread will grab its copy from the main memory, modify, and overwrite it.

HashMap with one entry <n, 1>

thread 1 grab the copy

thread 2 grab the copy

thread 1 modify <n, 2>

thread 2 modify <n, 3>

thread 1 is done, and stores the copy in the main memory

now memory is <n, 2>

thread 2 is done and stores the copy

now memory is <n, 3>

The state thread 1 is lost
ssedano
  • 8,322
  • 9
  • 60
  • 98