Regarding internal working of concurrent hashmap

Question

I was going through the ConcurrentHashMap and this related tutorial, and had some questions.

In the article, it was mention that ConcurrentHashMap allows multiple readers to read concurrently without any blocking. This is achieved by partitioning the Map into different parts based on concurrency level and locking only a portion of the Map during updates. Default concurrency level is 16, and accordingly the Map is divided into 16 part and each part is governed with a different lock. This means, 16 threads can operate on Map simultaneously, until they are operating on different parts of the Map. This makes ConcurrentHashMap high performant despite keeping thread-safety intact. Though, it comes with a caveat: Since update operations like put(), remove(), putAll() or clear() are not synchronized, concurrent retrieval may not reflect the most recent change on the Map
Another point also mentioned in the article: Another important point to remember is iteration over CHM, Iterator returned by keySet are weakly consistent and they only reflect state of ConcurrentHashMap at a certain point and may not reflect any recent change.

I have not understood the points highlighted in bold, could you provide more info or show me in a simple program?

This gives answers to your 1st question: http://stackoverflow.com/questions/14947723/is-concurrenthashmap-totally-safe/14947818#14947818 — Lee Meador, Feb 27 '13 at 17:27

score 2 · Answer 1 · answered Feb 27 '13 at 17:49

Since update operations like put(), remove(), putAll() or clear() is not synchronized, concurrent retrieval may not reflect most recent change on Map

As I understand it, this means that a modification of the map in one thread may not necessarily be seen by a retrieval happening at the same time in another thread. Consider the following example:

                  Thread 1 starts              Thread 1's call to get("a")
                 a call to get("a")             completes, returning null
                         |                                 |
Thread 1        ---------+---------------------------------+-------
                             time ----->
Thread 2        -----+---------------------------+-----------------
                     |                           |
             Thread 2 starts a            Thread 2's call to
            call to put("a", 1)          put("a", 1) completes

Even though Thread 2 put a value in the map Thread 1's get completed execution, Thread 1 did not "see" the map modification, and returned null.

Another important point to remember is iteration over CHM, Iterator returned by keySet of ConcurrentHashMap are weekly consistent and they only reflect state of ConcurrentHashMap and certain point and may not reflect any recent change.

This is a similar situation. If Thread 1 obtains an Iterator from a ConcurrentHashMap's keySet, and later Thread 2 puts a new entry in the map, Thread 1's Iterator is not guaranteed to see that entry. (It may or it may not.)

Lee Meador · Answer 2 · 2013-02-27T19:31:06.270

The real issue here is that when multiple threads are fooling with a data structure, the threads won't necessarily march in lock step.

One thread is reading for user1. One thread is writing for user2. Neither thread can predict where in their respective processes the other thread will be. Further, we can't predict for the users any sort of ordering on these two processes completion. If the write updates the data first, the read will show the updated state even though user1 might have requested a read a little earlier.

Reading or modifying when iterating work the same way with the additional consideration that the process of moving to the next one (when iterating) essentially becomes a "read" operation on the state of the Map, if not the content of any particular data in it.

So, when you allow concurrency in these data structures, you end up with a "close enough" test in terms of time. (It's a lot like the same considerations with databases except we are used to thinking of databases that way and the time frames are a couple of factors of 10 different.

NOTE: To make a comment about a wonderful little timeline shown by @Matts in another answer ...

The timeline shows two threads and a start and stop for each thread. The starts for the two threads can occur in either order (a,b) or (b,a). The ends can occur in either order because you can't tell how long the operations take. That gives 4 ways the two threads can start and finish. (a starts first and ends first, a starts first and b ends first, b starts first and a ends first, b starts first and b ends first) Now ... imagine 20 threads all doing the same thing in response to, say, 20 end users submitting requests for this and that. How many possible ways can it work.

can you please also show a small program also that will make understanding more clear — user2094103, Feb 27 '13 at 17:36
Not really. This issue only comes up occasionally and only when things happen at about the same time from different threads. You can't program that because its going to depend on the speed of the processor and the number of CPUs available to the JVM running the code. — Lee Meador, Feb 27 '13 at 19:26
I should point out that I have, in the past, tried to write low level unit tests to cause such a thing to happen. In that case, the inconsistencies between threads was a bug. I have never found a good way to do it. You can sometimes make it happen a small percentage of the time given a particular processor configuration. Then the test consists of repeating the test X times and expecting it to fail Y% of the time. Once the test machine is swapped out for another one ... it quits happening. But even until that point its not very satisfying to have a test thats more like a political poll. — Lee Meador, Feb 27 '13 at 19:35

Regarding internal working of concurrent hashmap

2 Answers2

Linked