I have 32 machine threads and one ConcurrentHashMap<Key,Value> map, which contains a lot of keys. Key defines a public method visit(). I want to visit() every element of map exactly once, using the processing power I have available and possibly some sort of thread pooling.
Things I could try:
- I could use the method map.keys(). The resulting Enumeration<Key> could be iterated over using nextElement(), but since a call to key.visit() is very brief I won't manage to keep the threads busy. The Enumeration is inherently single-threaded.
- I could use a synchronised HashSet<Key> instead, invoke toArray() and split the work on the array across all 32 threads. I seriously doubt this solution, since toArray() will likely be a single-threaded bottleneck.
- I could try to inherit from ConcurrentHashMap, get my hands on the instances of its inner Segment<K,V>, try to group them into 32 groups and work on each group separately. This sounds like a hardcore approach though.
- Or similar magic with Enumeration<Key>.
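For reference, the "copy the keys into an array and split it" option can be sketched as below. This is only a rough illustration under assumptions: the Key class here is a hypothetical stand-in with a counter, and ChunkedVisit and visitAll are made-up names. The toArray() call is still a single-threaded step, so the sketch does not remove the bottleneck described above, it only shows how the work would be divided afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ChunkedVisit {
    // Hypothetical stand-in for the Key class from the question.
    static class Key {
        final AtomicInteger visits = new AtomicInteger();
        void visit() { visits.incrementAndGet(); }
    }

    // Snapshots the keys into an array (single-threaded), then visits the
    // array in contiguous chunks on a fixed thread pool.
    static <V> void visitAll(ConcurrentHashMap<Key, V> map, int threads)
            throws InterruptedException {
        final Key[] keys = map.keySet().toArray(new Key[0]);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            int chunk = (keys.length + threads - 1) / threads; // ceiling division
            for (int start = 0; start < keys.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, keys.length);
                tasks.add(() -> {
                    for (int i = from; i < to; i++) keys[i].visit();
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every chunk has finished
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Key, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) map.put(new Key(), i);
        visitAll(map, 32);
        long visitedOnce = map.keySet().stream()
                .filter(k -> k.visits.get() == 1).count();
        System.out.println(visitedOnce + " of " + map.size() + " keys visited exactly once");
    }
}
```

The sketch assumes nothing writes to the map while the snapshot is taken, which matches the step-by-step flow where iteration and modification never overlap.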
Ideally:
- A ConcurrentHashMap<Key, Value> would define a method keysEnumerator(int approximatePosition), which could hand me an enumerator that skips approximately the first 1/32 of the elements, i.e. map.keysEnumerator(map.size()/32)
Am I missing anything obvious? Has anybody run into a similar problem before?
EDIT
I've had a go at profiling to see whether this problem is actually going to affect the performance in practice. As I don't have access to the cluster at the moment, I used my laptop and tried to extrapolate the results to a bigger dataset. On my machine I can create a ConcurrentHashMap with 2 million keys, and it takes about 1 second to iterate over it invoking the visit() method on every key. The program is supposed to scale to 85 million keys (and over). The cluster's processor is slightly faster, but it should still take about 40 seconds to iterate over the entire map. Now a few words about the logic flow of the program. The logic presented is sequential, i.e. no thread is allowed to proceed to the next step until all the threads in the previous step are finished:
1. Create the hash map, create the keys and populate the hash map.
2. Iterate over the entire hash map, visiting all the keys.
3. Do some data shuffling, which is parallel insertions and deletions.
4. Repeat steps 2 and 3 a few hundred times.
That logic flow means that a 40 second iteration is going to be repeated a few hundred times. Even taking a conservative 100 repetitions, that gives us a bit over an hour spent just visiting the nodes. With a set of 32 parallel iterators it could go down to just a few minutes, which is a significant performance improvement.
Now a few words on how ConcurrentHashMap works (or how I believe it works). Every ConcurrentHashMap consists of segments (by default 16). Every write to a hash map is synchronised on the relevant segment. So say we're trying to write two new keys k1 and k2 to the hash map and they resolve to the same segment, say s1. If the writes are attempted simultaneously, one of them is going to acquire the lock first and be added earlier than the other. What is the chance of two elements resolving to the same segment? If we have a good hash function and 16 segments, it is 1/16.
I believe that ConcurrentHashMap should have a method concurrentKeys(), which would return an array of Enumerations, one per segment. I have got a few ideas on how to add it to ConcurrentHashMap through inheritance, and I'll let you know if I succeed. For now the solution seems to be to create an array of ConcurrentHashMaps and pre-hash every key so that it resolves to one member of that array. I'll share that code as well, once it's ready.
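Until that code is ready, here is a minimal sketch of what the array-of-maps idea might look like. It is an illustration under assumptions, not the final implementation: ShardedMap, shardFor and forEachKeyParallel are hypothetical names, and the multiplicative mixing step is an arbitrary choice, made so that the shard index is not just the low hash bits that the inner maps also rely on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical wrapper: an array of ConcurrentHashMaps, one per worker thread.
class ShardedMap<K, V> {
    private final ConcurrentHashMap<K, V>[] shards;

    @SuppressWarnings("unchecked")
    ShardedMap(int shardCount) {
        shards = new ConcurrentHashMap[shardCount];
        for (int i = 0; i < shardCount; i++) shards[i] = new ConcurrentHashMap<>();
    }

    // Pre-hash: scramble the hash code and pick a shard from its upper bits.
    // The constant is an assumption; any reasonable mixing function would do.
    private ConcurrentHashMap<K, V> shardFor(Object key) {
        int h = key.hashCode() * 0x9E3779B9;
        return shards[(h >>> 16) % shards.length]; // h >>> 16 is never negative
    }

    V put(K key, V value) { return shardFor(key).put(key, value); }
    V get(Object key) { return shardFor(key).get(key); }
    V remove(Object key) { return shardFor(key).remove(key); }

    int size() {
        int total = 0;
        for (ConcurrentHashMap<K, V> shard : shards) total += shard.size();
        return total;
    }

    // Runs one task per shard; each task iterates only its own shard's keys.
    void forEachKeyParallel(Consumer<? super K> action) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(shards.length);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (ConcurrentHashMap<K, V> shard : shards) {
                tasks.add(() -> {
                    for (K key : shard.keySet()) action.accept(key);
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until all shards have been visited
        } finally {
            pool.shutdown();
        }
    }
}
```

With 32 shards, a call such as sharded.forEachKeyParallel(Key::visit) would give each thread its own map to walk. How evenly the work is balanced depends on how evenly the pre-hash spreads the keys.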
EDIT
This is the same problem in a different language: