I have a C++17 application that has multiple threads concurrently writing to a map of maps:
// global shared variable across threads
std::map<int, std::map<int, int>> the_map;
// many threads inserting different values for varying i and j
auto val = somewhat_expensive_f();
the_map[i][j] = val;
For this application, there are tens of thousands of unique (i,j) pairs and on the order of 2,000 sub-maps in the_map upon completion. I'm experimenting with multiple threads running expensive computations that insert into this map of maps. Right now I'm using a std::map, which doesn't allow concurrent insertions.
I wrapped the insertions with a std::lock_guard<std::mutex>
as a first cut, and of course that really slowed down the application and hamstrung the concurrency. My instinct is that I can either use some concurrent map of maps or fine-grained locking.
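Concretely, my first cut looks roughly like this (the insert wrapper is just a name I'm using here):

```cpp
#include <map>
#include <mutex>

std::map<int, std::map<int, int>> the_map;
std::mutex map_mtx;  // one global lock: correct, but serializes every insert

void insert(int i, int j, int val) {
    std::lock_guard<std::mutex> lk(map_mtx);  // all threads contend on this one mutex
    the_map[i][j] = val;
}
```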
For the second approach, my instinct is to use an array of locks indexed by a hash of the (i,j) pair. For example, std::lock_guard<std::mutex> lk(array_of_locks[hash((std::uint64_t(i) << 32) | std::uint32_t(j)) % array_sz]); could let a modest number of locks be shared across the thousands of sub-maps.
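A sketch of what I have in mind (NUM_SHARDS, Shard, and shard_for are names I'm making up). Here I pair each lock with its own partition of the map instead of hashing into locks over one shared outer map, since the single shared std::map would still race on outer-map rebalancing, and I shard on the outer key i alone so each sub-map lives entirely in one shard:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>

constexpr std::size_t NUM_SHARDS = 64;  // assumed shard count

// Each shard owns its lock and its slice of the data, so no two
// threads ever touch the same std::map without that shard's mutex.
struct Shard {
    std::mutex mtx;
    std::map<int, std::map<int, int>> part;
};

Shard shards[NUM_SHARDS];

// Shard on the outer key i so the whole sub-map for i stays in one shard.
std::size_t shard_for(int i) {
    return std::hash<int>{}(i) % NUM_SHARDS;
}

void insert(int i, int j, int val) {
    Shard& s = shards[shard_for(i)];
    std::lock_guard<std::mutex> lk(s.mtx);
    s.part[i][j] = val;
}
```

The trade-off is that reading everything back means iterating over all the shards rather than one map.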
Question 1: Am I on the right track? Any feedback on this approach?
Question 2: With this approach, one concern I have is false sharing of adjacent mutexes in the array. I could pad them out to fill an entire cache line. Are there better approaches?
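For the padding idea, something like this is what I mean (falling back to an assumed 64 bytes when the C++17 interference-size constant isn't provided by the compiler):

```cpp
#include <cstddef>
#include <mutex>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kCacheLine = 64;  // common cache-line size; an assumption
#endif

// alignas pads each element out to a full cache line, so threads
// hammering on adjacent locks don't invalidate each other's line.
struct alignas(kCacheLine) PaddedMutex {
    std::mutex mtx;
};

PaddedMutex array_of_locks[64];

static_assert(sizeof(PaddedMutex) % kCacheLine == 0,
              "each mutex occupies a whole number of cache lines");
```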
Another approach I could consider is inserting into thread-local maps and then combining them later in a single merge step.
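A sketch of that combining approach (the worker ranges and the stand-in workload are invented for illustration; it relies on each (i,j) pair being produced by exactly one thread):

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <vector>

std::map<int, std::map<int, int>> merged;
std::mutex merge_mtx;

// Each worker fills a private map with no locking at all, then takes
// the merge lock exactly once at the end to fold its results in.
void worker(int i_begin, int i_end) {
    std::map<int, std::map<int, int>> local;
    for (int i = i_begin; i < i_end; ++i)
        for (int j = 0; j < 3; ++j)
            local[i][j] = i + j;  // stand-in for somewhat_expensive_f()

    std::lock_guard<std::mutex> lk(merge_mtx);
    for (auto& [i, sub] : local)
        merged[i].insert(sub.begin(), sub.end());
}

void run() {
    std::vector<std::thread> pool;
    pool.emplace_back(worker, 0, 100);    // disjoint i ranges, so no
    pool.emplace_back(worker, 100, 200);  // (i,j) pair is written twice
    for (auto& t : pool) t.join();
}
```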