
I have a C++17 application that has multiple threads concurrently writing to a map of maps:

// global shared variable across threads
std::map<int, std::map<int, int>> the_map;

// many threads inserting different values for varying i and j
auto val = somewhat_expensive_f();
the_map[i][j] = val;

For this application, there are tens of thousands of unique (i,j) pairs and on the order of 2,000 maps in the_map upon completion. I'm experimenting with multiple threads running expensive computations that insert into this map of maps. Right now I'm using a std::map, which doesn't allow concurrent insertions.

I wrapped the insertions with a std::lock_guard<std::mutex> as a first cut, and of course that really slowed down the application and hamstrung the concurrency. My instinct is that I can either use some concurrent map of maps or fine-grained locking.

For the second approach, my instinct is to use an array of locks that is indexed by a hash of the (i,j) tuple. For example, acquiring a std::lock_guard<std::mutex> on array_of_locks[hash((i<<32)|j) % array_sz] could let a small pool of locks be shared across the thousands of sub-maps. The lock-index calculation I have in mind is sketched below.
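Concretely, something like this (just a sketch; array_sz is a placeholder, and the casts avoid the undefined behavior of shifting a 32-bit int by 32):

#include <cstddef>
#include <cstdint>
#include <functional>

constexpr std::size_t array_sz = 64;  // number of striped locks; a guess

// Combine (i, j) into a single 64-bit key, then hash it into a lock index.
std::size_t lock_index(int i, int j) {
    std::uint64_t key = (static_cast<std::uint64_t>(static_cast<std::uint32_t>(i)) << 32)
                      | static_cast<std::uint32_t>(j);
    return std::hash<std::uint64_t>{}(key) % array_sz;
}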

Question 1: Am I on the right track? Any feedback on this approach?

Question 2: With this approach, one concern I have is false sharing of adjacent mutexes in the array. I could pad each mutex out to fill an entire cache line, roughly as in the sketch below. Are there better approaches?
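By padding, I mean something along these lines (assuming a 64-byte cache line, which is a guess):

#include <mutex>

// Pad and align each mutex to a full cache line so adjacent locks in the
// array don't share a line.
struct alignas(64) PaddedMutex {
    std::mutex m;
};

PaddedMutex array_of_locks[64];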

Another approach I could consider is inserting into thread-local maps and then combining them later in a master thread; a sketch of that idea follows as well.
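Rough sketch of that idea (the i/j iteration here is a trivial stand-in for the real work distribution):

#include <algorithm>
#include <map>
#include <thread>
#include <vector>

using MapOfMaps = std::map<int, std::map<int, int>>;

// Stand-in for the real computation.
int somewhat_expensive_f(int i, int j) { return i * 1000 + j; }

int main() {
    const unsigned n_workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<MapOfMaps> partials(n_workers);  // one private map per worker
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n_workers; ++t) {
        workers.emplace_back([&partials, t, n_workers] {
            // Each worker writes only to its own partials[t]: no locks needed.
            for (int i = static_cast<int>(t); i < 2000; i += static_cast<int>(n_workers))
                for (int j = 0; j < 10; ++j)
                    partials[t][i][j] = somewhat_expensive_f(i, j);
        });
    }
    for (auto& w : workers) w.join();

    // Single-threaded merge after all workers have finished.
    MapOfMaps the_map;
    for (auto& part : partials)
        for (auto& [i, inner] : part)
            the_map[i].insert(inner.begin(), inner.end());
}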

Kulluk007
  • @FrançoisAndrieux thank you, that's a good point and actually what's already happening; I oversimplified my example code. I've updated it. – Kulluk007 Oct 09 '19 at 20:40
  • Instead of manually padding to avoid false sharing, you could use `alignas(std::hardware_destructive_interference_size)`. In a toolchain where `std::hardware_destructive_interference_size` isn't available, I usually go for 64. It's just a guess but it seems to help. – Ted Lyngmo Oct 09 '19 at 20:41
  • You're only holding the lock while doing `map[i][j]=val;` right? If so, the lock should not significantly affect concurrency since most of the time it will be uncontended as the threads are busy in `somewhat_expensive_f`. – David Schwartz Oct 09 '19 at 20:49
  • Do the other threads really need access to the values in the outer `map`? If not, just populate the inner map in each thread and add the whole thing to the outer map when it's ready. – Ted Lyngmo Oct 09 '19 at 20:52
  • @DavidSchwartz the challenge is that in the real application there are a handful of places where things are calculated and inserted, and almost all of the insertions land on different areas of the map. I am seeing lock contention, and it seems mostly unnecessary given that >99% of the insertions have nothing to do with each other because they are insertions into different submaps. That's why I'm trying to come up with a more fine-grained locking structure. – Kulluk007 Oct 09 '19 at 20:56
  • You're on the right track. This is called striped locking. Some implementations will increase or decrease the number of locks dynamically as the map grows or shrinks. – Eric Oct 09 '19 at 20:58
  • @TedLyngmo you are correct to point out that the other threads don't need the inserted values while the rest of the insertions are occurring. However, at the end of the computation, the next phase of the application needs this single, organized data structure to be like this. So, I could create a bunch of thread-local map of maps and then combine them in a single thread somehow. – Kulluk007 Oct 09 '19 at 20:58
  • It looks like you're not concerned about overwriting existing values in the map, right? You could probably forget the locking and go with a fully lock-free implementation in that case. – Eric Oct 09 '19 at 21:10
  • @Kulluk007 Sure, that should make it require no synchronizing for the insertion. Just let the main thread do the insertion after each worker is done/joined. – Ted Lyngmo Oct 09 '19 at 21:15
  • @Kulluk007 You really shouldn't be seeing lock contention though. That makes me suspect you either aren't measuring correctly or don't understand your code's behavior correctly. Given that, I'd be hesitant to make design changes based on that understanding. Understanding why multithreaded coded is behaving as it is so you can form a proper plan to improve it is *very* hard. – David Schwartz Oct 09 '19 at 22:01

1 Answer


Dividing your map into multiple maps, each of which can be locked independently, is a good and practical approach. The key to doing this efficiently is to remember that, while you may have tens of thousands of entries in your map, you probably don't have that many threads or that many cores.

If your machine has, say, 8 cores, then hashing your keys into 64 different buckets, each with its own map and mutex, will make contention unlikely and won't significantly slow down the application.

At most 8 cores could be trying to insert at the same time, and even if they were doing that constantly, a given insert would contend with one of the other 7 in-flight inserts only about 7/64 ≈ 11% of the time, so threads would be blocked roughly 12% of the time at worst. Your threads will probably have lots of other things to do, though, so real contention would be much less than that.

As @Eric indicates, the magic Google words for this are "lock striping".

Regarding false sharing of adjacent mutexes: std::map insertions aren't fast enough for that to be a real problem.

One thing you might have to worry about is contention in the memory allocator that is used to allocate map nodes. They all come from the same heap, after all.
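A minimal sketch of the bucketed layout (names are illustrative; the bucket is chosen from i alone so that everything for a given i lives in exactly one bucket, and each bucket's map is only ever touched under that bucket's lock):

#include <cstddef>
#include <functional>
#include <map>
#include <mutex>

constexpr std::size_t kNumBuckets = 64;

struct Bucket {
    std::mutex mtx;
    std::map<int, std::map<int, int>> maps;  // this bucket's share of the data
};

Bucket buckets[kNumBuckets];

void insert(int i, int j, int val) {
    // All (i, *) pairs hash to the same bucket, so one lock guards the
    // entire outer and inner insertion; different buckets share no state.
    Bucket& bucket = buckets[std::hash<int>{}(i) % kNumBuckets];
    std::lock_guard<std::mutex> lock(bucket.mtx);
    bucket.maps[i][j] = val;
}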

Matt Timmermans
  • I tried this and am hitting a TSAN data race; I'm using striped locking with 8 buckets, keyed on `i` in my example above (`the_map[i][j] = val`). So, two threads that both want to insert into `the_map` with the same `i` will need to acquire the same lock, regardless of `j`. However, TSAN is reporting a race on `map::operator[]` for two threads trying to write into `the_map`. Is it possible that the outer map (`the_map`) is doing something during `operator[]` that affects internal structures? Maybe rebalancing the RB-tree is triggered? *Should* this approach (locking based on `i`) work? – Kulluk007 Oct 11 '19 at 21:28
  • Yes, indeed, `operator []` inserts the key if it doesn't exist. You should not have an outer map anyway -- use an array or a vector. – Matt Timmermans Oct 11 '19 at 22:18
  • Yep, the default map constructor is used to create an empty map as needed (when the key doesn't exist). My question has to do with something else (and finding the cause of the TSAN race). If two threads insert different keys `i` and `j` into the outer map, is there a possibility of a race in the map's internal tree? For example, the red-black tree is rebalanced using rotations (most likely), and couldn't that cause a data race in the above example if rebalancing happens during the concurrent `i` and `j` insertions? This could happen with lock striping because the two threads would use different locks. – Kulluk007 Oct 11 '19 at 23:56
  • Re: not using a map for the outer data structure -- I agree that maps should rarely be used; I'm currently modifying an existing benchmark. But I think they use a map because the key domain is large and the indexes used are sparse, which is probably why they hash down to a small domain. Is there a clever way to use an array here? – Kulluk007 Oct 12 '19 at 00:00
  • I think this question is relevant: https://stackoverflow.com/questions/27829806/can-different-threads-insert-into-a-map-if-they-always-use-different-keys I'm currently using a single outer map and lock striping on the key doesn't help given the rebalancing issue (and possibly other issues) I raised above. – Kulluk007 Oct 12 '19 at 00:14
  • Oh I thought the outer map was for your striping buckets. You should have 8 outer maps and 8 locks -- one pair for each bucket. Nothing about the maps should need to be thread-safe, because nothing should touch any map without holding its bucket's lock. – Matt Timmermans Oct 12 '19 at 01:23
  • Got it! I'll give that a try. Thanks. – Kulluk007 Oct 12 '19 at 19:20