One heavily used atomic lock or many seldom-used ones

Question

I could implement my C++20 atomic-wait-based locking mechanism in two ways:

1. With the first implementation, I would have one or two atomic descriptors that control whether worker threads can proceed. The big drawback of this implementation is that I basically need to wake up all waiting threads with notify_all, but only one of them can proceed at a time and the others have to go back to sleep.
2. I could also implement the locking with task-level atomics. With this implementation, however, I would have thousands of atomics, each used only rarely. At most as many of them as there are hardware threads could be in an atomic wait at any given time, and each waiting thread could be woken up precisely with notify_one. The number of atomic operations would be of the same order of magnitude with both implementations, but with the latter one the operations would be spread across many different atomic variables.

Does the second approach have any show-stopper drawbacks that could make its performance worse than that of the first approach? We can assume that std::atomic<T>::is_always_lock_free is true for our types.

Memory overhead for non-fundamental types, and the risk of deadlocks when you need to perform multiple operations and therefore acquire multiple locks. For fundamental types, slower access when you need multiple variables, because each variable needs an atomic load/store instead of just one lock for the whole class. Overall it just depends on the use case. — Goswin von Brederlow, Jul 15 '22 at 06:29
Note: using specific memory orders on load/store can greatly reduce the overhead when using multiple atomics, but it's hard to get right. — Goswin von Brederlow, Jul 15 '22 at 06:33
Separate pairs of threads can be interacting in parallel via atomic objects in separate cache lines. In (1), if your few atomics are used often enough that there's often contention for that cache line, that's bad. And extra wakeups are worse than an occasional cache miss. (2) seems obviously better if you don't end up needing way more system calls to wake multiple threads. If `notify_all` is a drawback, not an advantage, don't create contention when you don't have to. — Peter Cordes, Jul 15 '22 at 19:10
IDK if it's applicable to your use case, but an interesting compromise between separate locks for everything vs. serializing everything is a hash table of locks (like [Where is the lock for a std::atomic?](https://stackoverflow.com/q/50298358) for non-lock-free objects). If any of your use cases don't care whether they share an atomic object with something else or not, that would be possible. (But probably not; that's usually only the case for locks, not other uses of atomics.) — Peter Cordes, Jul 15 '22 at 19:12

