I could implement my C++20 atomic-wait-based locking mechanism in two ways:
- With the first implementation I would have one or two atomic descriptors that control whether worker threads can proceed. The huge drawback of this implementation is that I need to wake up basically all waiting threads with `notify_all`, but only one can proceed at a time and the others have to go back to sleep.
- I could also implement the locking with task-level atomics. With this implementation I would have thousands of atomics, but each would be seldom used. At most as many of them as there are hardware threads could be in the atomic waiting state at any given time, and each of them could be woken up precisely with `notify_one`. The number of atomic operations would be of the same order of magnitude with both implementations, but with the latter the operations would be performed on many different atomic variables.
Does the second approach have any show-stopper drawbacks that could make its performance worse than that of the first approach? We can assume that `std::atomic<T>::is_always_lock_free` is true for our types.