Non-blocking semaphores in C++11?

Question

A number of questions on this site deal with the lack of a semaphore object in the multi-threading support introduced in C++11. Many people suggested implementing semaphores using mutexes or condition variables or a combination of both.

However, none of these approaches allows to increment and decrement a semaphore while guaranteeing that the calling thread is not blocked, since usually a lock must be acquired before reading the semaphore's value. The POSIX semaphore for instance has the functions sem_post() and sem_trywait(), both of which are non-blocking.

Is it possible to implement a non-blocking semaphore with the C++11 multi-threading support only? Or am I necessarily required to use an OS-dependent library for this? If so, why does the C++11 revision not include a semaphore object?

A similar question has not been answered in 3 years. (Note: I believe the question I am asking is much broader though, there are certainly other uses for a non-blocking semaphore object aside from a producer/consumer. If despite this someone believes my question is a duplicate, then please tell me how I can bring back attention to the old question since this is still an open issue.)

Why do you care whether the semaphore blocks or not? If you care because you think non-blocking semaphores perform better than blocking ones then this is an XY question. What you really want is a semaphore with the level of performance you think non-blocking semaphores have and you don't actually care how it's implemented. — David Schwartz, Feb 16 '17 at 09:22
I don't see a problem to implement a semaphore. Using C++11 atomics and mutextes it should be possible. I found a implementation here: http://stackoverflow.com/questions/4792449/c0x-has-no-semaphores-how-to-synchronize-threads — OutOfBound, Feb 16 '17 at 09:29
@OutOfBound If you suggest an implementation with non-blocking post() and trywait(), I will be happy to select it as accepted answer. — kassiopeia, Feb 16 '17 at 09:32
@DavidSchwartz I did not say that this is for performance, however I do believe that short blocking vs non-blocking might cause different behavior under certain circumstances, although I can not think of an example right now. Aside from that, it simply makes me wonder why C++11 does not include such a thing. — kassiopeia, Feb 16 '17 at 09:36
@kassiopeia, what "different behavior" apart from wasting time waiting for the lock are you talking about? — SingerOfTheFall, Feb 16 '17 at 09:37
C++11 does not include such a thing because it serves no purpose. You rejected the purpose I suggested and you admitted that you can't think of any other difference. You can't include everything and certainly things that nobody can think of any purpose at all for should be quite low on the priority list! — David Schwartz, Feb 16 '17 at 09:41
@SingerOfTheFall You have a common misunderstanding about locks. Rather than "wasting time waiting", locks allow contending threads to be de-scheduled so that non-contending threads can run. — David Schwartz, Feb 16 '17 at 09:41
@DavidSchwartz, I'm aware of that, but if you look at it from the point of view of the thread that is waiting, you can consider it "wasting time", or am I wrong? — SingerOfTheFall, Feb 16 '17 at 09:44
@SingerOfTheFall No, not at all. A thread is not "wasting time" when some other thread uses the CPU. It's blocked and allowing the system to do useful work. The worst case scenario is when you use a lock-less algorithm and it allows two contending thread to run concurrently, slowing the entire system to a crawl as they fight over cache lines. Threads are not at war, each working solely for themself. They all benefit when contending threads are not run concurrently and contention is minimized so they get done quickly and allow other tasks to get done at full speed too. — David Schwartz, Feb 16 '17 at 09:46
@DavidSchwartz So you agree there is a difference between blocking and non-blocking, although you claim non-blocking is not desirable. — kassiopeia, Feb 16 '17 at 09:50
Almost. My point is that the differences are very, very platform specific and there's no general rule that will say that you will want one or the other. You have to know a lot about the specifics of the platform and until and unless C++11 gets huge amounts of additional support, the implementations will be platform specific too. — David Schwartz, Feb 16 '17 at 09:58
@DavidSchwartz Ok, I understand. So basically C++11 does not include a semaphore (and it is not possible to implement a non-blocking one) because it is neither necessary nor desirable. Would you mind posting this as an answer, so I can accept it? — kassiopeia, Feb 16 '17 at 10:07
When using `sem_post` then a threads gets woken up if waiting. That means it must be in a blocked state. Asking for non-blocking semaphores is ... unusual. If you skip `sem_wait` and only use `sem_trywait` then it is very easy with atomics. — knivil, Feb 16 '17 at 10:34

OutOfBound · Accepted Answer · 2017-02-16T11:07:30.530

4

I don't see a problem to implement a semaphore. Using C++11 atomics and mutextes it should be possible.

class Semaphore
{
private:
    std::atomic<int> count_;

public:
    Semaphore() :
        count_(0) // Initialized as locked.
    {

    }
    void notify() {
        count_++;
    }

    void wait() {
        while(!try_wait()) {
            //Spin Locking
        }
    }

    bool try_wait() {
        int count = count_;
        if(count) {
            return count_.compare_exchange_strong(count, count - 1);
        } else {
            return false;
        }
    }
};

Here is a little example of the usage:

#include <iostream>
#include "Semaphore.hpp"
#include <thread>
#include <vector>

Semaphore sem;
int counter;

void run(int threadIdx) {
    while(!sem.try_wait()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    //Alternative use wait
    //sem.wait()
    std::cout << "Thread " << threadIdx << " enter critical section" << std::endl;
    counter++;
    std::cout << "Thread " << threadIdx << " incresed counter to " << counter << std::endl;

    // Do work;
    std::this_thread::sleep_for(std::chrono::milliseconds(30));

    std::cout << "Thread " << threadIdx << " leave critical section" << std::endl;
    sem.notify();
}
int main() {
    std::vector<std::thread> threads;
    for(int i = 0; i < 15; i++) {
        threads.push_back(std::thread(run, i));
    }

    sem.notify();


    for(auto& t : threads) {
        t.join();
    }
    std::cout << "Terminate main." << std::endl;
    return 0;
}

Of course, the wait is a blocking operation. But notify and try_wait are both non-blocking, if the compare and exchange operation is non blocking (can be checked).

edited Feb 16 '17 at 11:07

answered Feb 16 '17 at 09:42

OutOfBound

1,914
14
31

3

It will work very, very poorly. Among the many problems with such an approach on typical modern CPUs, a thread spinning while waiting to acquire the semaphore will consume core microexecution resources. If the resource is held by another thread running in the same physical core, that critical thread will slow to a crawl. *Do not do this!* See my comments on that answer. – David Schwartz Feb 16 '17 at 09:43
Thanks for linking again to a thread I already linked to in my question and explained why these implementations are different from what I desire. – kassiopeia Feb 16 '17 at 09:45
@DavidSchwartz as for my understanding, there are no spin locks in place. It uses the condition variables to sleep as long, as the resource can not be allocated – OutOfBound Feb 16 '17 at 09:46
@OutOfBound If you mean that answer, it's not non-blocking since it blocks on the condition variables. The non-blocking one has all the problems I pointed out. – David Schwartz Feb 16 '17 at 09:48
@kassiopeia I get the question. – OutOfBound Feb 16 '17 at 09:52
@DavidSchwartz I also understand you comment, but there are scenarios, where performance doesn't matter, but the non-blocking property does. For example realtime programming. – OutOfBound Feb 16 '17 at 09:53
@OutOfBound That gets very platform specific. Yes, C++11 doesn't have any realtime programming facilities. I think you'll find all your examples are going to be platform specific because to be done portably they'd require huge swaths of functionality that C++11 doesn't have. – David Schwartz Feb 16 '17 at 09:57
@OutOfBound Yet in your answer you link again to a thread which I have already linked to in my question. I appreciate your efforts, but do you see that this is not very helpful? – kassiopeia Feb 16 '17 at 09:58
@DavidSchwartz what do you think about the implementation? – OutOfBound Feb 16 '17 at 10:22
@OutOfBound The OP asked for non-blocking semaphores. These block. – David Schwartz Feb 16 '17 at 10:44
@DavidSchwartz on what operation in notify and try_wait do they block? – OutOfBound Feb 16 '17 at 10:47
This loses notifications from the condition variable. – Maxim Egorushkin Feb 16 '17 at 10:52
@RustyX You are mistaken. Thread A: `try_wait` returns `false`, Thread B: `notify_one`, Thread A: `condition_.wait`. Result: the notification is missed. To avoid missing the notification the mutex must be held when calling `notify_one`, in this particular use case. – Maxim Egorushkin Feb 16 '17 at 10:57
@MaximEgorushkin you're right, I missed the `try_wait` part. I think [your solution](http://stackoverflow.com/a/4793662/485343) is the best that can be achieved. – rustyx Feb 16 '17 at 10:59
@OutOfBound The `wait` operation blocks when it could spin. – David Schwartz Feb 16 '17 at 11:03
I could add a blocking notify, which can't loose notifications. But i'm not sure if it would be wise, since its easy to confuse them and it could lead to very long locking. Would it be better to use spinning, as @DavidSchwartz mentioned? – OutOfBound Feb 16 '17 at 11:03
@OutOfBound Sure, make it spin. But not on my mobile phone, tablet, or notebook! – Maxim Egorushkin Feb 16 '17 at 11:09
@MaximEgorushkin Whether spinning is bad depends entirely on the expected waiting times and whether you have other threads that could make use of the CPU core while the thread is waiting. If you are typically spinning less than a microsecond on a system that's not overcommitted, spinning is actually the most performant and resource efficient waiting method - any method that involves the OS and its scheduler typically has latencies in the range of microseconds, latencies that you can avoid by spinning. Of course, always recheck these two conditions before actually using spinning. – cmaster - reinstate monica Feb 16 '17 at 11:28
@cmaster True. Here, however, the spinning is endless. – Maxim Egorushkin Feb 16 '17 at 11:29
@OutOfBound Now this is a surprisingly simple solution, with just one atomic integer... and truly non-blocking. – kassiopeia Feb 16 '17 at 11:31
4

The problem is, you can't make it spin portably without seriously compromising performance. Spinning properly has to, for example, take into account whether you're sharing a physical core with another thread and whether atomic compare-and-swap operations dirty cache lines when the comparison fails. You don't want a spinning thread to monopolize scarce CPU resources and slow *other* threads to a crawl. – David Schwartz Feb 16 '17 at 11:45
`wait` will block if `count <= 0`. – knivil Feb 17 '17 at 10:52
@knivil the point of the wait is to block anyway. But the try_wait and notify is non-blocking. – OutOfBound Feb 21 '17 at 10:40

Non-blocking semaphores in C++11?

1 Answers1