I have a lock-free multi-producer, single-consumer queue based on a circular buffer. So far it only has non-blocking push_back() and pop_front() calls. Now I want to add blocking versions of those calls, but I want to minimize the impact this has on the performance of code that uses the non-blocking versions - namely, it should not turn them into "lock-by-default" calls.
For example, the simplest version of a blocking push_back() would look like this:
void push_back_Blocking(const T& pkg) {
    if (!push_back(pkg)) {
        std::unique_lock<std::mutex> ul(mux);
        while (!push_back(pkg)) {
            cv_notFull.wait(ul);
        }
    }
}
but unfortunately this would also require putting the following block at the end of the "non-blocking" pop_front():
{
    std::lock_guard<std::mutex> lg(mux);
    cv_notFull.notify_all();
}
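(A blocking pop_front would presumably need the analogous treatment: a second condition variable - I'll call it cv_notEmpty, a placeholder name of my own - that the non-blocking push_back then has to notify in the same way. Sketch:)

void pop_front_Blocking(T& pkg) {
    if (!pop_front(pkg)) {
        std::unique_lock<std::mutex> ul(mux);
        while (!pop_front(pkg)) {
            cv_notEmpty.wait(ul);
        }
    }
}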
While the notify alone has hardly any performance impact (as long as no thread is waiting), acquiring the lock does.
So my question is: How can I (using standard C++14 if possible) add blocking push_back and pop_front member functions to my queue without severely impeding the performance of the non-blocking counterparts (read: minimize system calls)? At least as long as no thread is actually blocked - but ideally even then.
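The most promising direction I've come up with so far is to gate the notify on an atomic count of blocked producers, so that the consumer only ever touches the mutex when someone is actually waiting. This is an untested sketch (the waiters member is my own addition), and I'm not sure it is free of lost wakeups:

std::mutex mux;
std::condition_variable cv_notFull;
std::atomic<int> waiters{0};        // number of producers currently blocked

void push_back_Blocking(const T& pkg) {
    if (!push_back(pkg)) {
        std::unique_lock<std::mutex> ul(mux);
        ++waiters;                  // under the mutex, before the final full-check
        while (!push_back(pkg)) {
            cv_notFull.wait(ul);
        }
        --waiters;
    }
}

// appended to the non-blocking pop_front(), after "tail = rIdx;":
if (waiters.load() > 0) {           // usually false: no mutex, no syscall
    std::lock_guard<std::mutex> lg(mux);
    cv_notFull.notify_all();
}

The intended argument: the producer increments waiters before its last full-check, the consumer advances tail before loading waiters, and since both operations default to seq_cst, at least one side must observe the other's write (Dekker-style); locking mux before the notify then closes the remaining race with wait(). The fast path of pop_front pays only one extra atomic load - but I may well be overlooking something, hence this question.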
For reference, my current version looks similar to this (I left out debug checks, data alignment and explicit memory orderings):
#include <array>
#include <atomic>

template<class T, size_t N>
class MPSC_queue {
    using INDEX_TYPE = unsigned long;
    struct Idx {
        INDEX_TYPE idx;
        INDEX_TYPE version_cnt;     // guards the CAS on head against ABA
    };
    enum class SlotState {
        EMPTY,
        FILLED
    };
    struct Slot {
        Slot() = default;
        std::atomic<SlotState> state = SlotState::EMPTY;
        T data{};
    };
    struct Buffer_t {
        // no explicit fill needed: Slot is non-copyable (atomic member),
        // and every slot already defaults to EMPTY via its member initializer
        std::array<Slot, N> data{};
        Slot& operator[](Idx idx) {
            return this->operator[](idx.idx);
        }
        Slot& operator[](INDEX_TYPE idx) {
            return data[idx];
        }
    };
    Buffer_t buffer;
    std::atomic<Idx> head{};
    std::atomic<INDEX_TYPE> tail = 0;

    INDEX_TYPE next(INDEX_TYPE old) { return (old + 1) % N; }
    Idx next(Idx old) {
        old.idx = next(old.idx);
        old.version_cnt++;
        return old;
    }
public:
    bool push_back(const T& val) {
        // multi-producer: claim a slot by advancing head with a CAS
        auto tHead = head.load();
        Idx wrtIdx;
        do {
            wrtIdx = next(tHead);
            if (wrtIdx.idx == tail) {   // full (one slot always stays free)
                return false;
            }
        } while (!head.compare_exchange_strong(tHead, wrtIdx));
        buffer[wrtIdx].data = val;
        buffer[wrtIdx].state = SlotState::FILLED;
        return true;
    }
    bool pop_front(T& val) {
        // single consumer: only this thread ever advances tail
        auto rIdx = next(tail);
        if (buffer[rIdx].state != SlotState::FILLED) {
            return false;
        }
        val = buffer[rIdx].data;
        buffer[rIdx].state = SlotState::EMPTY;
        tail = rIdx;
        return true;
    }
};
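For illustration, the intended (non-blocking) usage looks roughly like this - the names and the yield-based retry are just placeholders for what the blocking calls should replace:

#include <thread>

MPSC_queue<int, 1024> queue;

// producers: any number of threads
void produce(int value) {
    while (!queue.push_back(value)) {
        std::this_thread::yield();  // queue full - spinning here is what I want to avoid
    }
}

// consumer: exactly one thread
void consume() {
    int value;
    while (!queue.pop_front(value)) {
        std::this_thread::yield();  // queue empty
    }
    // ... process value ...
}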
Related questions:
I asked a similar question specifically about optimizing the usage of condition_variable::notify here, but that question got closed as a supposed duplicate of this question. I disagree, because that question was about why the mutex is needed for condition variables in general (or rather its pthread equivalent) - focusing on condition_variable::wait - and not if/how it can be avoided for the notify part. But apparently I didn't make that sufficiently clear (or people just disagreed with my opinion). In any case, the answers in the linked question did not help me, and as this was somewhat of an XY problem anyway, I decided to ask another question about the actual problem I have and thus allow a wider range of possible solutions (maybe there is a way to avoid condition variables altogether).
This question is also very similar, but:
- It is about C on Linux, and the answers use platform-specific constructs (pthreads and futexes).
- The author there asked for efficient blocking calls, but no non-blocking ones at all. I, on the other hand, don't care too much about the efficiency of the blocking ones, but want to keep the non-blocking ones as fast as possible.