
I have a producer thread that produces objects at a rate that might (temporarily) be too fast for the consumer thread to consume. Therefore, I'd like a FIFO that buffers the remaining work. Once the FIFO is full, the producer shall simply retire or retry later. Also, I would like to be able to notify the consumer that no more work is to be done (without enqueuing a special object into the FIFO). The producer shall not get slowed down and the consumer shall not waste CPU cycles, so I have the following requirements:

  • Circular ring buffer of fixed size.
  • Producer shall never wait: It needs to be able to enqueue elements as quickly as possible.
  • Consumer shall be able to block: No busy-waiting for a potentially slow producer. A hybrid lock would be nice.

I am envisioning the following C++ class:

#include <array>   // std::array
#include <cstddef> // std::size_t

template <typename T, std::size_t N>
class spsc_circular_half_blocking {
    std::array<T, N> buffer;
    // bookkeeping, atomics, mutexes etc. go here
public:
    bool try_push(const T&); // returns whether object could be enqueued
    void notify(); // notifies consumer
    bool wait_pop(T&); // returns whether object was dequeued or notification was received
};

Being able to modify the elements in-place would be nice. It may also use dynamic allocation of the buffer (e.g. size is passed to constructor, buffer is a unique_ptr).

Now for my questions. Is this thing even possible (on x86, at least)?

  • If yes, how would it work? I would really like some implementation, preferably but not necessarily C++.
  • If no, why not?

Pointers to related material, even if it does not exactly fulfill my needs, would also be greatly appreciated.

Maxim Egorushkin
purefanatic
  • You can use a signalling mechanism to notify the consumer. – yadhu Aug 09 '18 at 14:23
  • Look at https://github.com/cameron314/readerwriterqueue . It has a blocking version. – rsjaffe Aug 09 '18 at 14:37
  • I haven't seen any papers about lock-free circular buffers, so it is very likely that such a thing has not been invented yet. Lock-free queues have been around for more than two decades. Anyway, asking for suggestions of off-site resources is off topic here. – Öö Tiib Aug 09 '18 at 15:06
  • Did you benchmark a blocking version with a mutex and a condition variable and found out that it was not fast enough? – Maxim Egorushkin Aug 09 '18 at 15:13
  • @Maxim I have a poorly performing solution involving two wait-free queues in a very contrived way which I am trying to improve on. I really need some form of `try_push` because in my case, it is not sensible to continue operation once the queue limit is reached. The producer never having to wait for the consumer is a strict requirement. – purefanatic Aug 10 '18 at 07:26
  • @ÖöTiib I asked for off-site material because I don't think that all the necessary concepts and implementation details involving a solution for my problem would fit into a single answer here on SO. But I do not think this is very problematic: Instead of giving concrete resources it would be possible to point me at the right keywords and explain how to use some specific concept instead (so that I can more easily find an existing solution on my own). – purefanatic Aug 10 '18 at 07:31
  • Designing synchronization mechanisms from scratch isn't easy, after all, that's why I would like to leave this task to someone else. – purefanatic Aug 10 '18 at 07:33
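For reference, the readerwriterqueue library pointed out in the comments above already covers most of these requirements: its BlockingReaderWriterQueue gives the producer a non-blocking `try_enqueue` and the consumer a blocking `wait_dequeue`. Below is a minimal, untested sketch of how it might be driven. The capacity of 1024 and the `done` flag are illustrative choices, and the "no more work" notification is approximated with `wait_dequeue_timed` plus that flag, since the library has no built-in shutdown signal.

#include <readerwriterqueue.h> // moodycamel::BlockingReaderWriterQueue
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    // Fixed initial capacity; try_enqueue never allocates, so it fails
    // instead of blocking or growing when the queue is full.
    moodycamel::BlockingReaderWriterQueue<int> q(1024);
    std::atomic<bool> done{false};

    std::thread producer([&] {
        for (int i = 0; i < 100000; ++i) {
            if (!q.try_enqueue(i)) {
                // Queue full: drop the item or retry later, but never block.
            }
        }
        done.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        int item;
        long consumed = 0;
        for (;;) {
            if (q.wait_dequeue_timed(item, std::chrono::milliseconds(10))) {
                ++consumed; // process the item
            } else if (done.load(std::memory_order_acquire)) {
                // Producer finished; drain anything that raced in, then stop.
                while (q.try_dequeue(item))
                    ++consumed;
                break;
            }
        }
        std::printf("consumed %ld items\n", consumed);
    });

    producer.join();
    consumer.join();
}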

2 Answers


One solution would be to use the Boost single-producer single-consumer queue (`boost::lockfree::spsc_queue`) with your own signalling and blocking.
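(The code that originally accompanied this answer was later deleted; see the last comment below. What follows is only a rough sketch of the approach the comments discuss, assuming `boost::lockfree::spsc_queue` as the wait-free ring buffer and a mutex/condition-variable pair for the blocking side. Names such as `state_`, `BLOCKED` and `UNBLOCK` merely follow the comment thread; the deleted code may have differed, and the race windows debated below apply to this sketch as well.)

#include <boost/lockfree/spsc_queue.hpp>
#include <boost/lockfree/policies.hpp>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>

template <typename T, std::size_t Size>
class spsc_half_blocking {
    boost::lockfree::spsc_queue<T, boost::lockfree::capacity<Size>> queue_;
    std::mutex m_;
    std::condition_variable c_;
    enum State { READY, BLOCKED, UNBLOCK };
    std::atomic<State> state_{READY};

public:
    // Producer: never blocks; returns false when the ring buffer is full.
    bool try_push(const T& value) {
        if (!queue_.push(value))
            return false;
        // Wake the consumer if it is blocked, or if the queue just went from
        // empty to one element (the consumer may be about to block). Taking
        // the mutex here orders this wake-up against the consumer's checks.
        if (state_.load() == BLOCKED || queue_.write_available() == Size - 1) {
            std::lock_guard<std::mutex> lock(m_);
            c_.notify_one();
        }
        return true;
    }

    // Producer: signal that no more work will be pushed.
    void notify() {
        std::lock_guard<std::mutex> lock(m_);
        state_.store(UNBLOCK);
        c_.notify_one();
    }

    // Consumer: returns true with a dequeued element, or false once the
    // queue is drained and notify() has been called.
    bool wait_pop(T& value) {
        for (;;) {
            if (queue_.pop(value))
                return true;                       // fast path, no locking
            std::unique_lock<std::mutex> lock(m_);
            if (state_.load() == UNBLOCK)
                return queue_.pop(value);          // catch a last-moment push
            state_.store(BLOCKED);
            c_.wait(lock, [this] {
                return queue_.read_available() != 0 || state_.load() == UNBLOCK;
            });
            // Clear only our own BLOCKED mark; keep UNBLOCK sticky so the
            // next iteration sees the shutdown request.
            State expected = BLOCKED;
            state_.compare_exchange_strong(expected, READY);
        }
    }
};

Note that the consumer's fast path (`queue_.pop` succeeding) never touches the mutex; the mutex and condition variable only come into play when the consumer has drained the queue and has to sleep, which is what keeps the common case wait-free.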

Maxim Egorushkin
  • This seems like a very useful approach! But I fail to see why I would have to check `read_available() == 0` at the producer side. Wouldn't it be sufficient to notify the consumer in either case? Also, wouldn't calling `read_available()` from the producer side be subject to race conditions? The manual does seem to recommend against using `read_available()` in the producer. – purefanatic Aug 10 '18 at 07:39
  • @purefanatic I revised the answer and added a working solution for you. – Maxim Egorushkin Aug 10 '18 at 09:52
  • I think I already got the idea, your example is way too generous! Just wanted to clarify if I understood you correctly. But I have to point out that there's still a race condition ;-) When the producer checks whether the state is blocked right before the consumer does `state_.store(BLOCKED)`, a notification will be lost. The solution is to always protect the condition variable state with the mutex: The mutex has to be locked unconditionally in the producer and the state can not be checked beforehand. An atomic is not sufficient (and unnecessary, as we already have the mutex). – purefanatic Aug 10 '18 at 12:54
  • @purefanatic I am sorry, but your analysis is incorrect. There is no race condition. The only requirement to avoid the race condition is to only change the state of `state_` while mutex is held. Notifying the condition while not holding the mutex does not cause a race condition. – Maxim Egorushkin Aug 10 '18 at 13:36
  • _When the producer checks whether the state is blocked right before the consumer does `state_.store(BLOCKED)`, a notification will be lost._ - in this case the producer does not notify the consumer because the latter has not blocked yet. – Maxim Egorushkin Aug 10 '18 at 13:44
  • @purefanatic However, after the consumer did `queue_.pop` and it was empty, the producer can do `queue_.push` and not notify the consumer because the latter has not blocked yet. Hence, the consumer may block while having 1 element available in the queue. There must be a way to fix that... – Maxim Egorushkin Aug 10 '18 at 13:49
  • @purefanatic Fixed that now. – Maxim Egorushkin Aug 10 '18 at 14:01
  • I think you can optimistically check for `UNBLOCKED == state_` *before* taking the lock in `pop`. (You still need to check `UNBLOCKED != state_` after taking the lock, but this saves a lock/unlock in the fast path.) – Peter Cordes Aug 11 '18 at 04:41
  • @PeterCordes It needs to hold the lock when `UNBLOCK != state_.load(std::memory_order`. So that either consumer blocks first or the producer does `state_.store(UNBLOCK, std::memory_order_relaxed);` and then the consumer does not block. If the consumer does not lock there the producer can do `state_.store(UNBLOCK, std::memory_order_relaxed);` and `c_.notify_one();` after that check but before `state_.store(BLOCKED, std::memory_order_relaxed);` and that would be a race resulting in a missed notification and the consumer can block forever. – Maxim Egorushkin Aug 11 '18 at 14:35
  • @Maxim _Hence, the consumer may block while having 1 element available in the queue._ Well that's _exactly_ the race condition I tried to describe! But you seem to be right in that my analysis isn't quite correct because your fix seems to work. Thanks again! – purefanatic Aug 11 '18 at 17:56
  • Oh yes, I missed that `state_.store(READY, mo_relaxed)` was done with the lock held, even if the if() was false. And also that's not the fast path, that's only in the queue-empty case. Anyway, I wondered if we could work around it with `xchg` to atomically check the old value + store, but that can't exactly match the semantics of doing the `if()` body's alternate store. I think a `compare_exchange_strong(UNBLOCK, READY)` would work, but that only helps us for that tiny time window where we *just* missed a queue entry but got the `UNBLOCK`. Or it might speed up leaving the loop on seeing it. – Peter Cordes Aug 11 '18 at 20:44
  • @PeterCordes I quite like your idea of `compare_exchange_strong(UNBLOCK, READY)`. I think the case of just missing the queue entry while getting `UNBLOCK` might not occur often; however, it handles that case neatly. The store of `READY` does not need to be done under the lock because the producer does not care or wait for this particular state. Updated the solution. – Maxim Egorushkin Aug 12 '18 at 00:47
  • Hmm, can't this deadlock now? If we spin on `while(UNBLOCK != state_.load(R)){}` *with the lock held*, the producer can't store a new value. (I haven't fully thought this through, I might well be missing something that makes that not a problem.) And BTW, I'd suggest something longer than `R`, which could also stand for `Release`. Maybe `mo_relaxed`, which I often use as shorthand in pseudocode. – Peter Cordes Aug 12 '18 at 00:52
  • @purefanatic: would it be ok to solve the missed-notification problem by having the producer check again to make sure the consumer is unblocked if the queue fills up? So if a race means the notification is lost when the queue goes from 0 to 1, we recover when the queue is full. That could probably get rid of the mutex altogether. – Peter Cordes Aug 12 '18 at 00:55
  • @PeterCordes Waiting on a condition variable releases the mutex. – Maxim Egorushkin Aug 13 '18 at 09:19
  • @PeterCordes That issue was solved with `queue_.write_available() == Size - 1` check. – Maxim Egorushkin Aug 13 '18 at 09:25
  • @MaximEgorushkin: I was looking for a way to get rid of the mutex, because I thought there was a problem. With no mutex and *just* a condition variable, we would sometimes have that problem. I hadn't used condition variables before, and didn't notice it unlocked the mutex before it starts waiting. It seems like overkill to have a lockless queue, *and* a mutex, *and* a condition variable, all between these two threads. So it might be a tradeoff between maybe leaving a blocked state faster most of the time, but sometimes missing a wakeup and thus being much slower. – Peter Cordes Aug 13 '18 at 09:31
  • @PeterCordes Waiting on a condition variable requires the mutex to be locked. The mutex and condition variable are only used when the consumer has drained the queue and sleeps and needs to be woken up. In other cases the queue is wait-free. – Maxim Egorushkin Aug 13 '18 at 09:32
  • This code has a race condition. All you need to do is bombard it with elements and it quite reliably deadlocks waiting for the condition. – goji Feb 19 '19 at 13:14
  • @goji I don't think it does. Post your test code that demonstrates the issue. – Maxim Egorushkin Feb 19 '19 at 13:36
  • I managed to solve this problem using this eventcount algorithm: https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/295834 – goji Feb 20 '19 at 01:21
  • Why did you delete all of your code here? I previously found it to be useful. – Cody Gray - on strike Jan 15 '20 at 22:00

Google found this; I have no idea about it, but it is perhaps worth examining:

Wait-free multiproducer multiconsumer ring buffer:

https://www.osti.gov/servlets/purl/1531271

Yttrill
  • Link-only answers are normally discouraged, although the whole question asks for so much that it's basically a resource request. Still, normally just posting a link should be done with a comment, not an answer. *Especially* if you just randomly Googled it and don't have anything to say about it yourself. – Peter Cordes May 23 '21 at 05:41