2

There are three different types of "lock-free" algorithms. The definitions given in Concurrency in Action are:

  1. Obstruction-Free: If all other threads are paused, then any given thread will complete its operation in a bounded number of steps.
  2. Lock-Free: If multiple threads are operating on a data structure, then after a bounded number of steps one of them will complete its operation.
  3. Wait-Free: Every thread operating on a data structure will complete its operation in a bounded number of steps, even if other threads are also operating on the data structure.

I asked in a previous question why code using mutexes don't satisfy definition (2) above. The answer is that lock-free algorithms must also be non-blocking.

To be lock-free, the algorithm must be "non-blocking" in addition to one of the three conditions above. An algorithm is called non-blocking if failure or suspension of one thread does not affect failure or suspension of another thread. (Wiki)

This seemed like a satisfactory answer. However, in this talk, Herb Sutter presents what is purportedly a lock-free producer consumer algorithm. Here is my summary of the algorithm:

There are N+1 threads. 1 thread is a producer and N threads are consumers. There is an array of length N visible to all threads. The producer puts tasks in empty entries of the array. The i-th consumer takes out and performs the task in the i-th entry of the array, leaving the entry empty.

Here is my problem: this producer-consumer algorithm fails to be non-blocking. If all of the consumers are suspended, then the array fills up with tasks, and the producer cannot make progress. Herb Sutter addresses the question of whether or not it is lock-free at time stamp 45:19. About the producer, he says:

The threads can get jammed. It may wait if all of the mailboxes are full. It is wait-free up to N active workers (consumers)... Beyond N active workers... so if all the slots are full, then it's no longer wait-free, it's just lock-free. So overall, it is just lock-free.

However, this does not explain why this producer consumer algorithm is non-blocking, which a condition for being lock free.

Is Herb Sutter's producer-consumer algorithm lock-free? Is it non-blocking? How is this consistent with the definition of lock-free I gave: (i) satisfy the non-blocking condition and (ii) satisfy one of (1), (2), or (3) above.

Edit: Below is the portion of the producer's code that contains what I believe to be a "spinlock" at while (slot[curr] != null):

curr = 0;
// Phase 1: Build and distribute tasks
while (ThereAreMoreTasks()){
    task = AllocateAndBuildNewTask();
    while (slot[curr] != null)
        curr = (curr + 1)%K;
    slot[curr] = task;
    sem[curr].signal();
}
// Phase 2
// ...
Mark Wallace
  • 528
  • 2
  • 12
  • 1
    Whether it's lock-free depends on implementation details that you omitted, and I'm not going to watch the entire talk to find out. Is every individual array element protected by a mutex? Then it's not lock-free, because death of a thread while it's holding a mutex may cause infinite waiting in other threads. Is there some atomic compare-and-swap operation being used to put and take array entries? Then it is probably lock-free. – Thomas May 28 '22 at 20:08
  • Spitballing here, but I wouldn't be surprised if, within the context of that talk, "Lock-Free" was actually shorthand for "Lock-Free during normal operation". –  May 28 '22 at 20:11
  • @Thomas each individual array element will only be read and written to by the producer and one consumer. Each array element is modified atomically. There are no locks involved. The "blocking" only occurs if the array fills up with tasks. The producer is "blocked" as it cycles through the array looking for an empty entry. – Mark Wallace May 28 '22 at 20:12
  • 1
    @Frank Herb Sutter addressed that with the quote starting with "The threads can get jammed..." He said that it is "wait-free" during normal operation and "lock-free" if we allow the array to fill up. – Mark Wallace May 28 '22 at 20:14
  • @Thomas if you believe the code will add to the question, I will add it. My feeling was that the code would detract from the question. – Mark Wallace May 28 '22 at 20:15
  • It's hard to decide whether some algorithm actually is lock-free. For example there might be a hidden spin lock making it equivalent to a locked solution. Someone would have to deeply analyse Herb Sutter's algorithm to answer your question. But typically solutions that use some primitives like Compare-And-Swap or Fetch-And-Add operations (instead of acquire/release lock) can be assumed to be lock-free. Either way it is not trivial. – freakish May 28 '22 at 20:17
  • @freakish I added the producer code. What I believe is the spinlock is at the while-loop ```while (slot[curr] != null)```. – Mark Wallace May 28 '22 at 20:23
  • @MarkWallace that's not a spin lock. It does spin, potentially infinitely many times, but it is not a lock. Again: if that thread fails, it doesn't affect other threads spinning in the same loop. Locks (almost by definition) don't have this property. If a thread fails after acquiring a lock, but before releasing it, it will deadlock entire system. – freakish May 28 '22 at 20:25
  • 1
    Although, now that I think of it, this snippet is not enough to decide. It's getting late (where I am at least), but I promise I will watch the video tomorrow and try to help you out. – freakish May 28 '22 at 20:28
  • The assumption here is that other threads get to run and the system isn't just letting the producer run forever. As soon as any other thread runs there is progress. If you are on a single core system with cooperative multitasking the code will just get stuck in the producer forever. As soon as you have 2 cores or preemptive multitasking it will always make progress. What is missing in the producer is a `yield` or `pause` command when all slots are full. – Goswin von Brederlow May 28 '22 at 20:32
  • @GoswinvonBrederlow couldn't the same be said for mutex-based code such as [this](https://stackoverflow.com/questions/72415993/the-definition-of-lock-free?noredirect=1)? "As soon as you have 2 cores or preemptive multitasking it will always make progress." – Mark Wallace May 28 '22 at 20:37
  • Conversely all the consumers will block after the producer terminates and each array element is consumed. There is no way to make a buffer that provides data when no data is available. – Jim Rogers May 28 '22 at 21:57
  • The general idea of "lock-free until/unless the buffer fills" up is very similar to [Lock-free Progress Guarantees in a circular buffer queue](https://stackoverflow.com/q/45907210). Herb's queue is a SPMC (single producer, multiple consumer). It doesn't meet the formal CS definition, but is useful in practice because it normally doesn't actually block and performs well. When operating normally, without any pauses that are *too* long, you do get non-blocking operation with low contention. The case you're asking about is the queue-full case (or old-reader-stuck case). – Peter Cordes May 28 '22 at 22:09
  • I think this is a duplicate, in fact; the answers there do explain that it's not technically/formally lock-free, but acts that way in practice without any of the cost of making it lock-free even on weird hardware and/or scheduling corner cases. If that's not an answer to your question, maybe I'm missing something about what you're asking. – Peter Cordes May 28 '22 at 22:13
  • @JimRogers A solution would be (to satisfy the definition of lock-free) that the consumers should start performing the function of a producer (or terminating) once no more tasks were available. This would be a reasonable implementation with a stack or queue-based lock-free data structure. However, I do not know if this is typically how it is understood in the literature. – Mark Wallace May 28 '22 at 22:16
  • If there's no data to read and no producer is even trying to write any, that's not the queue implementation's fault; there's nothing it could do about that. That shouldn't disqualify the queue algorithm from being a non-blocking algorithm. Only cases of lack of progress when it was hypothetically possible should count. But I'm not familiar enough with formal CS study of the subject either to know if they take that sensible approach. (Having consumers turn around and write data just for the sake of doing something to create throughput doesn't seem a realistic suggestion.) – Peter Cordes May 29 '22 at 02:44

0 Answers0