2

I have what seems to be the classic solution to the Readers-Writers problem in Swift, using a concurrent DispatchQueue with a barrier for writes. However, when I run my test code, it deadlocks.

Here's my wanna-be-thread-safe data structure:

class Container<T> {
    private var _value: T
    private let _queue = DispatchQueue(label: "containerQueue", attributes: .concurrent)
    
    init(_ value: T) {
        self._value = value
    }
    
    var value: T {
        get {
            _queue.sync {
                _value
            }
        }
        set {
            _queue.async(flags: .barrier) {
                self._value = newValue
            }
        }
    }
}

And here's my test code:

class ContainerTest {
    let testQueue = DispatchQueue(label: "testQueue", attributes: .concurrent)
    var container = Container(0)
    let group = DispatchGroup()
    
    func runTest() {
        for i in 0..<1000 {
            testQueue.async(group: group) {
                self.container.value = max(i, self.container.value)
            }
        }
        
        group.notify(queue: .main) {
            print("Finished")
        }
    }
}

The block that's dispatched repeatedly just performs some arbitrary read and write operations. It's not attempting to produce anything sensible; it's just there to stress-test the data structure.

So when I run this, "Finished" is never printed. However, if I change `_queue.async(flags: .barrier)` to `_queue.sync(flags: .barrier)`, then I see "Finished" printed.

I'm guessing that when I use the async write version I'm getting a deadlock, but why? It's the textbook Readers-Writers solution that's typically shown. Perhaps my test code is at fault, but again, why?

Alex
  • The reader-writer pattern falls apart in thread-explosion scenarios, where you can easily exhaust the worker thread pool. I’ve abandoned reader-writer because, despite its theoretical attractiveness, it is only modestly faster than a serial queue and introduces the sorts of problems you have identified. And where synchronization performance is of paramount concern (which, as an aside, is a red flag in and of itself), locks (`NSLock`, unfair locks, etc.) are much faster. In short, I find reader-writer to be an academically interesting pattern that is not well suited to the constraints of GCD. – Rob Jun 13 '23 at 14:12

2 Answers

3

Rob's answer gave me some great clues as to why my code was not completing: "thread pool exhaustion, no threads available, everything locks up". So I wanted to see exactly why everything locks up when pool exhaustion happens.

Having learned a bit about thread pools, I would normally expect pool exhaustion to be bad for performance, but not necessarily a cause of deadlock: once some blocks finish executing, they free up their threads and allow other blocks to execute. So I didn't immediately see how this could result in a deadlock.

But alas, here's one way this could have resulted in a deadlock:

[Diagram: the state of the queues at the time of the deadlock]

In the image above I'm trying to depict what the state of my queues was at the time when deadlock happened.

The main queue is "fine", because it dispatched everything asynchronously to the "testQueue".

Now, I just learned about the GCD thread pool: it's a pool of available threads, and GCD can add more threads to it when needed, but there's a limit. That limit probably depends on the platform and device, but for the sake of example let's say it's a maximum of 4 threads.

So in this case, I dispatched 1000 concurrent blocks, and the first 4 each got their own thread to execute on, immediately depleting the GCD thread pool.

The remaining 996 blocks have to wait for threads to free up before they can run. This is all fine so far.

Next, the first block that received a thread starts executing. In its code, it calls `_queue.sync` to read the value on the "containerQueue". To run a block on "containerQueue", it needs a thread of its own, but because our GCD thread pool is empty, it can't run and has to wait for a thread to become available.

Now, because the block on "containerQueue" was dispatched synchronously, the "testQueue" block can't complete until it finishes. And since the "testQueue" block can't complete, it can't release its thread back to the GCD thread pool.

So "testQueue" blocks never complete, never release their threads, and "containerQueue" blocks never start because they wait forever for threads to free up.

This is a deadlock.


Now if I pause program execution and look at my threads, I see a whole bunch of "testQueue" threads, all waiting on a lock after trying to access `Container.value.getter`, which seems to support my hypothesis above.

[Screenshot: paused threads, all waiting on a lock in Container.value.getter]


However, this doesn't explain why the deadlock goes away when I change the setter from `async` to `sync`. Maybe it's just "luck".
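As an aside, here's one way to keep a test like mine from exhausting the pool: cap the number of in-flight blocks with a `DispatchSemaphore`. This is a sketch, not part of my original test, and the limit of 16 is an arbitrary choice for illustration (the dispatched work is a stand-in for the container reads and writes):

```swift
import Dispatch

let testQueue = DispatchQueue(label: "testQueue", attributes: .concurrent)
let group = DispatchGroup()
let semaphore = DispatchSemaphore(value: 16) // at most 16 blocks in flight

for i in 0..<1000 {
    semaphore.wait() // blocks the submitting thread until a "slot" frees up
    testQueue.async(group: group) {
        _ = i // stand-in for the read/write work on the container
        semaphore.signal() // release the slot when this block finishes
    }
}

group.wait()
print("Finished")
```

With the in-flight count capped well below the pool limit, the pool can never be fully depleted by this loop, so the synchronous reads inside the blocks always have threads available.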

Alex
  • The reason it works for `sync` is that the system can usually use the calling thread to process the block. It doesn't have to create a new thread like `async` does, since it knows the calling thread can't do anything until the block returns anyway. This is one of those subtle ways that queues are not threads. (Excellent write-up, btw.) – Rob Napier Jun 12 '23 at 20:27
  • Thanks, that's a good tip to keep in mind (synchronously dispatched blocks may execute on calling thread). So using sync in the setter potentially reduced the number of threads used, thus never depleting the GCD thread pool enough to create a deadlock situation. So having changed setter to sync, I still tried to create a deadlock by changing the number of async dispatches on testQueue from 1000 to something like 1000000, but while it took some time, it still completed without a deadlock. So not sure that my hypothesis above explains exactly what happened. But maybe it's good enough for now. – Alex Jun 12 '23 at 20:39
  • Also, in my hypothesis above, I say that running _queue.sync needs its own thread, when threads are not available, thus causing a deadlock. But if (as per above comment) it were to run on the caller thread, then deadlock would not happen. – Alex Jun 12 '23 at 21:02
2

You're almost certainly getting thread-pool exhaustion. This is a pattern Apple engineers repeatedly warn against using because of that risk (I'm aware it's also documented as a common pattern in many places). Each call to `.async` is spawning a thread, and if there is severe contention (as you're creating), eventually there will be no threads available, and everything will lock up.

In modern Swift, the correct tool for this is an `actor`. In pre-concurrency Swift, you'd likely want to use an `OSAllocatedUnfairLock`, though when you find yourself making an atomic property, thinking you mean thread-safe, you are often on the wrong road.
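For illustration, a minimal actor-based version of the container might look like this (a sketch; the `setValue` name is my own, since an actor's stored properties can't be set from outside the actor, and under strict concurrency `T` would need to be `Sendable`):

```swift
actor Container<T> {
    private var _value: T

    init(_ value: T) {
        _value = value
    }

    // Reads are serialized by the actor; callers must `await`.
    var value: T { _value }

    // Writes go through an actor-isolated method instead of a setter.
    func setValue(_ newValue: T) {
        _value = newValue
    }
}

// Usage, from an async context:
// let container = Container(0)
// let current = await container.value
// await container.setValue(max(current, 42))
```

Note that a separate read followed by a write is still two `await`s, so a read-modify-write like the one in the original test would need to move inside a single actor method to be atomic.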

Rob Napier
  • 286,113
  • 34
  • 456
  • 610
  • Thanks for the answer Rob! That makes sense. If I were to make a general purpose "thread-safe" value container like that, and have no control over how engineers are going to use it, what would be the recommended approach given the thread-pool exhaustion risk? Should I use a serial queue? Or concurrent queue with sync for writes? – Alex Jun 12 '23 at 14:16
  • You should use an `actor`. – Rob Napier Jun 12 '23 at 14:16
  • Any downsides to using actors? I've seen somewhere some performance benchmarks that show them being significantly slower than using locks, for example. – Alex Jun 12 '23 at 14:18
  • I'm also genuinely curious what did people do before actors. On my path to knowledge, is my only option to look into how actors are implemented in Swift, so I understand it? – Alex Jun 12 '23 at 14:23
  • You were happy to spawn a thread for each access and introduce a thread-context switch.... I didn't think performance was your concern. Yes, I would in some cases expect `actor` to be slower than carefully hand-built solutions with atomics. Do you know how to use `Atomics` correctly? They're quite tricky and when you mess them up, you just get bizarre behaviors, not errors. – Rob Napier Jun 12 '23 at 14:25
  • Generally the correct approach pre-actors was to hand all of the required data to a dispatched task, and then collect all of the result at the end. If you're constantly messing with a lock to access data across threads, you're going to waste a lot of time on contention. If you must deal with contention and can accept an unfair lock, then OSAllocatedUnfairLock has become the easiest one to use correctly. – Rob Napier Jun 12 '23 at 14:27
  • When I say "the correct approach," I mean just that. I don't mean it's the common approach. Many, many people use the pattern you've shown here (I've used it). It generally relies on the fact that iOS apps rarely have much actual contention, so the thread explosion rarely happens. A serial queue is more correct, but has its own trade-offs. There is no universal answer; it depends on how the structure is used and what kind of contention it will have and how you want to deal with that contention. – Rob Napier Jun 12 '23 at 14:29
  • Thanks Rob for digging deeper and helping me understand it, I really appreciate it! Will use actors in production code, and will learn more about thread-pool exhaustion, contention, OSAllocatedUnfairLock usage, and how actors actually work. Maybe peek into atomics... I'm happy that we have good high-level APIs, but I just have an irresistible urge to understand the inner workings. Thanks again! – Alex Jun 12 '23 at 14:34
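For reference, a lock-based variant along the lines of the `OSAllocatedUnfairLock` suggestion above might look like this. This is a sketch, not the poster's code: it assumes iOS 16 / macOS 13 availability, and the `withValue` helper is my own name for the atomic read-modify-write operation:

```swift
import os

final class LockedContainer<T>: @unchecked Sendable {
    // OSAllocatedUnfairLock can own the state it protects,
    // so the value is only reachable through the lock.
    private let state: OSAllocatedUnfairLock<T>

    init(_ value: T) {
        state = OSAllocatedUnfairLock(initialState: value)
    }

    var value: T {
        get { state.withLock { $0 } }            // brief critical section, no queue hop
        set { state.withLock { $0 = newValue } } // no thread is spawned or parked on a queue
    }

    // Atomic read-modify-write, avoiding the get-then-set race
    // that the original test's `max(i, container.value)` has.
    func withValue<R: Sendable>(_ body: @Sendable (inout T) -> R) -> R {
        state.withLock(body)
    }
}
```

Because the lock never dispatches onto another queue, reads and writes can't contribute to thread-pool exhaustion the way the concurrent-queue version does.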