1

I was looking at this answer that provides code for a thread safe array with concurrent reads. As @tombardey points out in the comments the code (relevant snippet below) is not completely safe:

public func removeAtIndex(index: Int) {

    self.accessQueue.async(flags:.barrier) {
        self.array.remove(at: index)
    }
}

public var count: Int {
    var count = 0

    self.accessQueue.sync {
        count = self.array.count
    }

    return count
}

...Say the sychronized array has one element, wouldn't this fail? if synchronizedArray.count == 1 { synchronizedArray.remove(at: 0) } It's a race condition, say two threads execute the statement. Both read a count of 1 concurrently, both enqueue a write block concurrently. The write blocks execute sequentially, the second one will fail... (cont.)

@Rob replies:

@tombardey - You are absolutely right that this level of synchronization (at the property/method level) is frequently insufficient to achieve true thread-safety in broader applications. Your example is easily solved (by adding an method that dispatches block to the queue), but there are others that aren't (e.g. "synchronized" array simultaneously used by a UITableViewDataSource and mutated by some background operation). In those cases, you have to implement your own higher-level synchronization. But the above technique is nonetheless very useful in certain, highly constrained situations.

I am struggling to work out what @Rob means by "Your example is easily solved (by adding an method that dispatches block to the queue)". I would be interested to see an example implementation of this method (or any other) technique to solve the problem.

asj123
  • 17
  • 5
  • FWIW, this is [example of how Apple has implemented their thread safe array](https://github.com/apple/swift-package-manager/blob/b2342f18309258fd7ad1e97ba77765489ec947e4/Sources/Basics/ConcurrencyHelpers.swift#L70). – Rob Dec 23 '20 at 04:27
  • @Rob do you think Apple chose to block the caller for the write operation itself because using a queue presumably blocks the caller for longer in practice than a lock for cheap operations (such as append)? Or do you think there might be another reason? Interesting they did this so thanks for directing me to it – asj123 Dec 25 '20 at 18:25
  • Yep, they’re using locks. So they can’t write asynchronously (nor can they read concurrent with respect to other reads), but it’s faster overall, nonetheless. – Rob Dec 25 '20 at 20:00

3 Answers3

4

This is a very good example of why "atomic" mutable operations on individual properties are rarely sufficient, and are dangerous to add without a great deal of care.

The fundamental problem in this example is that any time the array is modified, it invalidates existing indices. In order to safely use an index, you must ensure that the entire "fetch an index, use the index" operation is atomic. You can't just ensure that each piece is atomic. There is no safe way to write removeAtIndex in isolation, because there is no safe way to acquire the index you pass. Between the time you fetch the index, and the time you use it, the array may have been changed in arbitrary ways.

The point is that there's no such thing as a "thread-safe (mutable) array" that you can use just like a normal array and not have to worry about concurrency issues. A "thread-safe" mutable array cannot return or accept indices, because its indices aren't stable. Exactly what data structure is appropriate depends on the problem you're trying to solve. There's no one answer here.

In most cases the answer is "less concurrency." Rather than trying to manage concurrent access to individual data structures, think about larger-scoped "units of work" that carry all their own data and have exclusive access to it. Put those larger units of work onto queues. (In many cases, even this is overkill. You'd be shocked how often adding currency makes things slower if you don't design it very carefully.) For more recommendations, see Modernizing Grand Central Dispatch Usage.

Rob Napier
  • 286,113
  • 34
  • 456
  • 610
  • This makes sense that using indices in a "thread safe" array makes no sense, but before I accept this answer just want to check the meaning of "can't" accept indices: could the answer Kiril S provided still lead to an index out of range error? or by "can't" do you mean just in terms of practical usage? – asj123 Dec 12 '20 at 23:26
2

You said:

I am struggling to work out what @Rob means by “Your example is easily solved (by adding [a] method that dispatches block to the queue)”. I would be interested to see an example implementation of this method (or any other) technique to solve the problem.

Let’s expand upon the example that I posted in response to your other question (see point 3 in this answer), adding a few more Array methods:

class SynchronizedArray<T> {
    private var array: [T]
    private let accessQueue = DispatchQueue(label: "com.domain.app.reader-writer", attributes: .concurrent)

    init(_ array: [T] = []) {
        self.array = array
    }

    subscript(index: Int) -> T {
        get { reader { $0[index] } }
        set { writer { $0[index] = newValue } }
    }

    var count: Int {
        reader { $0.count }
    }

    func append(newElement: T) {
        writer { $0.append(newElement) }
    }

    func remove(at index: Int) {
        writer { $0.remove(at: index) }
    }

    func reader<U>(_ block: ([T]) throws -> U) rethrows -> U {
        try accessQueue.sync { try block(array) }
    }

    func writer(_ block: @escaping (inout [T]) -> Void) {
        accessQueue.async(flags: .barrier) { block(&self.array) }
    }
}

So, let’s imagine that you wanted to delete an item if there was only one item in the array. Consider:

let numbers = SynchronizedArray([42])

...

if numbers.count == 1 { 
    numbers.remove(at: 0) 
}

That looks innocent enough, but it is not thread-safe. You could have a race condition if other threads are inserting or removing values. E.g., if some other thread appended a value between the time you tested the count and when you went to remove the value.

You can fix that by wrapping the whole operation (the if test and the consequent removal) in a single block that is synchronized. Thus you could:

numbers.writer { array in
    if array.count == 1 { 
        array.remove(at: 0) 
    } 
}

This writer method (in this reader-writer-based synchronization) is an example of what I meant by a “method that dispatches block to the queue”.


Now, clearly, you could also give your SynchronizedArray its own method that did this for you, e.g.:

func safelyRemove(at index: Int) {
    writer { array in
        if index < array.count {
            array.remove(at: index)
        }
    }
}

Then you can do:

numbers.safelyRemove(at: index)

... and that is thread-safe, but still enjoys the performance benefits of reader-writer synchronization.

But the general idea is that when dealing with a thread-safe collection, you invariably have a series of tasks that you will want to synchronize together, at a higher level of abstraction. By exposing the synchronization methods of reader and writer, you have a simple, generalized mechanism for doing that.


All of that having been said, as others have said, the best way to write thread-safe code is to avoid concurrent access altogether. But if you must make a mutable object thread-safe, then it is the responsibility of the caller to identify the series of tasks that must be performed as a single, synchronized operation.

Rob
  • 415,655
  • 72
  • 787
  • 1,044
0

I see several problems with the posted code and example:

  1. Function removeAtIndex is not checking whether it can actually remove at provided index. So it should be changed to

    public func removeAtIndex(index: Int) {
    
        // Check if it even makes sense to schedule an update
        // This is optional, but IMO just a better practice
        guard count > index else { return }
    
        self.accessQueue.async(flags: .barrier) {
    
            // Check again before removing to make sure array didn't change
            // Here we can actually check the size of the array, since other threads are blocked
            guard self.array.count > index else { return }
            self.array.remove(at: index)
        }
    }
    
  2. Usage of a thread-safe class also implies that you use one operation to both check and do operation on an item that is supposed to be thread-safe. So if you checking array size and then removing it, you are breaking that thread-safety envelope, it's not a correct use of the class. The particular case synchronizedArray.count == 1 { synchronizedArray.remove(at: 0) } is resolved with adjustments to function above (you don't need to check count anymore, as function already does that). But if you still needed a function that both, verifies count, and then removes an item, you would have to create a function in your thread-safe class that does both operations with no possibility that other threads modify an array in between. You may even need 2 functions: synchronizedArray.getCountAndRemove (get count, then remove), and synchronizedArray.removeAndGetCount` (remove, then get count).

    public func getCountAndRemoveAtIndex(index: Int) -> Int {
    
        var currentCount = count
        guard currentCount > index else { return currentCount }
    
        // Has to run synchronously to ensure the value is returned
        self.accessQueue.sync {
    
            currentCount = self.array.count
            guard currentCount > index else { break }
            self.array.remove(at: index)
        }
    
        return currentCount
    }
    
  3. In general removing item at index for an array that is used from multiple threads, is quite meaningless. You can't even be sure what you are removing. Maybe there are some cases where it would make sense, but usually it makes more sense to either remove by some logic (e.g. specific value), or have a function that returns the value of the item it removes (e.g. func getAndRemoveAtIndex(index: Int) -> T)

  4. Always test every function and combination of them. For example if original poster tested removal like this:

    let array = SynchronizedArray<Int>()
    array.append(newElement: 1)
    array.append(newElement: 2)
    array.append(newElement: 3)
    
    DispatchQueue.concurrentPerform(iterations: 5) {_ in
    
        array.removeAtIndex(index: 0)
    }
    

He would get a Fatal error: Index out of range: file Swift/Array.swift, line 1298 in 2 out of 5 threads, so it would be clear that original implementation of this function is not right. Try the same test with the function I posted above, and you will see the difference.

BTW we are only talking about removeAtIndex, but subscript has a similar problem as well. But interestingly first() is implemented correctly.

timbre timbre
  • 12,648
  • 10
  • 46
  • 77
  • @Rob for the second point, you have no choice: if you want to get count and add/update/remove atomically, you have to do it using `sync`. This is why I said before that that `remove` on its own doesn't need it, and you should use it only if you still needed a function that both, verifies count, and then removes an item Regarding the first item: implementation by the book requires that you test condition (which is cheaper) before scheduling a blocking condition - this is what I would definitely do on C++. But if you can prove that specifically in Swift this is sub-optimal, I won't argue. – timbre timbre Dec 15 '20 at 17:25
  • @Rob, you are getting things out of context. The point of my post was 1 - to demonstrate how to fix `remove` function, so you don't need to do something like `if synchronizedArray.count == 1 { synchronizedArray.remove(at: 0) }` which is invalid use of thread-safe class. 2 - I demonstrated what you could do _if_ you still wanted to do 2 operations in thread-safe manner. And I said right before that "this is not necessary since fixed `remove` function will take care about count, but _if you still need_... etc". I could use "getandRemove" for this example - same idea. – timbre timbre Dec 15 '20 at 22:42
  • And all I’m saying is that you can encompass the “check the count before you remove” in thread-safe manner without losing the reader-writer performance benefits, as we have here. – Rob Dec 15 '20 at 23:35