Synchronizing read/write access to an instance variable for high performance in iOS?

Question

What's the best way / least wait-causing way to synchronize read/write access to an instance variable in objective-c for iOS?

The variable gets read and written very often (let's say 1000 times per second read and written). It is not important that changes take effect immediately. It is not even important that reads get consistent data with one-another, but writes must sooner or later be reflected in the data acquired by reads. Is there some data structure which allows this?

I thought of this:

Create two variables instead of one variable; let's call them v[0] and v[1].
For each v[i], create a concurrent dispatch queue for constructing a readers-writer-locking mechanism around it. Let's call them q[i].
Now for a writing operation, only v[0] gets written to, adhering to the locking mechanism with q[0].
On a read operation, at first v[1] is read and only at a certain chance, e.g. 1%, the read operation looks into v[0] and updates v[1] if necessary.

The following pseudo-code illustrates this:

typedef int VType; // the type of the variable

VType* v; // array of first and second variable
dispatch_queue_t* q; // queues for synchronizing access to v[i]

- (void) setV:(VType)newV {
    [self setV:newV at:0];
}

- (void) setV:(VType)newV at:(int)i {
    dispatch_barrier_async(q[i], ^{
        v[i] = newV;
    });
}

- (VType) getV:(int)i {
    __block VType result;

    dispatch_sync(q[i], ^{
        result = v[i];
    });

    return result;
}

- (VType) getV {
    VType result = [self getV:1];

    if ([self random] < 0.01) {
        VType v0_result = [self getV:0];

        if (v0_result != result) {
            [self setV:v0_result at:1];
            result = v0_result;
        }
    }

    return result;
}

- (float) random {
    // some random number generator - fast, but not necessarily good
}

This has the following benefits:

v[0] is usually not occupied with a read operation. Therefor, a write operation usually does not block.
At most times, v[1] does not get written to, thus read operations on this one usually don't block.
Still, if many read operations occur, eventually the written values are propagated from v[0] into v[1]. Some values might be missed, but that doesn't matter for my application.

What do you guys think, does this work? Are there better solutions?

UPDATE:

Some performance benchmarking (reads and writes of one benchmark at a time are done as quickly as possible concurrently for 1 second, one reading queue, one writing queue):

On iPhone 4S with iOS 7:

runMissingSyncBenchmark: 484759 w/s
runMissingSyncBenchmark: 489558 r/s
runConcurrentQueueRWSyncBenchmark: 2303 w/s
runConcurrentQueueRWSyncBenchmark: 2303 r/s
runAtomicPropertyBenchmark: 460479 w/s
runAtomicPropertyBenchmark: 462145 r/s

In Simulator with iOS 7:

runMissingSyncBenchmark: 16303208 w/s
runMissingSyncBenchmark: 12239070 r/s
runConcurrentQueueRWSyncBenchmark: 2616 w/s
runConcurrentQueueRWSyncBenchmark: 2615 r/s
runAtomicPropertyBenchmark: 4212703 w/s
runAtomicPropertyBenchmark: 4300656 r/s

So far, atomic property wins. Tremendously. This was tested with an SInt64.

I expected that the approach with the concurrent queue is similar in performance to the atomic property, as it is the standard approach for an r/w-sync mechanism.

Of course, the runMissingSyncBenchmark sometimes produces reads which show that a write of the SInt64 is halfway done.

Did you benchmark this against the simple queue reader-writer pattern (as discussed in WWDC 2012 video [Asynchronous Design Patterns with Blocks, GCD, and XPC](https://developer.apple.com/videos/wwdc/2012/?id=712), namely the same concurrent queue, synchronous reads, asynchronous barrier writes, like you do above, but without the separate v[0] and v[1] logic)? The GCD-based reader-writer pattern is pretty efficient (generally more efficient than locks), and it feels like your above code sample would be less efficient than the simple reader-writer pattern. — Rob, Dec 31 '13 at 07:27
@Rob, I haven't compared benchmark results yet. I've only profiled the standard reader-writer pattern with one concurrent queue. Instruments shows that due to waiting, the time spent reading and writing is doubled on average. I'm trying to find a way to optimize this. — Daniel S., Dec 31 '13 at 15:52
Ok. As CouchDeveloper suggest, your reader could employ a "try" locking mechanism (either with spin lock or read/write mutex). But if the object is immutable (or in your example, a fundamental data type), can't you just use an `atomic` property? In those simple cases where you can get away with `atomic` properties, it often is much faster than both locks and reader-write GCD pattern. — Rob, Dec 31 '13 at 19:46
@Rob, I'll probably follow your suggestion to use an atomic property because it's so simple and 2000 times faster (LOL) than the concurrent_queue-dispatch_sync-dispatch_barrier_sync approach. — Daniel S., Jan 02 '14 at 12:51
Your 2000x comment prompted me to do some benchmarking and I got some fairly different results (the atomic operations were only 20-40% faster than GCD reader-writer pattern, not three orders of magnitude faster). Bottom line, atomic was a little faster than either GCD, spin locks, or mutex locks with try algorithm. And those three types of synchronization were 2-3 times faster than pthread read-write lock, which itself was 3-4 times faster than simple mutex/NSLock/`@synchronized` lock. So, while my atomic performance didn't exhibit the 2000x gain you described, it definitely was fastest. — Rob, Jan 05 '14 at 20:59
Synchronize your instance variable with respect to *what*? What precisely does this "synchronization" accomplish? What invariant is preserved that would not be preserved without the mechanism? — Hot Licks, Jan 05 '14 at 21:59
@HotLicks, if the variable is larger than 32 bits, writes are not atomic. If reading and writing is not synchronized, then a read might result in a mixture of a previous and a new value. — Daniel S., Jan 07 '14 at 00:23

CouchDeveloper · Answer 1 · 2013-12-31T08:41:58.357

Perhaps, a spinlock will be optimal (see man 3 spinlock).

Since a spin lock can be tested if it is currently locked (which is a fast operation) the reader task could just return the previous value if the spin lock is held by the writer task.

That is, the reader task uses OSSpinLockTry() and retrieves the actual value only if the lock could be obtained. Otherwise, the read task will use the previous value.

The writer task will use OSSpinLockLock() and OSSpinLockUnlock() respectively in order to atomically update the value.

From the man page:

NAME OSSpinLockTry, OSSpinLockLock, OSSpinLockUnlock -- atomic spin lock synchronization primitives

SYNOPSIS
 #include <libkern/OSAtomic.h>

 bool
 OSSpinLockTry(OSSpinLock *lock);

 void
 OSSpinLockLock(OSSpinLock *lock);

 void
 OSSpinLockUnlock(OSSpinLock *lock);
DESCRIPTION

Spin locks are a simple, fast, thread-safe synchronization primitive that is suitable in situations where contention is expected to be low. The spinlock operations use memory barriers to synchronize access to shared memory protected by the lock. Preemption is possible while the lock is held.

OSSpinLockis an integer type. The convention is that unlocked is zero, and locked is nonzero. Locks must be naturally aligned and cannot be in cache-inhibited memory.

OSSpinLockLock() will spin if the lock is already held, but employs various strategies to back off, making it immune to most priority-inversion livelocks. But because it can spin, it may be inefficient in some situations.

OSSpinLockTry() immediately returns false if the lock was held, true if it took the lock. It does not spin.

OSSpinLockUnlock() unconditionally unlocks the lock by zeroing it.

RETURN VALUES

OSSpinLockTry() returns true if it took the lock, false if the lock was already held.

There is a bug in iOS with spin locks: http://openradar.appspot.com/23896366 http://mjtsai.com/blog/2015/12/16/osspinlock-is-unsafe/#comment-2543463 — Petr, Jan 27 '16 at 08:15
@Petr Thank your very much for this information. I read the bug report, and it has the potential that software for iOS should be rewritten. Luckily, I don't use spinlocks in my code :) I'm perfectly happy with GCD. — CouchDeveloper, Jan 27 '16 at 09:08
@Petr Need to check how this issue will affect GCD, as well :/ — CouchDeveloper, Jan 27 '16 at 09:26
look at the answer by @foundry here: http://stackoverflow.com/questions/30269243/application-sticks-on-osspinlocklockslow?lq=1. He is touching GCD in his answer. — Petr, Jan 27 '16 at 12:08

Rob · Answer 2 · 2014-01-05T21:29:23.467

I think CouchDeveloper's suggestion of using try-checks in the synchronization locks is an intriguing possibility. In my particular experiments, it had negligible impact with spin locks, modest gain for pthread read-write lock, and most significant impact with simple mutex lock). I'd wager that difference configurations would achieve some gain with spin locks, too, but must have I failed to get enough contention with spin locks for the impact of using try to be observable.

If you're working with immutable or fundamental data types, you can also use the atomic property as described in the Synchronization Tools section in the Threading Programming Guide:

Atomic operations are a simple form of synchronization that work on simple data types. The advantage of atomic operations is that they do not block competing threads. For simple operations, such as incrementing a counter variable, this can lead to much better performance than taking a lock.

Unaware that you had done your own benchmarking, I benchmarked a couple of these techniques discussed in that document (doing mutex lock and pthread read/write lock both with and without the "try" algorithm), as well as the GCD reader-writer pattern. In my test, I did 5m reads while doing 500k writes of random values. This yielded the following benchmarks (measured in seconds, smaller being better).

| Tables                    | Simulator | Device   | 
+---------------------------+-----------+----------+
| Atomic                    |       1.9 |      7.2 |
| Spinlock w/o try          |       2.8 |      8.0 |
| Pthread RW lock w/ try    |       2.9 |      9.1 |
| Mutex lock w/ try         |       2.9 |      9.4 |
| GCD reader-writer pattern |       3.2 |      9.1 |
| Pthread RW lock w/o try   |       7.2 |     22.2 |
| NSLock                    |      23.1 |     89.7 |
| Mutex lock w/o try        |      24.2 |     80.2 |
| @synchronized             |      25.2 |     92.0 |

Bottom line, in this particular test, atomic properties performed the best. Obviously, atomic properties have significant limitations, but in your scenario, it sounds like this is acceptable. These results are obviously going to be subject to the specifics of your scenario, and it sounds like your testing has confirmed that atomic properties yielded the best performance for you.

Thanks for the additional benchmarks. They show more information than mine. The x2000 factor also made me wonder, but I couldn't find a problem with my code. — Daniel S., Jan 07 '14 at 00:30
@Daniel S. If this is still actual - there is a bug in iOS with spin locks: http://openradar.appspot.com/23896366 http://mjtsai.com/blog/2015/12/16/osspinlock-is-unsafe/#comment-2543463 — Petr, Jan 27 '16 at 08:16

Synchronizing read/write access to an instance variable for high performance in iOS?

2 Answers2

Linked