
Let's say I want to make this code thread-safe:

- (void) addThing:(id)thing { // Can be called from different threads
    [_myArray addObject:thing];
}

GCD seems like the preferred way of achieving this:

- (void) addThing:(id)thing { 
    dispatch_sync(_myQueue, ^{  // _myQueue is serial.
        [_myArray addObject:thing];
    });    
}

What advantage(s) does it have over the traditional method?

- (void) addThing:(id)thing {
    @synchronized(_myArray) {
        [_myArray addObject:thing];
    }
}
Robert
  • *"GCD seems like the preferred way"* ... do you have any reference for that statement? – Martin R Jul 11 '13 at 17:11
  • @MartinR I get the feeling from the GCD WWDCs that this is the modern way. I could be wrong tho. – Robert Jul 11 '13 at 17:13
  • most of the innovations are BETTER...not always!!! – Anoop Vaidya Jul 11 '13 at 17:22
  • Ok I think I found a reference. @synchronised uses locks under the hood, right? "Queues do not impose the same penalties as locks. For example, queueing a task does not require trapping into the kernel to acquire a mutex." http://developer.apple.com/library/ios/documentation/General/Conceptual/ConcurrencyProgrammingGuide/ThreadMigration/ThreadMigration.html#//apple_ref/doc/uid/TP40008091-CH105-SW3 – Robert Jul 11 '13 at 17:36
  • Don't they have entirely different strategies? You can't dispatch_sync in a recursive call on the same queue; you will deadlock. – Tom Andersen Dec 12 '15 at 19:17

4 Answers


Wow. OK -- My original performance assessment was flat out wrong. Color me stupid.

Not so stupid. My performance test was wrong. Fixed. Along with a deep dive into the GCD code.

Update: Code for the benchmark can be found here: https://github.com/bbum/StackOverflow Hopefully, it is correct now. :)

Update2: Added a 10 queue version of each kind of test.

OK. Rewriting the answer:

• @synchronized() has been around for a long time. It is implemented as a hash lookup to find a lock that is then locked. It is "pretty fast" -- generally fast enough -- but can be a burden under high contention (as can any synchronization primitive).

• dispatch_sync() doesn't necessarily require a lock, nor does it require the block to be copied. Specifically, on the fastpath, dispatch_sync() calls the block directly on the calling thread without copying it. Even on the slowpath, the block isn't copied, since the calling thread has to block until execution finishes anyway (the calling thread is suspended until whatever work is ahead of the dispatch_sync() is finished, then the thread is resumed). The one exception is invocation on the main queue/thread; in that case the block still isn't copied (the calling thread is suspended and, therefore, using a block from the stack is OK), but there is a bunch of work done to enqueue on the main queue, execute, and then resume the calling thread.

• dispatch_async() requires that the block be copied, as it cannot execute on the current thread, nor can the current thread be blocked (because the block may immediately take a lock on some thread-local resource that is only made available on the line of code after the dispatch_async()). While more expensive, dispatch_async() moves the work off the current thread, allowing it to resume execution immediately.

End result -- dispatch_sync() is faster than @synchronized, but not by a generally meaningful amount (on a '12 iMac and an '11 Mac Mini -- the numbers between the two are very different, btw... joys of concurrency). dispatch_async() is slower than both in the uncontended case, but not by much. However, dispatch_async() is significantly faster when the resource is under contention.

@synchronized uncontended add: 0.14305 seconds
Dispatch sync uncontended add: 0.09004 seconds
Dispatch async uncontended add: 0.32859 seconds
Dispatch async uncontended add completion: 0.40837 seconds
Synchronized, 2 queue: 2.81083 seconds
Dispatch sync, 2 queue: 2.50734 seconds
Dispatch async, 2 queue: 0.20075 seconds
Dispatch async 2 queue add completion: 0.37383 seconds
Synchronized, 10 queue: 3.67834 seconds
Dispatch sync, 10 queue: 3.66290 seconds
Dispatch async, 10 queue: 0.19761 seconds
Dispatch async 10 queue add completion: 0.42905 seconds

Take the above with a grain of salt; it is a micro-benchmark of the worst kind in that it does not represent any real world common usage pattern. The "unit of work" is as follows and the execution times above represent 1,000,000 executions.

- (void) synchronizedAdd:(NSObject*)anObject
{
    @synchronized(self) {
        [_a addObject:anObject];
        [_a removeLastObject];
        _c++;
    }
}

- (void) dispatchSyncAdd:(NSObject*)anObject
{
    dispatch_sync(_q, ^{
        [_a addObject:anObject];
        [_a removeLastObject];
        _c++;
    });
}

- (void) dispatchASyncAdd:(NSObject*)anObject
{
    dispatch_async(_q, ^{
        [_a addObject:anObject];
        [_a removeLastObject];
        _c++;
    });
}

(_c is reset to 0 at the beginning of each pass and asserted to be == to the # of test cases at the end to ensure that the code is actually executing all the work before spewing the time.)

For the uncontended case:

start = [NSDate timeIntervalSinceReferenceDate];
_c = 0;
for(int i = 0; i < TESTCASES; i++ ) {
    [self synchronizedAdd:o];
}
end = [NSDate timeIntervalSinceReferenceDate];
assert(_c == TESTCASES);
NSLog(@"@synchronized uncontended add: %2.5f seconds", end - start);

For the contended, 2 queue, case (q1 and q2 are serial):

#define TESTCASE_SPLIT_IN_2 (TESTCASES/2)
start = [NSDate timeIntervalSinceReferenceDate];
_c = 0;
dispatch_group_async(group, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    dispatch_apply(TESTCASE_SPLIT_IN_2, serial1, ^(size_t i){
        [self synchronizedAdd:o];
    });
});
dispatch_group_async(group, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
    dispatch_apply(TESTCASE_SPLIT_IN_2, serial2, ^(size_t i){
        [self synchronizedAdd:o];
    });
});
dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
end = [NSDate timeIntervalSinceReferenceDate];
assert(_c == TESTCASES);
NSLog(@"Synchronized, 2 queue: %2.5f seconds", end - start);

The above are simply repeated for each work unit variant (no tricksy runtime-y magic in use; copypasta FTW!).


With that in mind:

• Use @synchronized() if you like how it looks. The reality is that if your code is contending on that array, you probably have an architecture issue. Note: using @synchronized(someObject) may have unintended consequences in that it may cause additional contention if the object internally uses @synchronized(self)!

• Use dispatch_sync() with a serial queue if that is your thing. There is no overhead -- it is actually faster in both the contended and uncontended cases -- and queues are both easier to debug and easier to profile, in that Instruments and the debugger both have excellent tools for working with queues (and they are getting better all the time), whereas debugging locks can be a pain.

• Use dispatch_async() with immutable data for heavily contended resources. For example:

- (void) addThing:(NSString*)thing { 
    thing = [thing copy];
    dispatch_async(_myQueue, ^{
        [_myArray addObject:thing];
    });    
}

Finally, it shouldn't really matter which one you use for maintaining the contents of an array. The cost of contention is exceedingly high for the synchronous cases. For the asynchronous case, the cost of contention goes way down, but the potential for complexity or weird performance issues goes way up.

When designing concurrent systems, it is best to keep the boundary between queues as small as possible. A big part of that is ensuring that as few resources as possible "live" on both sides of a boundary.

bbum
  • Nice answer - but I didn't really understand the increased flexibility argument - could you not also do `thing = [thing copy];` on the @synchronized way too? – Robert Jul 11 '13 at 18:12
  • Actually I didn't understand the 'dispatch_sync()'s fast path' bit either? From my understanding dispatch queue is a wrapper for a thread. Under what circumstances will the fast path occur, why will it not use a thread? Do you have a reference for it? – Robert Jul 11 '13 at 18:15
  • @Robert: Switching threads is expensive, and `dispatch_sync()` won't use another thread if it doesn't have to: https://gist.github.com/woolsweater/5882484 I don't fully know what "doesn't have to" entails, but [libdispatch is open source](http://opensource.apple.com/source/libdispatch/libdispatch-228.23/) if you care to look. In my little demo, the original thread has to wait for the result of the operation no matter what, so switching does seem unnecessary. – jscs Jul 11 '13 at 18:47
  • @bbum In your method `addThing:` you are using `dispatch_async` -- not `dispatch_sync`. Is this by accident? Another follow-up: suppose the sync queue is not heavily contended and the actual code executed is very small (basically an assignment). Would you suggest using dispatch _sync_ over _async_ -- given that the additional danger of deadlocks in the sync case can be managed? – CouchDeveloper Jul 12 '13 at 08:01
  • @CouchDeveloper quite intentional. Read the surrounding text. – bbum Jul 12 '13 at 12:50
  • But AFAIK, dispatch_async will always use a heap allocated block. How would it work otherwise? – CouchDeveloper Jul 12 '13 at 12:52
  • @bbum - Great answer - above and beyond :) – Robert Jul 12 '13 at 16:11

I have found that dispatch_sync() is a poor way to do locking; it does not support nested calls.

So you can't call dispatch_sync() on a serial queue and then call it again in a subroutine on the same queue, which means it does not behave the same way @synchronized does at all.

Tom Andersen
  • You have to write your own dispatchOnSerialThread method which checks whether the current thread is the serial thread; if so, it performs the block on the same thread, and if not, it executes dispatch_sync() on the serial thread. – Hamidreza Vakilian Apr 21 '18 at 16:32
  • It also doesn't support return values, and you have to do the `__block` dance. – malhal Jun 01 '18 at 12:38

OK, I did some more tests, and here are the results:

lock test: mean:2.48661, stdDev:0.50599

synchronized test: mean:2.51298, stdDev:0.49814

dispatch Test: mean:2.17046, stdDev:0.43199

So I was wrong, my bad :( If somebody is interested, the test code is available here:

static NSInteger retCount = 0;

@interface testObj : NSObject
@end

@implementation testObj

-(id)retain{
    retCount++;
    return [super retain];
}
@end

@interface ViewController : UIViewController{
    NSMutableArray* _a;
    NSInteger _c;
    NSLock* lock;
    NSLock* thlock;
    dispatch_queue_t _q;
}

- (IBAction)testBtn:(id)sender;
@end

@implementation ViewController

- (void)viewDidLoad
{
    [super viewDidLoad];
}

-(NSTimeInterval)testCase:(SEL)aSel name:(NSString*)name{
    _a = [[NSMutableArray alloc] init];
    retCount = 0;
    //Sync test
    NSThread* th[10];
    for(int t = 0; t < 10;t ++){
        th[t] = [[NSThread alloc] initWithTarget:self selector:aSel object:nil];
    }

    NSTimeInterval start = [NSDate timeIntervalSinceReferenceDate];
    for(int t = 0; t < 10;t ++){
        [th[t] start];
    }
    NSInteger thCount = 1;
    while(thCount > 0){
        thCount = 0;
        for(int t = 0; t < 10;t ++){
            thCount += [th[t] isFinished] ? 0 : 1;
        }
    }
    NSTimeInterval end = [NSDate timeIntervalSinceReferenceDate];
    NSLog(@"%@: %2.5f, retainCount:%ld, _c:%ld, objects:%lu", name, end-start, (long)retCount, (long)_c, (unsigned long)[_a count]);
    [_a release];
    for(int t = 0; t < 10;t ++){
        [th[t] release];
    }
    return end-start;
}

-(void)syncTest{
    for(int t = 0; t < 5000; t ++){
        [self synchronizedAdd:[[[testObj alloc] init] autorelease] ];
    }
}

-(void)dispTest{
    for(int t = 0; t < 5000; t ++){
        [self dispatchSyncAdd:[[[testObj alloc] init] autorelease] ];
    }
}

-(void)lockTest{
    for(int t = 0; t < 5000; t ++){
        [self lockAdd:[[[testObj alloc] init] autorelease] ];
    }
}


- (void) synchronizedAdd:(NSObject*)anObject
{
    @synchronized(self) {
        [_a addObject:anObject];
        _c++;
    }
}

- (void) dispatchSyncAdd:(NSObject*)anObject
{
    dispatch_sync(_q, ^{
        [_a addObject:anObject];
        _c++;
    });
}

- (void) lockAdd:(NSObject*)anObject
{
    [lock lock];
        [_a addObject:anObject];
        _c++;
    [lock unlock];
}

- (double)meanOf:(NSArray *)array
{
    double runningTotal = 0.0;

    for(NSNumber *number in array)
    {
        runningTotal += [number doubleValue];
    }

    return (runningTotal / [array count]);
}

- (double)standardDeviationOf:(NSArray *)array
{
    if(![array count]) return 0;

    double mean = [self meanOf:array];
    double sumOfSquaredDifferences = 0.0;

    for(NSNumber *number in array)
    {
        double valueOfNumber = [number doubleValue];
        double difference = valueOfNumber - mean;
        sumOfSquaredDifferences += difference * difference;
    }

    return sqrt(sumOfSquaredDifferences / [array count]);
}

-(void)stats:(NSArray*)data name:(NSString*)name{
    NSLog(@"%@: mean:%2.5f, stdDev:%2.5f", name, [self meanOf:data], [self standardDeviationOf:data]);
}

- (IBAction)testBtn:(id)sender {
    _q = dispatch_queue_create("array q", DISPATCH_QUEUE_SERIAL);
    lock = [[NSLock alloc] init];
    NSMutableArray* ltd = [NSMutableArray array];
    NSMutableArray* std = [NSMutableArray array];
    NSMutableArray* dtd = [NSMutableArray array];
    for(int t = 0; t < 20; t++){
        [ltd addObject: @( [self testCase:@selector(lockTest) name:@"lock Test"] )];
        [std addObject: @( [self testCase:@selector(syncTest) name:@"synchronized Test"] )];
        [dtd addObject: @( [self testCase:@selector(dispTest) name:@"dispatch Test"] )];
    }
    [self stats: ltd name:@"lock test"];
    [self stats: std name:@"synchronized test"];
    [self stats: dtd name:@"dispatch Test"];
}
@end
The Tosters

There are a few things to consider: 1) @synchronized is a heavyweight version of locking on a monitor (I personally prefer NSLock/NSRecursiveLock). 2) dispatch_sync() builds an execution queue.

Both approaches lead to a similar result in your case; however, for a solution as simple as making a collection thread-safe, I would prefer 1.

Why:

  • If you have multiple cores, then multiple threads may work at the same time. Depending on the scheduler, they will lock on the monitor for only a very short time.

  • It's much lighter than allocating a new block, retaining 'thing', putting it into the queue (which is itself thread-synchronized), and executing it when the worker queue is ready.

  • In the two approaches, the order of execution will be very different.

  • If at some point you discover heavy usage of the collection, you may consider changing the lock to a read/write type, which is much simpler to refactor/change if you use an NSLock-like class instead of a sync queue.

The Tosters
  • If you are adding to a queue which is synchronized, then you also need a lock on it, so you can't escape locks. It's lighter to acquire a single lock than to acquire a lock and manage a block (queue). Putting a task into a queue involves block creation/handling. In the scenario from the question, the queue is much heavier than the lock. – The Tosters Jul 12 '13 at 06:25
  • OK -- Yes, this answer is incorrect. For synchronous dispatch, there is no locking, no allocation, and no retaining required. And, in general, dispatch queues are *much lighter weight* than locks, exactly because the internal implementation generally does not use locking at all, preferring the much more efficient atomic compare-and-swap. Using dispatch_sync() for this simple case is 30% faster when there is no contention. – bbum Jul 12 '13 at 16:02
  • Well, nice example above, but this is single-threaded code. When I get some free time (probably Monday) I'll check a multithreaded approach. The simple loop from the examples above looks nice; however, I'm curious how this works when multithreading takes place. I still don't agree with you, but I'll make my final statement after some more tests. – The Tosters Jul 12 '13 at 19:21
  • The example I posted is both single queue (single thread) and multiple queue (multiple threads). dispatch_sync() is faster in all cases. – bbum Jul 12 '13 at 20:05