3

I'm trying to build an array of dictionaries in a background thread while keeping access to the current array until the background operation is done. Here's a simplified version of my code:

@property (nonatomic, strong) NSMutableArray *data;
@property (nonatomic, strong) NSMutableArray *dataInProgress;

- (void)loadData {
    self.dataInProgress = [NSMutableArray array];
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
        [self loadDataWorker];
    });
}

- (void)loadDataWorker {
    for (int i=0; i<10000; i++) {
        [self addDataItem];
    }
    dispatch_async(dispatch_get_main_queue(), ^{
        [self loadDataFinish]; // the crash occurs before we get to this point
    });
}

- (void)addDataItem {
    // first check some previously added data
    int currentCount = (int)[self.dataInProgress count];
    if (currentCount > 0) {
        NSDictionary *lastItem = [self.dataInProgress objectAtIndex:(currentCount - 1)];
        NSDictionary *checkValue = [lastItem objectForKey:@"key3"]; // this line crashes with EXC_BAD_ACCESS
    }

    // then add another item
    NSDictionary *dictionaryValue = [NSDictionary dictionaryWithObjectsAndKeys:@"bar", @"foo", nil];
    NSDictionary *item = [NSDictionary dictionaryWithObjectsAndKeys:@"value1", @"key1", @"value2", @"key2", dictionaryValue, @"key3", nil];

    // as described in UPDATE, I think this is the problem
    dispatch_async(dispatch_get_main_queue(), ^{
        [dictionaryValue setObject:[self makeCustomView] forKey:@"customView"];
    });

    [self.dataInProgress addObject:item];
}

- (UIView *)makeCustomView {
    return [[UIView alloc] initWithFrame:CGRectMake(0, 0, 0, 0)];
}

- (void)loadDataFinish {
    self.data = [NSMutableArray arrayWithArray:self.dataInProgress];
}

This works fine in most cases, but when the dataset is large, I start to get crashes on the line indicated above. The likelihood of a crash is greater with more data or a device with less memory. On an iPhone 6 with 10,000 items, it happens about one in five times. So it looks like when memory gets tight, the dictionaries inside the data array are destroyed before I access them.

If I do everything in the main thread there are no crashes. I originally had this problem with non-ARC code, then I converted my project to ARC and the same problem remains.

Is there a way to ensure the objects added earlier in the build process are retained until I'm done? Or is there a better way to do what I'm doing?

Here's a stack trace:

thread #17: tid = 0x9c586, 0x00000001802d1b90 libobjc.A.dylib`objc_msgSend + 16, queue = 'com.apple.root.background-qos', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
frame #0: 0x00000001802d1b90 libobjc.A.dylib`objc_msgSend + 16
frame #1: 0x0000000180b42384 CoreFoundation`-[__NSDictionaryM objectForKey:] + 148
frame #2: 0x00000001002edd58 MyApp`-[Table addDataItem](self=0x000000014fd44600, _cmd="addDataItem", id=0x00000001527650d0, section=3, cellData=0x0000000152765050) + 1232 at Table.m:392
frame #4: 0x00000001002eca28 MyApp`__25-[Table loadData]_block_invoke(.block_descriptor=0x000000015229efd0) + 52 at Table.m:265
frame #5: 0x0000000100705a7c libdispatch.dylib`_dispatch_call_block_and_release + 24
frame #6: 0x0000000100705a3c libdispatch.dylib`_dispatch_client_callout + 16
frame #7: 0x0000000100714c9c libdispatch.dylib`_dispatch_root_queue_drain + 2344
frame #8: 0x0000000100714364 libdispatch.dylib`_dispatch_worker_thread3 + 132
frame #9: 0x00000001808bd470 libsystem_pthread.dylib`_pthread_wqthread + 1092
frame #10: 0x00000001808bd020 libsystem_pthread.dylib`start_wqthread + 4

UPDATE

I traced through my full code with the answers below in mind, particularly those about locking while multithreading, and realized that part of the data I'm adding to my data array is a UIView that I'm creating during the build process. Since it's bad to build views in a background thread, and I did see problems when doing that, I'm jumping back to the main thread for makeCustomView. See the lines of code I added above with "UPDATE" in the comment. This must be the problem now; when I skip adding the custom view, I have no more crashes.

I could rework the build workflow so that all the data except the custom views are added on the background thread, then I could make a second pass and add the custom views on the main thread. But is there a way to manage the threads in this workflow? I tried locking with NSLock before and after calling makeCustomView, but that made no difference. I also found an SO answer saying NSLock is basically outdated, so I didn't go further with that.

Community
  • 1
  • 1
arlomedia
  • 8,534
  • 5
  • 60
  • 108
  • 1. Only for the protocol: Please add a dealloc method to the class of self to check, whether the instance itself is released. 2. Are you building in release mode? 3. What you get from `-objectAtIndex:` in the line ahead of the crash is already corrupted. The crash line is only the consequence. To me it seems to be the way, that somewhere else you have a corrupted pointer destroying the contents of the array. – Amin Negm-Awad Jun 01 '16 at 07:56
  • 1. I checked and self (the view controller) is not getting deallocated prematurely. 2. I'm building in debug mode, but the problem also happens in the released app. – arlomedia Jun 01 '16 at 22:08
  • Yes, doing anything with views in background queues is a potential crash. But there is one thing, I do not understand: Does it work, a) if you put the creation into the main queue or b) if you remove the view creation completely? I would think, that a) is enough. Yes, locks are outdated. – Amin Negm-Awad Jun 02 '16 at 03:58
  • If I do everything in the main thread, or if I remove the view creation, there are no crashes. Also, the crash isn't caused by the view creation itself, but the view doesn't display correctly unless I switch back to the main thread to create it. That switching back and forth is what seems to be causing the crash. – arlomedia Jun 02 '16 at 04:20
  • Sorry, I still do not get you: When you create the view in a dispatch async on the main the main thread (as in your code), it still crashes? If so: Accesses the view creation the data? In such a case it should be possible to solve the problem. – Amin Negm-Awad Jun 02 '16 at 04:48

4 Answers4

3

If I understood you correctly, concurrent accesses to the dataInProgress array causes the problem, because the array is filled in a background thread and used in the main thread. But NSMutableArray is not thread safe. This fits to my intention that the array itself is corrupted.

You could solve that with NSLock to serialize the accesses to the array, but this is akin of outdated and does not fit to the rest of your code, which uses the more modern (and better) GCD.

A. Situation

What you have:

  • a builder control flow, which has to run in background
  • a creation of views control flow, which has to run in the main queue (thread). (I'm not completely sure, whether the pure creation of a view has to be done in main thread, but i would do.)
  • both control flows accesses the same resource (dataInProgress)

B. GCD

With classical thread/lock approach you start async control flows and serialize them with locks, when they access concurrently a shared resource.

With GCD you start control flows concurrently to each other, but serialized for a given shared resource. (Basically, there are more features, more complexity, but this is, what we need here.)

C. Serializing

It is correct to start the builder in a background queue ("thread") to run it without blocking the main thread. Done.

It is correct to switch back to main thread, if you want to do something with UI elements, esp. creating a view.

Since both control flows accesses the same resource, you have to serialize the accesses. You do this by creating a (serial) queue for that resource:

 …
 @property dispatch_queue_t dataInProgressAccessQ;
 …

 // In init or whatever
 self. dataInProgressAccessQ = dispatch_queue_create("com.yourcompany.dataInProgressQ", NULL);

After doing that, you put every access to the dataInProgress array in that queue. There is a simple example for that:

// [self.dataInProgress addObject:item];
dispatch_async( self.dataInProgressAccessQ,
^{
    [self.dataInProgress addObject:item];
});

In this case it is very easy, because you have to switch the queue at the and of the code. If it is in the middle, you have two options:

a) Use the queue similar to a lock. Let's have an example:

// NSInteger currentCount = [self.dataInProgress count]; // Why int?
NSInteger currentCount;
dispatch_sync( self.dataInProgressAccessQ,
^{
  currentCount = [self.dataInProgress count];
});
// More code using currentCount

Using dispatch_sync() will let the code execution wait, until accesses from other control flows are finished. (It is like a lock.)

Edit: As with locks, the access is guaranteed to be serialized. But there might be the problem, that another thread removes objects from the array. Let's have a look to such a situation:

// NSInteger currentCount = [self.dataInProgress count]; // Why int?
NSInteger currentCount;
dispatch_sync( self.dataInProgressAccessQ,
^{
  currentCount = [self.dataInProgress count];
});
// More code using currentCount
// Imagine that the execution is stopped here
// Imagine that -makeCustomView removes the last item in meanwhile
// Imagine that the execution continues here 
// -> currentCount is not valid anymore. 
id lastItem = [self.dataInProgress objectAtIndex:currentCount]; // crash: index out of bounds

To prevent this, you really have to isolate your concurrent code. This highly depends on your code. However, in my example:

id lastItem;
dispatch_sync( self.dataInProgressAccessQ,
^{
  NSInteger currentCount;
  currentCount = [self.dataInProgress count];
  lastItem = [self.dataInProgress objectAtIndex:currentCount]; // don't crash: bounds are not changed
});
// Continue with lastItem

As you can imagine, when getting the last item, if can be removed from the array in the very next moment after you read it. Maybe this causes problems of inconsistency in your code. It really depends on your code.

End of edit

b) Maybe you get performance problems, because is works like a lock (synch). If so, you have to analyze your code and extract parts, that can run concurrently again. The pattern looks like this:

// NSInteger currentCount = [self.dataInProgress count]; // Why int?
dispatch_async( self.dataInProgressAccessQ, // <-- becomes asynch
^{
  NSInteger currentCount = [self.dataInProgress count];
  // go back to the background queue to leave the access queue fastly
  dispatch_async( dispatch_get_global_queue(),
  ^{
    // use current count here.
  });
});

dispatch_async( self.dataInProgressAccessQ,
^{
  // Another task, that can run concurrently to the above
});

What you can do there, is a matter of your concrete code. Maybe it is a help for you, to have your own private builder queue instead of using the global queue.

But this is the basic approach: Move a task into a queue and do not wait, until it is finished, but add code at the end, that completes the task in another control flow.

Instead of

Code
--lock--
var = Access code
--unlock--
More Code using var

it is

Code
asynch {
  var Access Code
  asynch {
    More code using var
  }
}

Of course, you have to do the same inside -makeCustomView.

Amin Negm-Awad
  • 16,582
  • 3
  • 35
  • 50
  • Thanks, I see how I can use dispatch_queue_create to make my own queue. But then should I nest these dispatch_async blocks inside my existing background/main code, or should I use these instead of the background/main queues I'm using now? – arlomedia Jun 02 '16 at 06:41
  • If you put the whole builder code into your own queue this will have the result, that the whole code runs blocking. Inside `-makeCustomView` the usage of the queue will shift the execution of the queued code to the end of the builder code. Likely this is not, what you want. Typically you get the best performance and best "feeling of parallelism" if you have many fine-grained queues, one for each resource. However, this can lead to the problem, that dispatching takes more time. – Amin Negm-Awad Jun 02 '16 at 08:15
  • In your case you should put the builder code into a background queue as you did, put the view code into the main queue (associated with the main thread) as you did and put only the on `dataInProgress` conflicting code in a separate, serializing queue. To get a good performance, the code pieces inside this queue should be as small as it can be. +++ Important: There is a edit. – Amin Negm-Awad Jun 02 '16 at 08:17
  • Thanks for all the explanations. It is going to take a while to integrate this into my code, but I think this is the right approach. – arlomedia Jun 02 '16 at 09:48
1

I agree with Phillip Mills. This looks like a thread safety issue around your self.dataInProgress object.

From Apple docs https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Multithreading/ThreadSafetySummary/ThreadSafetySummary.html :

Mutable objects are generally not thread-safe. To use mutable objects in a threaded application, the application must synchronize access to them using locks. (For more information, see Atomic Operations). In general, the collection classes (for example, NSMutableArray, NSMutableDictionary) are not thread-safe when mutations are concerned. That is, if one or more threads are changing the same array, problems can occur. You must lock around spots where reads and writes occur to assure thread safety.

If addDataItem is being called from various background threads, you need to lock around reading and writing to self.dataInProgress.

Ashley
  • 156
  • 5
  • addDataItem is only called from the one background thread, and within that thread it is called synchronously, and nothing else is accessing dataInProgress until it's done. So are locks needed in that case? The problem is definitely related to multithreading somehow since it doesn't happen if I do everything on the main thread. – arlomedia Jun 01 '16 at 02:45
  • I looked further into this and I think your answer is headed in the right direction. I added an update to my question above; can you look at that and suggest how I can fix my current workflow? – arlomedia Jun 02 '16 at 01:46
  • You should serialize every access to a resource that is not thread-safe, even if it is a write-read-combination. – Amin Negm-Awad Jun 02 '16 at 06:07
0

I don't think you need deep copies. If the dictionaries aren't mutable, all you need is for them to not be released...and a copy of the array they're in will do that for you.

What I believe you need is synchronization around any access to self.data. I suggest creating a NSLock object for your class and wrapping each of the following two lines with lock/unlock method calls:

self.data = [NSMutableArray arrayWithArray:self.dataInProgress];
//...
NSDictionary *item = [self.data objectAtIndex:index];

Also, why does self.data need to be mutable? If it doesn't, self.data = [self.dataInProgress copy]; is simpler...and quite possibly more efficient for memory and performance.


The one thing that's worrying me is, what about the caller of getData. It may not know that the self.data array has changed. If the array becomes shorter, you're headed for an "index out of bounds" crash.

It would be good to only call getData when you know the array is going to be stable. (In other words, synchronize the data gets at a higher level.)

Phillip Mills
  • 30,888
  • 4
  • 42
  • 57
  • I think you're right about the deep copies ... I just need the data to not be released. I will explore your suggestion. – arlomedia May 05 '16 at 19:31
  • As for getData knowing that self.data has changed, that would be a problem in my simplified code, but in the complete code the data access only happens with full knowledge of the current data. – arlomedia May 05 '16 at 19:31
  • I just realized the crash is happening at a different point than I thought, and updated my question accordingly. – arlomedia May 05 '16 at 20:53
0

I would attempt at passing in a weak reference to self. I bet if you might have a strong retain cycle happening there somewhere. If I remember correctly, __weak doesn't up the retain count, and __block allows you to change the variable

- (void)loadData {
    self.dataInProgress = [NSMutableArray array];

    __weak __block SelfClassName *weakSelf = self;
    dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
            [weakSelf loadDataWorker];
    });
}

- (void)loadDataWorker {
    for (int i=0; i<10000; i++) {
        [self addDataItem];
    }

    __weak __block SelfClassName *weakSelf = self;
    dispatch_async(dispatch_get_main_queue(), ^{
        [weakSelf loadDataFinish]; 
    });
}
AntonTheDev
  • 899
  • 6
  • 13
  • I tried this, but the problem remains. It seems like I need stronger, not weaker references somewhere, since the crash is EXC_BAD_ACCESS. – arlomedia May 30 '16 at 21:33
  • What I'm wondering is why stronger references? Wouldn't you be overriding the data that is intended to be displayed on screen, I will bet you are still parsing the old data, and starting a task to parse the new data if I'm right? It's the same data dictionary you are referencing for display right? – AntonTheDev May 31 '16 at 00:49
  • Hot are you initiating each parse job? Is it by pushing a view controller? – AntonTheDev May 31 '16 at 00:51
  • I thought stronger references because some of the data in the dictionary is getting released before I'm done building it. The building process is initially triggered by viewWillAppear and I just confirmed that it is not called duplicate times. I'm building the data in a separate dataInProgress variable, so if I need to refresh the display, the old data is available for display until the new data is ready (which takes a few seconds at most). The crash, though, is happening the first time the data is built, so I don't think it's an issue of swapping the new data in for the old after refreshing. – arlomedia May 31 '16 at 02:04
  • Try overriding dealloc, and do an NSLog print out. If you don't see it you have a strong reference cycle and that's why the crash is most likely happening – AntonTheDev May 31 '16 at 02:07
  • Overriding the dealloc for the view controller doesn't show anything unusual. I started subclassing the dictionaries to put breakpoints in their dealloc methods, but found out that subclassing a dictionary object is a whole separate challenge, so I stopped. – arlomedia Jun 01 '16 at 22:47