
Ok, I'm a bit lost with this one. I am currently trying to run a background Core Data operation using a second NSManagedObjectContext with its type set to NSPrivateQueueConcurrencyType, and failing miserably with the error "error: NULL _cd_rawData but the object is not being turned into a fault".

I have a custom subclass of NSOperation which is passed an NSArray of strings and the NSPersistentStoreCoordinator from the main thread. It then creates its own managed object context, runs a query and performs an operation.

Here is the code from the class:

//
//  ProcessProfanity.m
//  Hashtag Live Desktop
//
//  Created by Gareth Jeanne on 24/03/2014.
//  Copyright (c) 2014 Gareth Jeanne. All rights reserved.
//

#import "ProcessProfanity.h"
#import "Tweet.h"

static const int ImportBatchSize = 250;

@interface ProcessProfanity ()
@property (nonatomic, copy) NSArray* badWords;
@property (nonatomic, strong) NSManagedObjectContext* backgroundContext;
@property (nonatomic, strong) NSPersistentStoreCoordinator* persistentStoreCoordinator;
@end

@implementation ProcessProfanity


{

}


- (id)initWithStore:(NSPersistentStoreCoordinator*)store badWords:(NSArray*)words
{
self = [super init];
if(self) {
    self.persistentStoreCoordinator = store;
    self.badWords = words;
}
return self;
}


- (void)main
{
_backgroundContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
_backgroundContext.persistentStoreCoordinator = [self persistentStoreCoordinator];
_backgroundContext.undoManager = nil;
[_backgroundContext performBlockAndWait:^
{
    [self import];
}];
}

- (void)import
{

//Create new fetch request
NSFetchRequest *request = [[NSFetchRequest alloc] init];

//Setup the Request
[request setEntity:[NSEntityDescription entityForName:@"Tweet" inManagedObjectContext:self.backgroundContext]];

NSError *error = nil;

//Create an array from the returned objects
NSArray* tweetsToProcess = [self.backgroundContext executeFetchRequest:request error:&error];
NSAssert2(tweetsToProcess != nil && error == nil, @"Error fetching events: %@\n%@", [error localizedDescription], [error userInfo]);

for (Tweet* tweetToCheck in tweetsToProcess){
    __block NSString *result = nil;
    [self.badWords indexOfObjectWithOptions:NSEnumerationConcurrent
                                   passingTest:^(NSString *obj, NSUInteger idx, BOOL *stop)
     {
         if (tweetToCheck){
             if ([tweetToCheck.text rangeOfString:obj].location != NSNotFound)
             {
                 result = obj;
                 *stop = YES;
                 //return YES;
             }
         }
         return NO;
     }];

    if (!result){
        //DDLogVerbose(@"The post does not contain any of the words from the naughty list");
        if(tweetToCheck){
            tweetToCheck.profanity = [NSNumber numberWithBool:false];
        }
    }
    else{
        if(tweetToCheck){
            //DDLogVerbose(@"The string contains '%@' from the the naughty list", result);
            tweetToCheck.profanity = [NSNumber numberWithBool:true];
        }
    }

}
[self.backgroundContext save:NULL];
}

@end

And this is how I am calling it:

-(void)checkForProfanity{

if(!self.operationQueue){
self.operationQueue = [[NSOperationQueue alloc] init];
}

NSArray* termsToPass = [self.filterTerms copy];
ProcessProfanity* operation = [[ProcessProfanity alloc] initWithStore:self.persistentStoreCoordinator badWords:termsToPass];
[self.operationQueue addOperation:operation];


}

Edit 1

The specific line I seem to be getting the error on, or at least where Xcode is breaking, is:

if ([tweetToCheck.text rangeOfString:obj].location != NSNotFound)

I have managed to narrow this down a bit. The NSArray that contains the list of terms to search the strings for is potentially quite large, possibly over 1,000 NSStrings. If I test with an array of that size, I get the issue. However, if I reduce the array to around 15 NSStrings, I do not get the error, so I don't think this is necessarily a thread-related issue; I'm wondering if the array is getting released on the main thread. I have modified the code to make a deep copy, and then a __block copy as follows, but it doesn't seem to have helped.

self.badWords = [[NSArray alloc] initWithArray:words copyItems:YES];

and

for (Tweet* tweetToCheck in tweetsToProcess){
    __block NSArray *array = [[NSArray alloc] initWithArray:self.badWords copyItems:YES];
    __block NSString *result = nil;
    [array indexOfObjectWithOptions:NSEnumerationConcurrent

In fact, at the point where Xcode breaks, if I po array, I get an object not found message, but if I po result, I correctly get an object returned that is nil.

Edit 2

So I have made the following changes, with no effect:

Made the NSArray strong rather than copy:

@property (nonatomic, strong) NSArray* badWords;

And made it a copy when allocated:

self.badWords = [[NSArray alloc] initWithArray:words copyItems:YES];

And created a local copy of the NSArray with the __block declaration inside the actual method processing the objects:

__block NSArray *array = [[NSArray alloc] initWithArray:self.badWords copyItems:YES];

Which should surely mean it sticks around for the life of the ProcessProfanity object?

Am I wrong in expecting to be able to po the array from the breakpoint within the block?

Gareth Jeanne
  • Where exactly is your code crashing? – Daniel Galasko Mar 25 '14 at 08:06
  • Hi Daniel, i have updated the question with some extra info i have managed to work out overnight. I am no further forward, but the information might help? – Gareth Jeanne Mar 25 '14 at 08:45
  • In your init method, perhaps you should try changing self.badWords = [words copy]; My understanding is that declaring a property as a copy only means that anyone accessing that property through the getter will be returned a copy. You should probably change to (nonatomic,strong) since it is a private variable and those rules don't apply... – Daniel Galasko Mar 25 '14 at 09:03
  • Hi Daniel, please see Edit 1/2 above, i have tried those things, but to no avail. – Gareth Jeanne Mar 25 '14 at 09:59
  • What concerns me is that the error you are getting is referring to core data yet you are pointing at an array that is being incorrectly released? I'm a bit unsure of the issue. – Daniel Galasko Mar 25 '14 at 10:37
  • I also don't see why you use indexOfObjectWithOptions as this returns an NSUInteger yet you are not using the result. Try instead use enumerateObjectsUsingBlock or the for-in – Daniel Galasko Mar 25 '14 at 10:40
  • Did you manage to work it out? – Daniel Galasko Mar 25 '14 at 15:25
  • Hi Daniel, yeah i worked it out, well kinda. I did post the update above but it seems to have disappeared, will try posting it again. – Gareth Jeanne Mar 25 '14 at 16:30
  • That's awesome bud. One word of advice, you might want to look into checking your tweets for profanity when they are added and not necessarily all at once. Secondly your fetch for tweets should probably use a predicate where profanity = NO ? – Daniel Galasko Mar 26 '14 at 06:52
  • Hi Daniel, thanks for your help. I had modified things a bit since the code above. I have added a predicate that filters the Core Data fetch to only retrieve objects that have not had their profanity value set yet, so I'm only processing those. I was going to process as they're added, but this happens so quickly that it doesn't really make any difference. If performance becomes an issue then I will certainly look at doing that. Wouldn't be particularly difficult to modify/add a method to this to do that now I have the Core Data stuff worked out :) – Gareth Jeanne Mar 26 '14 at 08:34
  • That's good to hear man. One last word of advice, since your profanity words are unique you should store them in a set. This way you drastically improve your performance since the query on a set is radically fast. Rather than iterating through every item in the array you would just check the set once and let the framework give you that great performance. – Daniel Galasko Mar 26 '14 at 10:30
  • If you don't mind after work I would like to write out my recommendations and present a nice clean answer to this post? – Daniel Galasko Mar 26 '14 at 10:34
  • Yup sounds good to me, happy to accept. – Gareth Jeanne Mar 26 '14 at 12:53
  • There, i created an answer and also elaborated on the crash which is technically the reason for your question:) – Daniel Galasko Mar 26 '14 at 15:53

2 Answers


In this instance the error message "error: NULL _cd_rawData but the object is not being turned into a fault" indicates that you are accessing a managed object outside of its context. Basically your fetch returns all the Tweets from your persistent store as faults. Once you try and access a property on the Managed Object, Core Data will fire a fault and fetch the full object from the store.

By calling the NSArray method indexOfObjectWithOptions:passingTest: with an option of NSEnumerationConcurrent you are asking for the test block to be run concurrently on the elements in your array. The call itself still returns only once the search is done, but the keyword concurrent indicates that multiple threads can be used to operate on the array elements.

In your context this means that accessing a managed object inside this block might result in accessing it on a different thread from the one used by the managed object context that owns the object. So when you access tweetToCheck.text in your conditional check - if ([tweetToCheck.text rangeOfString:obj].location != NSNotFound) - under the hood Core Data is fetching that managed object from the persistent store and returning it to a thread that is not the managed object context's thread.
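
If you did want to keep the concurrent search, one option is to read the tweet's text once on the context's queue and hand only that plain NSString to the block, so no managed object is touched from another thread. This is a minimal sketch, not the code from the question: AnyBadWordInText is a hypothetical helper name, and it uses the returned index instead of writing to a __block variable from several threads.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code).
// Returns one bad word contained in `text`, or nil if none is found.
// With NSEnumerationConcurrent the match is not guaranteed to be the first
// one in array order. `text` must be a plain NSString that was read from the
// managed object on the context's own queue.
static NSString *AnyBadWordInText(NSString *text, NSArray *badWords)
{
    if (text == nil) {
        return nil;
    }
    NSUInteger index =
        [badWords indexOfObjectWithOptions:NSEnumerationConcurrent
                               passingTest:^BOOL(id obj, NSUInteger idx, BOOL *stop) {
            // Only immutable strings are used here, never the Tweet object.
            return [text rangeOfString:obj].location != NSNotFound;
        }];
    return (index == NSNotFound) ? nil : badWords[index];
}
```

Inside import, which already runs within performBlockAndWait:, this could be called as AnyBadWordInText(tweetToCheck.text, self.badWords), so the property access happens on the context's queue.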

Furthermore, it is not necessary to use the method indexOfObjectWithOptions:passingTest: since you are not actually interested in the result of this operation.

It seems to me that it might be more convenient for you to use an NSSet as you are only testing to see whether or not a given tweet word exists in your profane words. Quoting the documentation for NSSet: "You can use sets as an alternative to arrays when the order of elements isn’t important and performance in testing whether an object is contained in the set is a consideration". Clearly this seems to meet your criteria.

So your init would look like:

 -(id)initWithStore:(NSPersistentStoreCoordinator*)store 
           badWords:(NSSet*)badWords
{
   self = [super init];
   if(self) {
     self.persistentStoreCoordinator = store;
     self.badWords = [badWords copy];
   }
   return self;
}

Since you are only interested in updating tweets that have not yet been tagged for profanity you would probably only want to fetch tweets that haven't been flagged profane:

//Create new fetch request
NSFetchRequest *request = [[NSFetchRequest alloc] init];

//Setup the Request
[request setEntity:[NSEntityDescription entityForName:@"Tweet" inManagedObjectContext:self.backgroundContext]];
[request setPredicate:[NSPredicate predicateWithFormat:@"profanity = NO"]];

Now that you have an array of tweets that have not been flagged as profane, you could iterate through your tweets and check whether any word in each one is a profane word. The only thing you will need to deal with is how to separate your tweet into words (ignoring commas, exclamation marks etc.). Then for each word you are going to need to strip it of diacritics and probably ignore the case. So you would end up with something along the lines of:

if([self.badWords containsObject:badWordString]) {
    currentTweet.profanity = [NSNumber numberWithBool:YES];
}
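
The word-splitting and normalisation step described above could be sketched roughly as follows. NormalizedWordsInText is a hypothetical helper name, and folding every word to a case- and diacritic-insensitive form is an assumption about how you would want to compare; the bad-word set would need to be folded the same way for containsObject: to match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original code).
// Splits `text` into words, ignoring punctuation, and folds each word to a
// lowercase, diacritic-free form so it can be looked up in a set.
static NSSet *NormalizedWordsInText(NSString *text)
{
    NSMutableSet *words = [NSMutableSet set];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring,
                                       NSRange substringRange,
                                       NSRange enclosingRange,
                                       BOOL *stop) {
        NSString *folded =
            [substring stringByFoldingWithOptions:(NSCaseInsensitiveSearch |
                                                   NSDiacriticInsensitiveSearch)
                                           locale:nil];
        [words addObject:folded];
    }];
    return [words copy];
}
```

Because the result is an NSSet, duplicate words in a tweet collapse automatically, and each remaining word can be tested with containsObject: as shown above.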

Remember, you can run predicates on an NSSet, so you could actually perform a case- and diacritic-insensitive query:

NSPredicate *searchPredicate = [NSPredicate predicateWithFormat:@"SELF ==[cd] %@", wordToCheck];
BOOL foundABadWord = ([[self.badWords filteredSetUsingPredicate:searchPredicate] count] > 0);

Another thing you might want to consider is removing duplicate words in your tweets, since you don't really want to perform the same check multiple times. So depending on how you find the performance, you could place each word of your tweet into an NSSet and simply run the query on the unique words in your tweet:

if([self.badWords intersectsSet:tweetDividedIntoWordsSet]) {
    //we have a profane tweet here!
}

Which implementation you choose is up to you, but even assuming you are only using English in your app you are definitely going to want to run a case- and diacritic-insensitive search.

EDIT

One final thing to note is that no matter how much you try, people will always be the best means of detecting profane or abusive language. I encourage you to read this SO post on detecting profanity - How do you implement a good profanity filter?

Daniel Galasko
  • Wow, very complete and thorough explanation, i will certainly look at implementing the NSSet function as it does look like it would improve the performance, especially once the array gets larger. Thanks! – Gareth Jeanne Mar 26 '14 at 22:59
  • You might want to also take a look at the link I appended to my post , I found a very insightful discussion on detecting profanity programmatically. Good luck.:) – Daniel Galasko Mar 27 '14 at 07:34

Ok, so I'm still not quite sure what was going on, but I followed Daniel's advice and rewrote the indexOfObjectWithOptions call as a plain loop, and now it's working. For completeness, and so it hopefully helps someone else, this is what I ended up doing.

    DDLogInfo(@"Processing posts to check for bad language");
for (Tweet* tweetToCheck in tweetsToProcess){
    __block NSArray *array = [[NSArray alloc] initWithArray:self.badWords copyItems:YES];
    __block NSString *result = nil;

    NSRange tmprange;
    for(NSString *string in array) {
        tmprange = [tweetToCheck.text rangeOfString:[NSString stringWithFormat:@" %@ ", string]];
        if (tmprange.location != NSNotFound) {
            result = string;
            DDLogVerbose(@"Naughty Word Found: %@", string);
            break;
        }
    }

    if (!result){
        //DDLogVerbose(@"The post does not contain any of the words from the naughty list");
        if(tweetToCheck){
            tweetToCheck.profanity = [NSNumber numberWithBool:false];
        }
    }
    else{
        if(tweetToCheck){
            //DDLogVerbose(@"The string contains '%@' from the the naughty list", result);
            tweetToCheck.profanity = [NSNumber numberWithBool:true];
        }
    }
}
Gareth Jeanne