
I am calling strtoull() over 100 million times in a command-line Objective-C app on OS X that calculates Hamming distances. I have traced a memory leak of ~30 bytes per call to the strtoull() calls in ph_hamming_distance(). I looked at the BSD source for strtoull(), and even copied a stripped-down version of it (without the generality I don't need) into my app, but the leak remains.

The calling code is:

    NSArray * returnMatchedImagesFromDB(NSString * hash, NSString * asin, NSInteger action) {

        /*  Input hash, asin, action(not yet used)
         *  Calculate Hamming Distance to all records in DB
         *  Return NSArray of HammingDistanceRecords of matches within "hdCompareThreshold" of each other
         */ 
        int  hd;
        int threshold = 0;
        NSMutableArray * retArray = [[NSMutableArray alloc] init];

        threshold = hdCompareThreshold;

        // for each image in dbImageArray, compute hamming distance to all other images
        for (ImageRecord *imgRecord in dbImageArray) {
            hd = ph_hamming_distance(imgRecord.hash, hash);
            if ((threshold == -1) || (hd <= threshold)) {
                HammingDistanceRecord * hdRec = [[HammingDistanceRecord alloc] init];
                hdRec.hammingDistance = hd;
                hdRec.asin1 = asin;
                hdRec.asin2 = imgRecord.asin;
                hdRec.rank2 = imgRecord.rank;
                [retArray addObject:hdRec];
            }
        }
        return [retArray copy];
    }   // returnMatchedImagesFromDB()

    int ph_hamming_distance(NSString * hashStr1, NSString * hashStr2) {

        NSUInteger hash1 = strtoull([hashStr1 UTF8String], NULL, 0);
        NSUInteger hash2 = strtoull([hashStr2 UTF8String], NULL, 0);
        NSUInteger x = hash1 ^ hash2;
        const NSUInteger m1  = 0x5555555555555555ULL;
        const NSUInteger m2  = 0x3333333333333333ULL;
        const NSUInteger h01 = 0x0101010101010101ULL;
        const NSUInteger m4  = 0x0f0f0f0f0f0f0f0fULL;
        x -= (x >> 1) & m1;
        x = (x & m2) + ((x >> 2) & m2);
        x = (x + (x >> 4)) & m4;
        return (x * h01) >> 56;
    }

The arguments to ph_hamming_distance() are always base-10 strings (no alpha characters). A typical hashStr is @"17609976980814024116". The database I am comparing currently holds 390K objects, so comparing every object against every other is about 300 billion calls to strtoull(). The leak gets my app killed with SIGKILL (signal 9) at ~3500 compares every time. That is 3500 × 390K × 2 calls/compare × ~30 bytes/call ≈ 80 GB, which is the free space on my drive, so I guess OS X kills the process when the swap file fills the drive.

Any help appreciated.

rick
  • My guess is that it is something more than `strtoull`. Can you show your loop that is performing the calls to `ph_hamming_distance` as well? – Alden Apr 17 '17 at 19:41

1 Answer

It could be your [hashStr1 UTF8String] call. It may allocate a char* buffer that won't get released until the enclosing autorelease pool is drained, which could be "never" if you're calling all of this in a loop without ever returning to an NSRunLoop. See for example What is the guaranteed lifecycle of -[NSString UTF8String]?
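
As a rough sketch of what draining the pool inside the loop could look like, assuming ARC and a Foundation command-line tool: the `hashes` array and its second value are hypothetical stand-ins for the question's dbImageArray, and `ph_hamming_distance()` is copied from the question with only the types tightened. The point is the `@autoreleasepool` block around the inner loop, so that anything autoreleased by the UTF8String calls is released once per outer iteration instead of piling up until the program exits.

```objective-c
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Bit count of hash1 XOR hash2, as in the question. Each UTF8String call
// may hand back a buffer owned by the current autorelease pool.
static int ph_hamming_distance(NSString *hashStr1, NSString *hashStr2) {
    uint64_t hash1 = strtoull([hashStr1 UTF8String], NULL, 0);
    uint64_t hash2 = strtoull([hashStr2 UTF8String], NULL, 0);
    uint64_t x = hash1 ^ hash2;
    x -= (x >> 1) & 0x5555555555555555ULL;
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

int main(void) {
    @autoreleasepool {
        // Hypothetical sample data; the real program iterates dbImageArray.
        NSArray *hashes = @[@"17609976980814024116", @"17609976980814024117"];

        for (NSString *a in hashes) {
            // One pool per outer iteration: temporaries created by the
            // inner loop are released when this block exits.
            @autoreleasepool {
                for (NSString *b in hashes) {
                    int hd = ph_hamming_distance(a, b);
                    NSLog(@"%@ vs %@: %d", a, b, hd);
                }
            }
        }
    }
    return 0;
}
```

This should build with something like `clang -fobjc-arc -framework Foundation demo.m` (the file name is arbitrary). Where exactly to put the pool is a judgment call: one pool per outer iteration keeps the overhead of creating pools small while still bounding memory to one pass over the database.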

faffaffaff
  • But a command line program doesn't have an NSRunLoop, does it? Maybe I'll try creating an explicit const char * p = [hashStr1 UTF8String], pass p to strtoull(p, NULL, 0), and then do p = nil. Should that allow the buffer to be released? Seems like this issue should have been hit by lots of people sending strings to some f(string). – rick Apr 17 '17 at 23:07
  • p = nil won't free anything. See if you can find a way to manually drain the autorelease pool. Maybe you can put an `@autoreleasepool { ... }` block around your inner loop? See https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html – faffaffaff Apr 17 '17 at 23:33
  • @rick That's the problem; a command line program won't have a runloop and you are calling code that either requires a runloop or manual management of the autorelease pool. – bbum Apr 18 '17 at 01:45
  • @faffaffaff Your autorelease block suggestion did the trick! Memory size constant at 95MB after processing 25,000 records. This in fact is a solution for my other command line programs that were growing to huge footprints. Thank you!! Maybe I should read Apple docs more and not limit my searches to StackOverflow :-) Sorry I don't have enough points to vote up your suggestion. – rick Apr 18 '17 at 03:06