
I work on an iPad application that has a sync process which uses web services and Core Data in a tight loop. To reduce the memory footprint, per Apple's recommendation, I allocate and drain an NSAutoreleasePool periodically. This currently works great, and there are no memory issues with the current application. However, I plan on moving to ARC, where NSAutoreleasePool is no longer valid, and I would like to maintain the same kind of performance. I created a few examples and timed them, and I am wondering: what is the best approach, under ARC, to achieve the same kind of performance while maintaining code readability?

For testing purposes I came up with three scenarios, each of which creates a string from a number between 1 and 10,000,000. I ran each example three times to determine how long it took, using a 64-bit Mac application compiled with the Apple LLVM 3.0 compiler (run without gdb, at -O0) and Xcode 4.2. I also ran each example through Instruments to see roughly what the peak memory usage was.

Each of the examples below is run inside the following code block:

    #import <Foundation/Foundation.h>

    static const uint32_t MAX_ALLOCATIONS = 10000000; //10,000,000 strings per run

    int main (int argc, const char * argv[])
    {
        @autoreleasepool {
            NSDate *now = [NSDate date];

            //Code Example ...

            //timeIntervalSinceNow is negative for a date in the past, so negate it
            NSTimeInterval interval = -[now timeIntervalSinceNow];
            printf("Duration: %f\n", interval);
        }
        return 0;
    }

NSAutoreleasePool Batch [Original Pre-ARC] (Peak Memory: ~116 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class]; //reference the string so the variable is actually used

        if((count + 1) % BATCH_SIZE == 0)
        {
            [pool drain];
            pool = [[NSAutoreleasePool alloc] init];
        }
    }
    [pool drain];

Run Times:
10.928158
10.912849
11.084716


Outer @autoreleasepool (Peak Memory: ~382 MB)

    @autoreleasepool {
        for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run Times:
11.489350
11.310462
11.344662


Inner @autoreleasepool (Peak Memory: ~61.2 KB)

    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }

Run Times:
14.031112
14.284014
14.099625


@autoreleasepool w/ goto (Peak Memory: ~115 KB)

    static const NSUInteger BATCH_SIZE = 1500;
    uint32_t count = 0;

    next_batch:
    @autoreleasepool {
        for(;count < MAX_ALLOCATIONS; count++)
        {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
            if((count + 1) % BATCH_SIZE == 0)
            {
                count++; //Increment count manually
                goto next_batch;
            }
        }
    }

Run Times:
10.908756
10.960189
11.018382

The goto approach offers the closest performance to the original NSAutoreleasePool batching, but it uses a goto. Any thoughts?

Update:

Note: A goto is a normal exit from an @autoreleasepool block, as stated in the documentation, and will not leak memory.

On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.

Joe
  • Use the optimizer. It's rather important for ARC code. – Jon Shier Mar 12 '12 at 21:58
  • So that `goto` is definitely not, I don't know, causing a memory leak? Everything else makes sense: less draining is faster. Anyway, I can only comment on readability: anywhere you pool is fine. That goto would need a yellow sticky note. – Dan Rosenstark Mar 12 '12 at 21:58
  • The goto did not seem to leak any memory. Looks like the scope drained the autorelease pool as I expected but I am no expert on ARC (yet) and do not want to rely on UB. – Joe Mar 12 '12 at 22:07
  • can't you do the same thing by inverting your code and putting the autorelease pool INSIDE the `for` that checks your batch size? Obviously `count` would have to start from where it last left off... – Dan Rosenstark Mar 12 '12 at 23:20
  • @Yar Thanks, lack of sleep has me overcomplicating things again. – Joe Mar 13 '12 at 00:10
  • I slept almost 36 hours before writing that comment, so I had a definite advantage. – Dan Rosenstark Mar 13 '12 at 04:42

2 Answers


The following should achieve the same thing as the goto version, without the goto:

for (NSUInteger count = 0; count < MAX_ALLOCATIONS;)
{
    @autoreleasepool
    {
        for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++)
        {
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(count + 1)];
            [text class];
        }
    }
}
ipmcc

Note that ARC enables significant optimizations which are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be measuring your hand-tuned retain/release placement against ARC's "naive mode".

Run your tests again with optimizations and see what happens.
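
As a quick sanity check that the timing build really does have optimizations on, something like this could be dropped into the test program (just a sketch; __OPTIMIZE__ is a macro that clang and gcc predefine for any optimization level above -O0):

// Sketch: logs whether the binary was built with optimizations enabled.
// Requires Foundation to be imported, as in the test program above.
static void logOptimizationState(void)
{
#ifdef __OPTIMIZE__
    NSLog(@"Optimizations are enabled");
#else
    NSLog(@"Built at -O0; ARC timing numbers will be misleading");
#endif
}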

Update: I was curious, so I ran it myself. These are the runtime results in Release mode (-Os), with 7,000,000 allocations.

arc-perf[43645:f803] outer: 8.1259
arc-perf[43645:f803] outer: 8.2089
arc-perf[43645:f803] outer: 9.1104

arc-perf[43645:f803] inner: 8.4817
arc-perf[43645:f803] inner: 8.3687
arc-perf[43645:f803] inner: 8.5470

arc-perf[43645:f803] withGoto: 7.6133
arc-perf[43645:f803] withGoto: 7.7465
arc-perf[43645:f803] withGoto: 7.7007

arc-perf[43645:f803] non-ARC: 7.3443
arc-perf[43645:f803] non-ARC: 7.3188
arc-perf[43645:f803] non-ARC: 7.3098

And the memory peaks (only run with 100,000 allocations, because Instruments was taking forever):

Outer: 2.55 MB
Inner: 723 KB
withGoto: ~747 KB
Non-ARC: ~748 KB

These results surprise me a little. Well, the memory peak results don't; they're exactly what you'd expect. But the run time difference between inner and withGoto, even with optimizations enabled, is larger than I would have anticipated.

Of course, this is somewhat of a pathological micro-test, which is very unlikely to model real-world performance of any application. The takeaway here is that ARC may indeed add some amount of overhead, but you should always measure your actual application before making assumptions (a rough sketch of such a measurement loop is below).

(Also, I tested @ipmcc's answer using nested for loops; it behaved almost exactly like the goto version.)
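
For reference, here is a minimal sketch of the kind of measurement loop described above (an illustration, not the exact harness used; runInnerPool is a stand-in for whichever variant is being timed):

#import <Foundation/Foundation.h>

static const uint32_t MAX_ALLOCATIONS = 7000000; // matches the runs above

// Stand-in for one of the variants being measured (here: inner @autoreleasepool).
static void runInnerPool(void)
{
    for (uint32_t count = 0; count < MAX_ALLOCATIONS; count++) {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
            [text class];
        }
    }
}

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        for (int run = 0; run < 3; run++) {
            CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
            runInnerPool();
            NSLog(@"inner: %.4f", CFAbsoluteTimeGetCurrent() - start);
        }
    }
    return 0;
}

Build it with ARC and -Os (Release) so the numbers are comparable to the ones above.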

BJ Homer
  • Thanks, appreciate the tip. Wasn't aware that additional steps were taken, but it makes absolute sense. Is there an optimization guide similar to http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html ? Not expecting to find exact flags but more along the lines of what kind of optimizations may take place. I found some useful attribute items [here](http://clang.llvm.org/docs/AutomaticReferenceCounting.html#optimization) but it does not look like it is compiler optimization dependent. – Joe Mar 13 '12 at 18:01
  • I added some real-world testing with optimizations enabled. – BJ Homer Mar 13 '12 at 21:02
  • Thanks for your assistance in checking the statistics with optimizations. This will be useful for deciding which method to use in the future for various purposes (excluding the `goto` thanks to ipmcc's answer). Hopefully some more people will come through and upvote. – Joe Mar 13 '12 at 21:23