With ARC, what's better: alloc or autorelease initializers?

Question

Is it better (faster and more efficient) to use alloc or autorelease initializers? For example:

- (NSString *)hello:(NSString *)name {
    return [[NSString alloc] initWithFormat:@"Hello, %@", name];
}

OR

- (NSString *)hello:(NSString *)name {
    return [NSString stringWithFormat:@"Hello, %@", name];
//    return [@"Hello, " stringByAppendingString:name]; // even simpler
}

I know that in most cases, performance here shouldn't matter. But, I'd still like to get in the habit of doing it the better way.

If they do exactly the same thing, then I prefer the latter option because it's shorter to type and more readable.

In Xcode 4.2, is there a way to see what ARC compiles to, i.e., where it puts retain, release, autorelease, etc? This feature would be very useful while switching over to ARC. I know you shouldn't have to think about this stuff, but it'd help me figure out the answer to questions like these.

@dasdom: Apple has already publicly disclosed ARC, for example at http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-June/015588.html. So the NDA no longer applies. Also Lion was released yesterday, so again the NDA no longer applies. — Anomie, Jul 21 '11 at 13:22
As far as I understand it, both should result in the same performance. I think this was one of the goals. But I haven't tested it. — dasdom, Jul 21 '11 at 13:27
Even if you could, trying to imitate ARC's placement of reference counting activity yourself is not necessarily desirable. LLVM aggressively optimizes its output as long as it can prove it is safe. Your brain does not have the same code-flow analysis tools to make that kind of decision. — Chuck, Jul 21 '11 at 14:45
As the answers point out, there are different implications for what gets left in the autorelease pool and for how long. I feel that somebody should mention, though, that in 99% of cases it really won't make any difference to your code (neither speed nor memory footprint). Personally, I prefer the second version because it works without ARC too AND it's more readable. Great question, though, +1 — Dan Rosenstark, Jun 28 '12 at 19:47

score 37 · Accepted Answer · answered Aug 09 '11 at 06:22

The difference is subtle, but you should opt for the autorelease versions. Firstly, your code is much more readable. Secondly, on inspection of the optimized assembly output, the autorelease version is slightly more optimal.

The autorelease version,

- (NSString *)hello:(NSString *)name {
    return [NSString stringWithFormat:@"Hello, %@", name];
}

translates to

"-[SGCAppDelegate hello:]":
    push    {r7, lr}
    movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
    mov r3, r2
    movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC0_0+4))
    movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
    movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC0_1+4))
    add r1, pc
    add r0, pc
    mov r7, sp
    ldr r1, [r1]
    ldr r0, [r0]
    movw    r2, :lower16:(L__unnamed_cfstring_-(LPC0_2+4))
    movt    r2, :upper16:(L__unnamed_cfstring_-(LPC0_2+4))
    add r2, pc
    blx _objc_msgSend    ; stringWithFormat:
    pop {r7, pc}

Whereas the [[alloc] init] version looks like the following:

"-[SGCAppDelegate hello:]":
    push    {r4, r5, r6, r7, lr}
    movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC1_0+4))
    add r7, sp, #12
    movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC1_0+4))
    movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
    movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
    add r1, pc
    add r0, pc
    ldr r5, [r1]
    ldr r6, [r0]
    mov r0, r2
    blx _objc_retain    ; ARC retains the name string temporarily
    mov r1, r5
    mov r4, r0
    mov r0, r6
    blx _objc_msgSend   ; call to alloc
    movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_4-(LPC1_2+4))
    mov r3, r4
    movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_4-(LPC1_2+4))
    add r1, pc
    ldr r1, [r1]
    movw    r2, :lower16:(L__unnamed_cfstring_-(LPC1_3+4))
    movt    r2, :upper16:(L__unnamed_cfstring_-(LPC1_3+4))
    add r2, pc
    blx _objc_msgSend   ; call to initWithFormat:
    mov r5, r0
    mov r0, r4
    blx _objc_release   ; ARC releases the name string
    mov r0, r5
    pop.w   {r4, r5, r6, r7, lr}
    b.w _objc_autorelease

As expected, it is a little longer, because it calls both the alloc and initWithFormat: methods. What is particularly interesting is that ARC generates sub-optimal code here: it retains the name string (note the call to _objc_retain) and releases it again after the call to initWithFormat:.

If we add the __unsafe_unretained ownership qualifier, as in the following example, the code is rendered optimally. __unsafe_unretained indicates to the compiler to use primitive (copy pointer) assignment semantics.

- (NSString *)hello:(__unsafe_unretained NSString *)name {
    return [[NSString alloc] initWithFormat:@"Hello, %@", name];
}

The resulting assembly is as follows:

"-[SGCAppDelegate hello:]":
    push    {r4, r7, lr}
    movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC1_0+4))
    add r7, sp, #4
    movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_2-(LPC1_0+4))
    movw    r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
    movt    r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_-(LPC1_1+4))
    add r1, pc
    add r0, pc
    mov r4, r2
    ldr r1, [r1]
    ldr r0, [r0]
    blx _objc_msgSend
    movw    r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_4-(LPC1_2+4))
    mov r3, r4
    movt    r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_4-(LPC1_2+4))
    add r1, pc
    ldr r1, [r1]
    movw    r2, :lower16:(L__unnamed_cfstring_-(LPC1_3+4))
    movt    r2, :upper16:(L__unnamed_cfstring_-(LPC1_3+4))
    add r2, pc
    blx _objc_msgSend
    .loc    1 31 1
    pop.w   {r4, r7, lr}
    b.w _objc_autorelease

Since `stringWithFormat:` itself calls `[[self alloc] initWithFormat:...]`, it would be more efficient to avoid the extra method dispatch by calling it directly. Less code compiled into your binary doesn't necessarily mean more efficient. — rpetrich, Aug 29 '11 at 02:56
Absolutely, unrolling loops is an example of more code is better. However, as I mentioned above, calling `[[alloc] initWithFormat:]` using code most developers would instinctively write generated unusual, and sub-optimal code. As with anything performance related, profile...profile...profile... — Stuart Carnie, Aug 29 '11 at 06:54
Returning an autoreleased object (assuming the ARC compiler doesn't optimise out the autorelease in your particular context) means that the object will be released *later*, so doesn't that mean that comparing the assembly at the point of object creation is somewhat moot? — Nick Forge, Jan 06 '12 at 00:04
Was wondering the same thing. I'm perplexed that the former autorelease initializer doesn't add autorelease, but the alloc/init style now became autoreleasing under ARC. Is this true or am I being misled because the release code isn't shown? — CodeSmile, May 31 '12 at 16:12
@LearnCocos2D, this is related to ARC, so there is no release code — Stuart Carnie, Jun 08 '12 at 16:38
The autorelease version is **less** optimal if you account for the time to clean out the autorelease pool. — Tammo Freese, Jun 30 '12 at 23:51
Not necessarily true, @Tammo Freese. To quote Apple (and I heard this from Chris Lattner and the LLVM team): “Use the new @autoreleasepool{} construct instead. This forces a block structure on your autorelease pool, and is about six times faster than NSAutoreleasePool.”. The compiler manages autorelease object more optimally using ARC. — Stuart Carnie, Jul 02 '12 at 04:43
@StuartCarnie You are right that `@autoreleasepool` is faster than `NSAutoreleasePool`. My point was that the version that does not add objects to the pool is faster than the one adding objects to the pool. I measured this both on iOS and Mac and have added an answer below describing that. `[NSString stringWithFormat:]` would match the performance of `[[NSString alloc] initWithFormat:]` if the ARC optimization would prevent the object from being added to the pool (via objc_autoreleaseReturnValue and objc_retainAutoreleasedReturnValue), but that optimization does not work in this case yet. — Tammo Freese, Jul 02 '12 at 15:37
So this is true of `NSString`, but would it be the same story with `NSMutableArray` or `NSDictionary`, for example? Are the convenience methods more efficient in general? — jowie, Jul 16 '12 at 09:34
The convenience methods are in general *slower* if you count in the time for cleaning up the autorelease pool. Even if the ARC optimization kicks in and prevents objects from being added to the autorelease pool, it is still slower than alloc init. — Tammo Freese, Jan 30 '13 at 10:21
There is one more subtle difference in object lifetime. The alloc / init version *will* be set for deallocation at the end of the scope, while the autorelease version will be alive until the next pool drain (whenever that is). — borrrden, Jun 13 '13 at 03:25

score 10 · Answer 2 · edited Jun 20 '20 at 09:12

[NSString stringWithFormat:] is less code. But be aware that the object may end up in the autorelease pool. And that currently happens even with ARC and -Os compiler optimization.

Currently the performance of [[NSString alloc] initWithFormat:] is better on both iOS (tested with iOS 5.1.1 and Xcode 4.3.3) and OS X (tested with OS X 10.7.4 and Xcode 4.3.3). I modified @Pascal's sample code to include the autorelease pool drain times and got the following results:

- The ARC optimization does not prevent the objects from ending up in the autorelease pool.
- Including the time for clearing out the autorelease pool with 1 million objects, [[NSString alloc] initWithFormat:] is around 14% faster on iPhone 4S, and around 8% faster on OS X.
- Having an @autoreleasepool around the loop releases all objects at the end of the loop, which eats up a lot of memory.
- The memory spikes can be prevented by using an @autoreleasepool inside the loop. The performance stays roughly the same, but the memory consumption then is flat.
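As a minimal sketch of the last point, the loop below wraps each iteration in its own @autoreleasepool, so any autoreleased temporaries are freed per iteration instead of piling up until the loop finishes. The helper name TotalGreetingLength is hypothetical and is not taken from the benchmark described above; the sketch assumes compilation with ARC (-fobjc-arc).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustration only): builds `count` greetings and
// returns the sum of their lengths. Each iteration gets its own pool, so
// autoreleased strings never accumulate across iterations.
static NSUInteger TotalGreetingLength(NSUInteger count) {
    NSUInteger total = 0;
    for (NSUInteger i = 0; i < count; i++) {
        @autoreleasepool {
            // The convenience constructor may place its result in the pool.
            NSString *greeting =
                [NSString stringWithFormat:@"Hello, %lu", (unsigned long)i];
            total += greeting.length;
        }   // the per-iteration pool is drained here
    }
    return total;
}
```

For example, TotalGreetingLength(10) builds the strings "Hello, 0" through "Hello, 9" (8 characters each) and returns 80.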

score 4 · Answer 3 · answered Jan 18 '12 at 20:26

I disagree with the other answers; the autorelease version (your 2nd example) is not necessarily better.

The autorelease version behaves just as it did before ARC. It allocates and inits and then autoreleases, which means the pointer to the object needs to be stored so that it can be released the next time the autorelease pool is drained. This uses slightly more memory, as the pointer to that object needs to be kept around until it is processed. The object also sticks around longer than if it were released immediately. This can be an issue if you are calling this many times in a loop, where the autorelease pool does not get a chance to drain. This could cause you to run out of memory.

The first example behaves differently than it did before ARC. With ARC, the compiler will now insert a "release" for you (NOT an autorelease like the 2nd example). It does this at the end of the block where the memory is allocated. Usually this is at the end of the function where it is called. In your example, from viewing the assembly, it seems like the object may in fact be autoreleased. This might be due to the fact the compiler doesn't know where the function returns to and thus where the end of the block is. In the majority of the cases where a release is added by the compiler at the end of a block, the alloc/init method will result in better performance, at least in terms of memory usage, just as it did before ARC.
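The lifetime difference described above can be observed with a small probe. This is only a sketch under stated assumptions: Probe, gProbeDeallocs and DeallocsBeforeDrain are hypothetical names introduced for illustration, and the code must be compiled with ARC (-fobjc-arc). How many objects are already gone before the pool drains depends on whether the compiler's return-value optimization keeps the convenience-constructed object out of the pool, so no particular result is claimed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical counter and probe class (not part of Foundation).
static NSUInteger gProbeDeallocs = 0;

@interface Probe : NSObject
+ (instancetype)probe;   // convenience constructor, returns a +0 object
@end

@implementation Probe
+ (instancetype)probe { return [[self alloc] init]; }
- (void)dealloc { gProbeDeallocs++; }   // ARC chains to super automatically
@end

// Returns how many probes were already deallocated once both inner scopes
// have ended but before the enclosing autorelease pool is drained.
static NSUInteger DeallocsBeforeDrain(void) {
    NSUInteger beforeDrain = 0;
    gProbeDeallocs = 0;
    @autoreleasepool {
        {
            // alloc/init: ARC releases this no later than the end of scope.
            Probe *owned = [[Probe alloc] init];
            (void)owned;
        }
        {
            // Convenience constructor: the object may live on in the pool.
            Probe *convenience = [Probe probe];
            (void)convenience;
        }
        beforeDrain = gProbeDeallocs;
    }   // anything still in the pool is released here
    return beforeDrain;
}
```

After DeallocsBeforeDrain() returns, gProbeDeallocs is 2 in either case; the return value shows whether the convenience-constructed object had to wait for the drain.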

score 3 · Answer 4 · answered Nov 04 '11 at 02:40

Well, this is something easy to test, and indeed it seems the convenience constructor is "faster" -- unless I made some error in my test code, see below.

Output (Time for 1 Million constructions)

Alloc/init:   842.549473 millisec
Convenience:  741.611933 millisec
Alloc/init:   799.667462 millisec
Convenience:  741.814478 millisec
Alloc/init:   821.125221 millisec
Convenience:  741.376502 millisec
Alloc/init:   811.214693 millisec
Convenience:  795.786457 millisec

Script

#import <Foundation/Foundation.h>
#import <mach/mach_time.h>

int main (int argc, const char * argv[])
{

    @autoreleasepool {
        NSUInteger runs = 4;

        mach_timebase_info_data_t timebase;
        mach_timebase_info(&timebase);
        double ticksToNanoseconds = (double)timebase.numer / timebase.denom;

        NSString *format = @"Hello %@";
        NSString *world = @"World";

        NSUInteger t = 0;
        for (; t < 2*runs; t++) {
            uint64_t start = mach_absolute_time();
            NSUInteger i = 0;
            for (; i < 1000000; i++) {
                if (0 == t % 2) {       // alloc/init
                    NSString *string = [[NSString alloc] initWithFormat:format, world];
                }
                else {                  // convenience
                    NSString *string = [NSString stringWithFormat:format, world];
                }
            }
            uint64_t run = mach_absolute_time() - start;
            double runTime = run * ticksToNanoseconds;

            if (0 == t % 2) {
                NSLog(@"Alloc/init:   %.6f millisec", runTime / 1000000);
            }
            else {
                NSLog(@"Convenience:  %.6f millisec", runTime / 1000000);
            }
        }
    }
    return 0;
}

The autoreleased objects aren't going to be released until after the benchmarking (when your `@autoreleasepool` block ends), so aren't you actually comparing `alloc+init+dealloc` to `alloc+init+autorelease`? — Nick Forge, Jan 05 '12 at 23:58
That's a very valid point! Should rewrite the test to drain the pool before taking the stop-time... — Pascal, Jan 06 '12 at 23:56
If you include the drain time for the autorelease pool, the convenience constructor is actually **slower**, not faster. That makes sense, as adding an object to the autorelease pool is more expensive than just a retain release. And at least for `+[NSString stringWithFormat:]` the ARC optimization does **not** prevent the objects from being added to the autorelease pool. — Tammo Freese, Nov 25 '12 at 17:21
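A minimal sketch of the rewrite suggested in these comments: each timed run gets its own @autoreleasepool, and the stop time is taken only after that pool has drained, so the cleanup cost of autoreleased objects is charged to the run that created them. TimedRunMilliseconds is a hypothetical helper, not the code used for any of the figures quoted in this thread, and it assumes ARC on a platform that provides mach_absolute_time.

```objc
#import <Foundation/Foundation.h>
#import <mach/mach_time.h>

// Hypothetical helper (illustration only): times `count` string
// constructions in milliseconds, including the autorelease pool drain.
static double TimedRunMilliseconds(BOOL useConvenience, NSUInteger count) {
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);
    double ticksToNanoseconds = (double)timebase.numer / timebase.denom;

    uint64_t start = mach_absolute_time();
    @autoreleasepool {
        for (NSUInteger i = 0; i < count; i++) {
            NSString *string = nil;
            if (useConvenience) {
                string = [NSString stringWithFormat:@"Hello %@", @"World"];
            } else {
                string = [[NSString alloc] initWithFormat:@"Hello %@", @"World"];
            }
            (void)string;   // silence the unused-variable warning
        }
    }   // the pool drains here, before the stop time is taken
    uint64_t elapsed = mach_absolute_time() - start;

    return elapsed * ticksToNanoseconds / 1000000.0;
}
```

Calling TimedRunMilliseconds(NO, 1000000) and TimedRunMilliseconds(YES, 1000000) alternately, as the original script does, gives a comparison in which both variants pay for their own deallocation. Actual numbers will vary with hardware, OS version and compiler settings.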

score 1 · Answer 5 · answered May 02 '14 at 16:22

Comparing the performance of the two is a bit of a moot issue for a couple of reasons. First, the performance characteristics of the two might change as Clang evolves, and new optimisations are added to the compiler. Second, the benefits of skipping a few instructions here and there are dubious at best. The performance of your app should be considered across method boundaries. Deconstructing one method can be deceiving.

score 0 · Answer 6 · answered Aug 08 '11 at 21:13

I think that stringWithFormat: is actually implemented just like your 1st version, which means nothing should change. In any case, if there is any difference, the second version should probably not be slower. Finally, in my opinion the second version is slightly more readable, so that's what I'd use.
