11

I have an NSMutableArray that contains a few custom objects. Two of the objects have the same properties such as title and author. I want to remove the duplicate object and leave the other.

Asset *asset;
NSMutableArray *items = [[[NSMutableArray alloc] init] autorelease];

// First
asset = [[Asset alloc] init];
asset.title = @"Developer";
asset.author = @"John Smith";
[items addObject:asset];
[asset release];

// Second
asset = [[Asset alloc] init];
asset.title = @"Writer";
asset.author = @"Steve Johnson";
[items addObject:asset];
[asset release];

// Third
asset = [[Asset alloc] init];
asset.title = @"Developer";
asset.author = @"John Smith";
[items addObject:asset];
[asset release];

Since they are NOT the same object, but only having duplicate properties, how can I remove the duplicate?

Nic Hubbard
  • 41,587
  • 63
  • 251
  • 412

5 Answers5

13

You could create a HashSet and as you loop, you could add "title+author" concatenated set to the HashSet (NSMutableSet). As you arrive at each item, if the HashSet contains your key, either remove it or don't copy (either deleting or creating a copy without duplicates).

That makes it order n (1 loop)

Here's the NSMutableSet class:

http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSMutableSet_Class/Reference/NSMutableSet.html#//apple_ref/occ/cl/NSMutableSet

EDIT with code:

The meat of the code is the one loop.

void print(NSMutableArray *assets)
{
    for (Asset *asset in assets)
    {
        NSLog(@"%@/%@", [asset title], [asset author]);
    }
}

int main (int argc, const char * argv[])
{

    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    //
    // Create the initial data set
    //
    Asset *asset;
    NSMutableArray *items = [[[NSMutableArray alloc] init] autorelease];

    // First
    asset = [[Asset alloc] init];
    asset.title = @"Developer";
    asset.author = @"John Smith";
    [items addObject:asset];
    [asset release];

    // Second
    asset = [[Asset alloc] init];
    asset.title = @"Writer";
    asset.author = @"Steve Johnson";
    [items addObject:asset];
    [asset release];

    // Third
    asset = [[Asset alloc] init];
    asset.title = @"Developer";
    asset.author = @"John Smith";
    [items addObject:asset];
    [asset release];

    NSLog(@"****Original****");
    print(items);

    //
    // filter the data set in one pass
    //
    NSMutableSet *lookup = [[NSMutableSet alloc] init];
    for (int index = 0; index < [items count]; index++)
    {
        Asset *curr = [items objectAtIndex:index];
        NSString *identifier = [NSString stringWithFormat:@"%@/%@", [curr title], [curr author]];

        // this is very fast constant time lookup in a hash table
        if ([lookup containsObject:identifier])
        {
            NSLog(@"item already exists.  removing: %@ at index %d", identifier, index);
            [items removeObjectAtIndex:index];
        }
        else
        {
            NSLog(@"distinct item.  keeping %@ at index %d", identifier, index);
            [lookup addObject:identifier];
        }
    }

    NSLog(@"****Filtered****");
    print(items);

    [pool drain];
    return 0;
}

Here's the output:

Craplet[11991:707] ****Original****
Craplet[11991:707] Developer/John Smith
Craplet[11991:707] Writer/Steve Johnson
Craplet[11991:707] Developer/John Smith
Craplet[11991:707] distinct item.  keeping Developer/John Smith at index 0
Craplet[11991:707] distinct item.  keeping Writer/Steve Johnson at index 1
Craplet[11991:707] item already exists.  removing: Developer/John Smith at index 2
Craplet[11991:707] ****Filtered****
Craplet[11991:707] Developer/John Smith
Craplet[11991:707] Writer/Steve Johnson
bryanmac
  • 38,941
  • 11
  • 91
  • 99
  • So you are saying add an add an NSString object to the NSMutableSet then on the next loop check of that same string exists? – Nic Hubbard Sep 20 '11 at 21:28
  • Yes - in one pass, you're trying to see if you've already seen a similar items. You have to identify what similar means (in this case title + author). The NSSet offers very fast lookup (using a hash internally) to answer that question quickly. The one downside is extra memory for the distinct set of title+author combos. Typical space vs. time. – bryanmac Sep 20 '11 at 21:32
  • Ok, still confused how I then do the lookup with the NSSet. Could you offer a quick code example? – Nic Hubbard Sep 20 '11 at 21:39
  • I didn't notice before I posted that NSSet containsObject is constant time, O(1)! That's actually awesome. I was going to dispute that this algorithm is linear, but if containsObject is truly constant time, then yeah it is. Well done, @bryanmac! – devdavid Sep 20 '11 at 22:31
  • 2
    Awesome! I needed to add "index --" when i removed object – Damien Romito May 27 '14 at 04:55
4

You can use the uniqueness of an NSSet to get distinct items from your original array. If you have the source code for Assest you will need to override the hash and isEqual: method on the Asset class.

@interface Asset : NSObject
@property(copy) NSString *title, *author;
@end

@implementation Asset
@synthesize title, author;

-(NSUInteger)hash
{
    NSUInteger prime = 31;
    NSUInteger result = 1;

    result = prime * result + [self.title hash];
    result = prime * result + [self.author hash];

    return result;
}

-(BOOL)isEqual:(id)object
{
    return [self.title isEqualToString:[object title]] && 
    [self.author isEqualToString:[object author]];
}

- (void)dealloc {
    [title release];
    [author release];
    [super dealloc];
}

@end

Then to implement:

Asset *asset;
NSMutableArray *items = [[[NSMutableArray alloc] init] autorelease];

// First
asset = [[Asset alloc] init];
asset.title = @"Developer";
asset.author = @"John Smith";
[items addObject:asset];
[asset release];

// Second
asset = [[Asset alloc] init];
asset.title = @"Writer";
asset.author = @"Steve Johnson";
[items addObject:asset];
[asset release];

// Third
asset = [[Asset alloc] init];
asset.title = @"Developer";
asset.author = @"John Smith";
[items addObject:asset];
[asset release];

NSLog(@"Items: %@", items);

NSSet *distinctItems = [NSSet setWithArray:items];

NSLog(@"Distinct: %@", distinctItems);

And if you need an array at the end you can just call [distinctItems allObjects]

Community
  • 1
  • 1
Joe
  • 56,979
  • 9
  • 128
  • 135
2

First, I'd override the isEqual: method for Asset like this:

-(BOOL)isEqual:(Asset *)otherAsset {
    return [self.title isEqual:otherAsset.title] && [self.author isEqual:otherAsset.author];
}

Then, if you want to avoid placing duplicates in the array in the first place:

NSUInteger idx = [items indexOfObject:asset];  // tests objects for equality using isEqual:
if (idx == NSNotFound) [items addObject:asset];

If the array already contains duplicates, then any algorithm that finds them has a run time already worse than linear, but I think creating a new array and only adding unique elements like above is the best algorithm. Something like this:

NSMutableArray *itemsWithUniqueElements = [NSMutableArray arrayWithCapacity:[items count]];

for (Asset *anAsset in items) {
    if ([items indexOfObject:anAsset] == NSNotFound)
        [itemsWithUniqueElements addObject:anAsset];
}

[items release];
items = [itemsWithUniqueElements retain];

In the worst case scenario (all elements are already unique) the number of iterations is:

1 + 2 + 3 + ... + n =  n * (n+1) / 2

Which is still O(n^2) but is slightly better than @Justin Meiners' algorithm. No offense! :)

devdavid
  • 1,571
  • 10
  • 14
  • It can be solved linear O(n). Because a hash table lookup amortizes to O(1), with extra space, you can do it in one loop. Typical time vs. Space. – bryanmac Sep 20 '11 at 22:27
  • One more thing to add to that solution. You may use set instead of array after overriding -(BOOL)isEqual: method. then when you trying to add to set "duplicate" object it will not be added until you create your set as set that can contain duplicates... – Ariel Sep 20 '11 at 22:46
  • @Ariel: You'll have to override `hash` as well if you want to use custom items in a set. – jscs Sep 20 '11 at 22:54
  • I've missed that. Haven't worked much with sets... :) Thanks for correcting. – Ariel Sep 20 '11 at 23:01
  • The point about overriding hash is a good one. I thought about going down that road (most correct) but opted for the simple concat string hashed for simplicity. – bryanmac Sep 21 '11 at 03:27
  • That's really great, I had no idea sets got you O(1) lookup. I read a bit about it, and that you have to override hash as well, but how easy is it to write a hash function on a custom object that acts properly? That's the part that seemed like it might be hard, but I haven't dug into it yet. – devdavid Sep 21 '11 at 15:23
0

If you'd like your custom NSObject subclasses to be considered equal when their names are equal you may implement isEqual: and hash. This will allow you to add of the objects to an NSSet/NSMutableSet (a set of distinct objects).

You may then easily create a sorted NSArray by using NSSet's sortedArrayUsingDescriptors:method.

MikeAsh wrote a pretty solid piece about implementing custom equality: Friday Q&A 2010-06-18: Implementing Equality and Hashing

jessecurry
  • 22,068
  • 8
  • 52
  • 44
  • Why did you post this answer twice? Flag duplicates as such when you find them rather than posting the same thing repeatedly. – Mat May 02 '13 at 13:47
  • Did I post it twice? I meant to post it in only once, could you flag one as a duplicate? – jessecurry May 02 '13 at 19:33
0

This is one way you could do it :

NSMutableArray* toRemove = [NSMutableArray array];

    for (Asset* asset1 in items)
    {
        for (Asset* asset2 in items)
        {
            if (asset1 != asset2)
            {
                if ([asset1.title isEqualToString:asset2.title] && [asset1.author isEqualToString:asset2.author])
                {
                    [toRemove addObject:asset2];
                }
            }
        }
    }

    for (Asset* deleted in toRemove)
    {
        [items removeObject:toRemove];
    }
Justin Meiners
  • 10,754
  • 6
  • 50
  • 92
  • 2
    OK but that's an n^2 algorithm which should generally be avoided. – bryanmac Sep 20 '11 at 21:17
  • You havent seen much code then :P It's just logical you check every item against every other item and compare the two. The reason why I have a toRemove is so I dont modify the items array while im iterating. @bryanmac yes it is n^2 and there might be some fancy other ways to do it, but this is simplest for me to think through also I doubt it will be at the scale where that would become a limitation. – Justin Meiners Sep 20 '11 at 21:46
  • I have seen plenty of code. I just meant I was looking for a cleaner way of doing it. – Nic Hubbard Sep 20 '11 at 22:10
  • agreed simple and works. just pointing out an alternative. I think the loop for the faster one is about the same code but might add a little more footprint (assuming duplicates are a minor case for yours). – bryanmac Sep 20 '11 at 22:29