3

I have an array of strings from an online database that I trying to determine the most commonly used word. The values inside the arrays will vary but I want to check the most common words of whatever collection or words I'm using. If theoretically I had an array of the following...

NSArray *stringArray = [NSArray arrayWithObjects:@"Duck", @"Duck", @"Duck", @"Duck", @"Goose"];

How do I iterate through this array to determine the most common string, which would obviously be "Duck"?

nhgrif
  • 61,578
  • 25
  • 134
  • 173
Tom Testicool
  • 563
  • 7
  • 26
  • I would convert it to an NSDictionary (http://stackoverflow.com/questions/1414852/convert-nsarray-to-nsdictionary) and if the word (key) is already present, increase the value (count) by one, then just loop through to find the largest value. – James Black Jun 28 '14 at 02:40
  • The best way is probably to write your query in a way that would tell you... that'd probably be more efficient. – nhgrif Jun 28 '14 at 02:41
  • there used to be documented a @mode collection path operator, but the 2 problems with that is that, a) it isn't documented anymore, and b) I couldn't ever get it to work when it was documented. – Grady Player Jun 28 '14 at 14:32

3 Answers3

10

Simplest way is probably NSCountedSet:

NSCountedSet* stringSet = [[NSCountedSet alloc] initWithArray:strings];
NSString* mostCommon = nil;
NSUInteger highestCount = 0;

for(NSString* string in stringSet) {
    NSUInteger count = [stringSet countForObject:string];
    if(count > highestCount) {
        highestCount = count;
        mostCommon = string;
    }
}
eric.mitchell
  • 8,817
  • 12
  • 54
  • 92
  • 1
    This should probably be the accepted answer. Makes my answer look terrible. – nhgrif Jun 28 '14 at 03:06
  • Keep in mind this uses a specific kind of comparison. It will work for string literals where the strings are identical. But this does not inherently actually do word comparisons, rather object comparisons. Anything more complex will require a lot more logic to do real string analysis. This includes normalizing strings, handling cases sensitivity and othe language/script specific semantics and even tokenization of words which can be extremely complex depending on the language/script or mixed languages and scripts and the semantic definition of a word and a match. – uchuugaka Jun 28 '14 at 03:21
  • Good points; I just wanted to provide a simple example that answered the question that was asked. – eric.mitchell Jun 28 '14 at 03:35
  • Outstanding. Simply Outstanding. – Tom Testicool Jun 28 '14 at 05:33
  • @uchuugaka This isn't entirely true. `NSCountedSet` considers an object a duplicate based on the result of `isEqual:`. If it responds `YES` to an `isEqual:` call to any object already in the set, it is considered a duplicate. Now for `NSString`, this means a difference in case or something would mean it's not a duplicate... the two words contained different characters. But that's a silly point, as the answer to a question about how to determine if two strings are equal is "Use `isEqualToString:` (which for NSString is the same as `isEqual:` and no one argues these special cases here).(1/2) – nhgrif Jun 28 '14 at 12:25
  • (2/2) But if you REALLY need/want to take care of these special cases, you could either handle the array before turning it into a counted set and make them all the same case, strip accents, etc., or you could subclass `NSString` and override the `isEqual:` method so it will return `YES`/`NO` appropriately exactly as you want to compare them and get this solution to still work exactly as efficiently by simply using a subclass of `NSString` and not `NSString` itself. – nhgrif Jun 28 '14 at 12:27
  • Yes that's just the difference between object equality and literal string equality, but when an OP says compare words it should always always raise flags about human language semantics and defining the problem more clearly than the OP may be aware of. There are many ways to say whether two words are equal or not and a lot of context is needed. – uchuugaka Jun 28 '14 at 13:11
  • this is in BIG - O n^2 there must the better way... not that I can think of one off the top of my head... but if you can sort a set in better than n^2 time you should be able to count the elements faster than that... – Grady Player Jun 28 '14 at 14:36
  • @GradyPlayer I'm not particularly familiar with Big O notation or how you measure things in Big O... but I've just updated my answer, and think my answer now may be more efficient (the bottom part). – nhgrif Jun 28 '14 at 21:52
  • Those are hardly special cases when properly dealing with strings unless you actually already trust the source of the text and know it is limited to a finite set. Not considering linguistic semantics when you have anything potentially more than that is an incomplete solution when dealing with strings. User input in particular is unpredictable, language semantics are critical. It's a hard topic. Not slighting anybody, but clarifying the problem domain. – uchuugaka Jun 29 '14 at 15:27
  • This question might be a duplicate of this question: http://stackoverflow.com/questions/12226776/finding-most-repeated-object-in-array and the topic of efficiency is discussed in that question too. – eric.mitchell Jun 29 '14 at 23:18
3

You can use the word as a key into a dictionary.

NSMutableDictionary *words = [NSMutableDictionary dictionary];
for (NSString *word in stringArray) {
    if (!words[word]) {
        [words setValue:[NSDecimalNumber zero] forKey:word];
    }

    words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]];
}

Now iterate through words and find the key with the highest value.

NSString *mostCommon;
NSDecimalNumber *curMax = [NSDecimalNumber zero];
for (NSString *key in [words allKeys]) {
    if ([words[key] compare:curMax] == NSOrderedDescending) {
        mostCommon = key;
        curMax = word[key];
    }
}

NSLog(@"Most Common Word: %@", mostCommon);

EDIT: Rather than looping through the array once then looping separately through the sorted dictionary, I think we can do better and do it all in a single loop.

NSString *mostCommon;
NSDecimalNumber *curMax = [NSDecimalNumber zero];
NSMutableDictionary *words = [NSMutableDictionary dictionary];
for (NSString *word in stringArray) {
    if (!words[word]) {
        [words setValue:[NSDecimalNumber zero] forKey:word];
    }

    words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]];

    if ([words[word] compare:curMax] == NSOrderedDescending) {
        mostCommon = word;
        curMax = words[word];
    }
}

NSLog(@"Most Common Word: %@", mostCommon);

This should be significantly faster than my answer pre-edit, though I don't know how it compares to using the NSCountedSet answer.

nhgrif
  • 61,578
  • 25
  • 134
  • 173
  • 1
    last line should be: words[word] = [words[word] decimalNumberByAdding:[NSDecimalNumber one]]; otherwise the count won't increment. This is a super common interview question. – Brandon Jun 28 '14 at 02:52
  • You're right. I always forget this returns the value rather than modifying the one you call it on. – nhgrif Jun 28 '14 at 02:53
  • thankfully swift makes that error less likely and this solution less verbose! – Brandon Jun 28 '14 at 02:54
1

Try using NSPredicate.

NSUInteger count=0;
NSString *mostCommonStr;
for(NSString *strValue in stringArray) {
   NSUInteger countStr=[[stringArray filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"self MATCHES[CD] %@, strValue]]count];
   if(countStr > count) {
       count=countStr;
       mostCommonStr=strValue;
   }
}
NSLog(@"The most commonstr is %@",mostCommonStr);
Hussain Shabbir
  • 14,801
  • 5
  • 40
  • 56
  • While this is less verbose than the solution I purposed, it's also definitely slower. `filteredArrayUsingPredicate:` is generally slower than using a `forin` loop to do the same thing, but you're doing this once per word in `stringArray` (which means 4 times for "Duck" and once for "Goose"). Time your solution with an array of 1000 words versus my solution with the same array. – nhgrif Jun 28 '14 at 12:19
  • 1
    For reference on `filteredArrayUsingPredicate:` versus a `forin` loop: http://stackoverflow.com/a/21158730/2792531 – nhgrif Jun 28 '14 at 12:29