
For an NSString in Objective-C, if I call something like `[myStr length]`, it returns a number which is the number of bytes. What can I use to get the number of characters in a string instead?

For example:
A string containing the letter "a" returns a length of 1.
A string containing a single emoji returns a length of 2 (or sometimes even 4).


This is because that is the number of bytes in the string. I just need the number of characters.

valentinas
Albert Renshaw
  • @valentinas please re-read my question. I don't think you grasped it. Thank you for the link though! But that's not what I'm looking for. – Albert Renshaw Mar 10 '13 at 22:51
  • «which is the number of bytes» No, as the documentation states, it's the number of Unicode characters. – jscs Mar 10 '13 at 22:52
  • @JoshCaswell The documentation is wrong. – Albert Renshaw Mar 10 '13 at 22:53
  • By "character" do you mean a [Unicode scalar value](http://www.unicode.org/glossary/#unicode_scalar_value) or code-point? – Mike Samuel Mar 10 '13 at 22:53
  • @MikeSamuel I'm not sure I know the difference; I just want any Unicode character from #0000 to #E01E0f to be counted as "1". – Albert Renshaw Mar 10 '13 at 22:55
  • I suspect the problem is in how you created the strings. You need to supply the data AND the encoding. If you do that, then [str length] should work properly. – DrC Mar 10 '13 at 22:55
  • @DrC This is all I've done. The string referenced by the `%@` is just the text of a UITextView the user types in. Where should I add the encoding to account for emoji Unicode characters? `userMessageCount = [[NSString stringWithFormat:@"%@", userMessageView.text] length];` – Albert Renshaw Mar 10 '13 at 22:57
  • Currently the only options I see are to either use a massive for loop to convert everything from bytes to a character count and keep track of it for each new thing typed or deleted, or to send the whole string into a UIWebView using `loadHTML` and then run some JavaScript in there to return the *true* string length. <--neither would be fun. – Albert Renshaw Mar 10 '13 at 23:00
  • No, it's not. You seem to be looking for the number of _glyphs_ in the rendering, the things that the user sees. Each glyph can be represented by several Unicode characters. See also: ["Characters and Grapheme Clusters"](http://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/Strings/Articles/stringsClusters.html#//apple_ref/doc/uid/TP40008025-SW1). – jscs Mar 10 '13 at 23:02
  • @JoshCaswell Okay, that makes more sense, +1. So how do I get the number of glyphs? I noticed the flag emojis are made up of two Unicode characters (but I'm assuming that is an exception to finding glyph length, because even Twitter counts one flag as 2 characters, haha!) – Albert Renshaw Mar 10 '13 at 23:04
  • I'm trying to figure that out. Actually, glyphs might be the wrong direction -- that's what you get when you render the string, but I'm not certain there's only one glyph when rendering, e.g., `é`. – jscs Mar 10 '13 at 23:10
  • @JoshCaswell Hm. Interesting. I know doing it with JavaScript and an invisible UIWebView will work, but I have a feeling it will be very slow. I wonder if there's an easy way to see how Twitter's app does it. – Albert Renshaw Mar 10 '13 at 23:14
  • @AlbertRenshaw, I didn't mean to suggest that they should be treated as two exclusive options. "character" often means octet, UTF-16 code-unit, Unicode scalar value. When input is messy, sometimes it means code-point instead of scalar value. – Mike Samuel Mar 10 '13 at 23:47

1 Answer


I just whipped up this method. Add it to an NSString category.

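#import <Foundation/Foundation.h>

@interface NSString (CharacterCount)
- (NSUInteger)characterCount;
@end

@implementation NSString (CharacterCount)

// Counts composed character sequences (roughly, what the user sees as characters)
// rather than the UTF-16 code units that -length counts.
- (NSUInteger)characterCount {
    NSUInteger cnt = 0;
    NSUInteger index = 0;
    while (index < self.length) {
        // A surrogate pair or a base character plus combining marks comes back as one range.
        NSRange range = [self rangeOfComposedCharacterSequenceAtIndex:index];
        cnt++;
        index += range.length;
    }
    return cnt;
}
@end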

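// NSUInteger is unsigned long on 64-bit platforms, so cast and use %lu.
NSString *a = @"Hello";
NSLog(@"%@ length = %lu, chars = %lu", a, (unsigned long)a.length, (unsigned long)a.characterCount);
NSString *b = @" Emoji ";
NSLog(@"%@ length = %lu, chars = %lu", b, (unsigned long)b.length, (unsigned long)b.characterCount);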

This yields:

Hello length = 5, chars = 5
Emoji length = 11, chars = 9

rmaddy
  • Incredible! rmaddy, you always impress me! Haha! `rangeOfComposedCharacterSequenceAtIndex` VERY COOL! – Albert Renshaw Mar 10 '13 at 23:51
  • (Interesting note: try this with the flag emojis and it won't work. That's because a flag emoji isn't actually a single Unicode character; it's two Unicode characters that iOS and OS X render as one character when placed side by side. For example, the US flag is "*" without the asterisk. Try copying and pasting "*" into a text area (like a comment on SO), then backspace the asterisk and see what happens :o) – Albert Renshaw Mar 11 '13 at 00:01
  • Very interesting. Those flags are actually 4 Unicode symbols according to the Special Characters viewer on the Mac. My answer reports a single flag as having a character count of 2 and an `NSString length` of 4. I'll see if I can find a solution. – rmaddy Mar 11 '13 at 00:07
  • Interesting. Luckily, since I'm building a Twitter add-on app, I'm okay with having the flags count as 2, because even Twitter counts them as 2 :) – Albert Renshaw Mar 11 '13 at 00:09
  • My only problem right now is with the custom backspace on my keyboard. If I backspace by using `substringToIndex:`, I get MASSIVE glitches when backspacing emojis and then trying to type again, and emojis take 2 backspaces to fully delete (flags take 4). – Albert Renshaw Mar 11 '13 at 00:10
  • Dealing with the backspace should work if you make use of the `rangeOfComposedCharacterSequenceAtIndex:` method to get the length of the character. Obviously, there will still be an issue with these flag characters. – rmaddy Mar 11 '13 at 00:15
  • Very nice! I had a suspicion that one of the `rangeOfComposedCharacters...` methods might be a key to a solution, but I got lost in typesetting documentation instead. – jscs Mar 11 '13 at 00:20
  • This is cool. The US flag symbol is composed of the boxed U and the boxed S characters. The German flag symbol is from the boxed D and boxed E characters. Clever. – rmaddy Mar 11 '13 at 00:26
  • I posted a question on the [Apple iOS dev forums](https://devforums.apple.com/message/792458#792458). – rmaddy Mar 11 '13 at 00:47