First of all, your code is incorrect. `characterAtIndex:` returns a `unichar`, so you should use `@"%C"` (uppercase) as the format specifier; `%c` (lowercase) formats an 8-bit `char`, not a `unichar`.
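The question's exact code isn't reproduced here. The sketch below assumes it reads one character with `characterAtIndex:` and formats it, and shows `%C` used for that. `FirstCodeUnitAsString` is a hypothetical helper name used only for illustration:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: formats the first UTF-16 code unit of `string`.
// %C (uppercase) expects a 16-bit unichar; %c (lowercase) expects an 8-bit char.
static NSString *FirstCodeUnitAsString(NSString *string) {
    if (string.length == 0) {
        return @""; // characterAtIndex: raises NSRangeException when out of bounds
    }
    unichar first = [string characterAtIndex:0];
    return [NSString stringWithFormat:@"%C", first];
}
```

This fixes the format specifier only. It still assumes one `unichar` is one character, which is the problem discussed next.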
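Even with the correct format specifier, your code is unsafe and, strictly speaking, still incorrect, because not every Unicode character can be represented by a single `unichar`. A `unichar` is one 16-bit UTF-16 code unit: characters outside the Basic Multilingual Plane (such as most emoji) need a surrogate pair of two `unichar`s, and a user-perceived character can also combine several code points (for example, a letter followed by a combining accent). You should always handle Unicode strings per substring: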
> It's common to think of a string as a sequence of characters, but when working with NSString objects, or with Unicode strings in general, in most cases it is better to deal with substrings rather than with individual characters. The reason for this is that what the user perceives as a character in text may in many cases be represented by multiple characters in the string.
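To make the quoted point concrete, here is a small sketch comparing `-length` (which counts `unichar`s) with the number of composed character sequences. The helper names `ComposedCharacterCount` and `CompareLengths` are hypothetical. The two sample strings are built from explicit UTF-16 code units so the example does not depend on source-file encoding:

```objc
#import <Foundation/Foundation.h>

// Counts user-perceived characters (composed character sequences) in `string`,
// as opposed to -length, which counts UTF-16 code units (unichars).
static NSUInteger ComposedCharacterCount(NSString *string) {
    __block NSUInteger count = 0;
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        count++;
    }];
    return count;
}

// Two single "characters" that each need two unichars:
//   U+1F600 (an emoji outside the BMP) is stored as the surrogate pair D83D DE00;
//   "e" followed by U+0301 COMBINING ACUTE ACCENT is displayed as one "é".
static void CompareLengths(void) {
    unichar emojiUnits[] = {0xD83D, 0xDE00};
    unichar eAcuteUnits[] = {'e', 0x0301};
    NSString *emoji = [NSString stringWithCharacters:emojiUnits length:2];
    NSString *eAcute = [NSString stringWithCharacters:eAcuteUnits length:2];

    NSLog(@"emoji: length %lu, composed %lu", (unsigned long)emoji.length,
          (unsigned long)ComposedCharacterCount(emoji));
    NSLog(@"e + U+0301: length %lu, composed %lu", (unsigned long)eAcute.length,
          (unsigned long)ComposedCharacterCount(eAcute));
}
```

Both strings report a length of 2 but contain a single composed character sequence. For the emoji, `characterAtIndex:0` returns the lone high surrogate `0xD83D`, which is not a printable character on its own.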
You should definitely read Apple's *String Programming Guide*.
Finally, here is the corrected code:
```objc
NSString *danishString = @"æøå";
NSMutableArray *characters = [[NSMutableArray alloc] initWithCapacity:[danishString length]];
// Enumerate by composed character sequence: each substring is one user-perceived
// character, even when it spans several unichars (e.g. a surrogate pair).
[danishString enumerateSubstringsInRange:NSMakeRange(0, danishString.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    [characters addObject:substring];
}];
```
If `NSLog(@"%@", characters);` shows "strange characters" of the form `\Uxxxx`, that's expected: the strings in the array are correct, and the escapes come from how `NSArray`'s `description` method stringifies its contents by default. To see the "normal" characters, print them one by one:
```objc
for (NSString *c in characters) {
    NSLog(@"%@", c);
}
```
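As an alternative to logging each element separately, you can join the substrings into a single `NSString` first. `NSLog` then formats that string directly instead of going through `NSArray`'s `description`. This is a minimal sketch; `JoinedCharacters` is a hypothetical helper name, and the space separator is an arbitrary choice:

```objc
#import <Foundation/Foundation.h>

// Joins the substrings into one NSString so a single NSLog shows them as
// readable text rather than as NSArray's escaped description.
static NSString *JoinedCharacters(NSArray<NSString *> *characters) {
    return [characters componentsJoinedByString:@" "];
}
```

With the `characters` array from the code above, `NSLog(@"%@", JoinedCharacters(characters));` logs the three Danish letters on one line, separated by spaces.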