
How can I get the unique characters in an NSString?

I'm trying to collect all the illegal characters in an NSString so that I can tell the user which ones were entered and need to be removed. I start by defining an NSCharacterSet of legal characters, split the string at every occurrence of a legal character, and join what's left (only the illegal characters) into a new NSString. Next I want the unique characters of that new NSString (ideally as an array), but I couldn't find a reference for this anywhere.

NSCharacterSet *legalCharacterSet = [NSCharacterSet
    characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-()&+:;,'.# "];

NSString *illegalCharactersInTitle = [[self.titleTextField.text.noWhitespace
    componentsSeparatedByCharactersInSet:legalCharacterSet]
    componentsJoinedByString:@""];
Matthew Quiros
  • Why don't you apply a formatter to the text field so that it's impossible to enter the illegal characters? That would be a significantly more usable solution. – trojanfoe Nov 13 '13 at 10:14
  • The big bosses want me to display which illegal characters were entered. I think it's really dumb, but I'm *just* a programmer. – Matthew Quiros Nov 13 '13 at 10:26
  • What about not separating them but using a `NSAttributedString` to highlight them in the original text? Just start with a new attributed mutable string, cycle through all characters and either append them or append them with a red color if they are illegal. – Sulthan Nov 13 '13 at 10:38

3 Answers


This should help. I couldn't find a ready-made function for it, so the code below enumerates the string and keeps only the first occurrence of each character:

NSMutableSet *uniqueCharacters = [NSMutableSet set];
NSMutableString *uniqueString = [NSMutableString string];
[illegalCharactersInTitle enumerateSubstringsInRange:NSMakeRange(0, illegalCharactersInTitle.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    if (![uniqueCharacters containsObject:substring]) {
        [uniqueCharacters addObject:substring];
        [uniqueString appendString:substring];
    }
}];
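
Since the question asks for an array, here is a minimal sketch of the same enumeration that collects the results in an NSMutableOrderedSet instead. The helper name `UniqueComposedCharacters` is hypothetical (it is not part of Foundation), and the sketch assumes you want the characters in order of first appearance.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the distinct composed
// character sequences of `string`, in order of first appearance.
static NSArray<NSString *> *UniqueComposedCharacters(NSString *string) {
    NSMutableOrderedSet<NSString *> *seen = [NSMutableOrderedSet orderedSet];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // An ordered set silently ignores objects it already contains,
        // so no explicit membership check is needed.
        [seen addObject:substring];
    }];
    // Copy so the caller gets a plain immutable array.
    return [seen.array copy];
}
```

Calling `UniqueComposedCharacters(illegalCharactersInTitle)` would then give an array that can be joined with `componentsJoinedByString:` for display.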
Grzegorz Krukowski
  • +1 for `-enumerateSubstringsInRange:...` with `NSStringEnumerationByComposedCharacterSequences` but see my answer for an additional caveat about the way `illegalCharactersInTitle` was computed. – Ken Thomases Nov 13 '13 at 10:39

Try the following adaptation of your code:

// legal set
NSCharacterSet *legalCharacterSet = [NSCharacterSet
                                         characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-()&+:;,'.# "];

// test strings
NSString *myString = @"LegalStrin()";
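// NSString *myString = @"francesco@gmail.com"; // illegal string: contains '@'

// must be created as a mutable set so that it can be intersected in place below
NSMutableCharacterSet *stringSet = [NSMutableCharacterSet characterSetWithCharactersInString:myString];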
// inverts the set
NSCharacterSet *illegalCharacterSet = [legalCharacterSet invertedSet];

// intersect the string set with the illegal set; this modifies stringSet in place
[stringSet formIntersectionWithCharacterSet:illegalCharacterSet];

// prints out the illegal characters with the convenience method
NSLog(@"IllegalStringSet: %@", [self stringForCharacterSet:stringSet]);

I adapted the printing method from another Stack Overflow question:

- (NSString*)stringForCharacterSet:(NSCharacterSet*)characterSet
{
    NSMutableString *toReturn = [@"" mutableCopy];
    unichar unicharBuffer[20];
    int index = 0;

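    // Scans only the Basic Multilingual Plane (U+0000 to U+FFFE);
    // characters outside it are not reported.
    for (unichar uc = 0; uc < (0xFFFF); uc ++)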
    {
        if ([characterSet characterIsMember:uc])
        {
            unicharBuffer[index] = uc;

            index ++;

            if (index == 20)
            {
                NSString * characters = [NSString stringWithCharacters:unicharBuffer length:index];
                [toReturn appendString:characters];

                index = 0;
            }
        }
    }

    if (index != 0)
    {
        NSString * characters = [NSString stringWithCharacters:unicharBuffer length:index];
        [toReturn appendString:characters];
    }
    return toReturn;
}
Fr4ncis

First of all, you have to be careful about what you consider characters. The API of NSString uses the word characters when talking about what Unicode refers to as UTF-16 code units, but dealing with code units in isolation will not give you what users think of as characters. For example, there are combining characters that compose with the previous character to produce a different glyph. Also, there are surrogate pairs, which only make sense when, um, paired.

As a result, you will actually need to collect substrings which contain what the user thinks of as characters.
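
To make the difference concrete, here is a small sketch comparing `length` (which counts UTF-16 code units) with a count of composed character sequences. The helper name `ComposedCharacterCount` is hypothetical, introduced only for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): counts composed character
// sequences, which is closer to what a user would call "characters".
static NSUInteger ComposedCharacterCount(NSString *string) {
    __block NSUInteger count = 0;
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        count++;
    }];
    return count;
}
```

For the decomposed string `@"e\u0301"` ("e" followed by a combining acute accent), `length` is 2 but `ComposedCharacterCount` returns 1. The same holds for `@"\U0001F600"`, which is stored as a surrogate pair.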

I was about to write code very similar to Grzegorz Krukowski's answer. He beat me to it, so I won't, but I will add that your code to filter out the legal characters is broken for the reasons I cite above. For example, if the text contains "é" and it's decomposed as "e" plus a combining acute accent, your code will strip the "e", leaving a dangling combining acute accent. I believe your intent is to treat the whole "é" as illegal.
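
One way to do the filtering on whole composed sequences, as a rough sketch rather than a definitive implementation: enumerate the original text and treat a sequence as illegal if any part of it falls outside the legal set. The helper name `IllegalComposedCharacters` is hypothetical, and the rule "illegal if any code unit is illegal" is my assumption about what you want.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the distinct composed
// character sequences of `text` that contain at least one character outside
// `legalSet`, in order of first appearance.
static NSArray<NSString *> *IllegalComposedCharacters(NSString *text,
                                                      NSCharacterSet *legalSet) {
    NSCharacterSet *illegalSet = legalSet.invertedSet;
    NSMutableOrderedSet<NSString *> *found = [NSMutableOrderedSet orderedSet];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Keep the whole sequence (e.g. "e" plus its combining accent)
        // as soon as any part of it is not legal.
        if ([substring rangeOfCharacterFromSet:illegalSet].location != NSNotFound) {
            [found addObject:substring];
        }
    }];
    return [found.array copy];
}
```

With a legal set of plain letters and the space character, the input `@"Cafe\u0301 @ home"` would give the decomposed "é" (as one two-code-unit string) and "@", rather than a lone combining accent.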

Ken Thomases