5

I'm trying to use a Swift 3 CharacterSet to filter characters out of a String but I'm getting stuck very early on. A CharacterSet has a method called contains

func contains(_ member: UnicodeScalar) -> Bool
Test for membership of a particular UnicodeScalar in the CharacterSet.

But testing this doesn't produce the expected behaviour.

let characterSet = CharacterSet.capitalizedLetters

let capitalAString = "A"

if let capitalA = capitalAString.unicodeScalars.first {
    print("Capital A is \(characterSet.contains(capitalA) ? "" : "not ")in the group of capital letters")
} else {
    print("Couldn't get the first element of capitalAString's unicode scalars")
}

I'm getting Capital A is not in the group of capital letters yet I'd expect the opposite.

Many thanks.

Josh Paradroid
  • 1,172
  • 18
  • 45
  • 2
    Martin just beat me to it, but that character set does not contain the capital A you're looking for: `["Dž", "Lj", "Nj", "Dz", "ᾈ", "ᾉ", "ᾊ", "ᾋ", "ᾌ", "ᾍ", "ᾎ", "ᾏ", "ᾘ", "ᾙ", "ᾚ", "ᾛ", "ᾜ", "ᾝ", "ᾞ", "ᾟ", "ᾨ", "ᾩ", "ᾪ", "ᾫ", "ᾬ", "ᾭ", "ᾮ", "ᾯ", "ᾼ", "ῌ", "ῼ"]` – JAL Feb 15 '17 at 14:53
  • 2
    While Martin's answer was first and is correct, I think you've discovered the root of the problem — I don't even know what a capital letter is! – Josh Paradroid Feb 15 '17 at 15:19

1 Answers1

9

CharacterSet.capitalizedLetters returns a character set containing the characters in Unicode General Category Lt aka "Letter, titlecase". That are "Ligatures containing uppercase followed by lowercase letters (e.g., Dž, Lj, Nj, and Dz)" (compare Wikipedia: Unicode character property or Unicode® Standard Annex #44 – Table 12. General_Category Values).

You can find a list here: Unicode Characters in the 'Letter, Titlecase' Category.

You can also use the code from NSArray from NSCharacterset to dump the contents of the character set:

extension CharacterSet {
    func allCharacters() -> [Character] {
        var result: [Character] = []
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}

let characterSet = CharacterSet.capitalizedLetters
print(characterSet.allCharacters())

// ["Dž", "Lj", "Nj", "Dz", "ᾈ", "ᾉ", "ᾊ", "ᾋ", "ᾌ", "ᾍ", "ᾎ", "ᾏ", "ᾘ", "ᾙ", "ᾚ", "ᾛ", "ᾜ", "ᾝ", "ᾞ", "ᾟ", "ᾨ", "ᾩ", "ᾪ", "ᾫ", "ᾬ", "ᾭ", "ᾮ", "ᾯ", "ᾼ", "ῌ", "ῼ"]

What you probably want is CharacterSet.uppercaseLetters which

Returns a character set containing the characters in Unicode General Category Lu and Lt.

Community
  • 1
  • 1
Martin R
  • 529,903
  • 94
  • 1,240
  • 1,382