20

I'm trying to filter non-alphabetical characters out of a String, but running into the issue that CharacterSet uses Unicode.Scalar and String consists of Character.

Xcode gives the error:

Cannot convert value of type 'String.Element' (aka 'Character') to specified type 'Unicode.Scalar?'

let name = "name"
let allowedCharacters = CharacterSet.alphanumerics
let filteredName = name.filter { (c) -> Bool in
    if let s: Unicode.Scalar = c { // cannot convert
        return !allowedCharacters.contains(s)
    }
    return true
}
pkamb
Peter Lapisu

3 Answers

32

CharacterSet has an unfortunate name inherited from Objective-C. In reality, it is a set of Unicode.Scalars, not of Characters (“extended grapheme clusters” in Unicode parlance). This is necessary because, while the set of Unicode scalars is finite, the number of possible grapheme clusters is infinite. For example, e + ◌̄ + ◌̄ + ◌̄ ... ad infinitum is still just one cluster. As such, it is impossible to exhaustively list all possible clusters, and often impossible to list the subset of them that has a particular property. Set operations such as those in the question must use scalars instead (or at least use definitions derived from the component scalars).
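
To make the cluster-versus-scalar distinction concrete, here is a minimal sketch that counts both views of the "e plus three macrons" example. The variable name `cluster` is just an illustration, not something from the question.

```swift
// One user-perceived character built from four Unicode scalars:
// "e" followed by three combining macrons (U+0304).
let cluster = "e\u{0304}\u{0304}\u{0304}"

// A String counts Characters (extended grapheme clusters)...
let characterCount = cluster.count

// ...while its unicodeScalars view counts the individual scalars.
let scalarCount = cluster.unicodeScalars.count

print(characterCount) // 1
print(scalarCount)    // 4
```

Appending more macrons keeps the Character count at 1 while the scalar count keeps growing, which is why a finite set can only be defined over scalars.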

In Swift, Strings have a unicodeScalars property for operating on the string at the scalar level, and the property is directly mutable. That enables you to do things like this:

// Assuming...
var name: String = "..."

// ...then...
name.unicodeScalars.removeAll(where: { !CharacterSet.alphanumerics.contains($0) })
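
As a self-contained sketch of the same call, assuming a made-up sample input (the string "Jo-hn D0e!" is purely illustrative):

```swift
import Foundation

// Hypothetical sample input containing a hyphen, a space and punctuation.
var name = "Jo-hn D0e!"

// Mutate the scalar view in place, dropping every scalar that is
// not a member of CharacterSet.alphanumerics.
name.unicodeScalars.removeAll(where: { !CharacterSet.alphanumerics.contains($0) })

print(name) // JohnD0e
```

Note that this works scalar by scalar, so a Character made of several scalars may be only partially removed.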
12

A single Character can consist of several UnicodeScalars, so you need to iterate through all of them and check if they are contained in CharacterSet.alphanumerics.

let allowedCharacters = CharacterSet.alphanumerics
let filteredName = name.filter { (c) -> Bool in
    return !c.unicodeScalars.contains(where: { !allowedCharacters.contains($0)})
}

Test input: let name = "asd1"

Test output: "asd1"

Dávid Pásztor
  • 4
    @PeterLapisu in the modern world of Strings not just consisting of ASCII characters, handling `Unicode` characters correctly is a necessary evil – Dávid Pásztor Dec 05 '18 at 11:58
  • 4
    but it is ridiculous that a CharacterSet doesn't work with Character – Peter Lapisu Dec 05 '18 at 12:24
  • @PeterLapisu that's because a `Character` can consist of several `UnicodeScalar`s (in case of complex emojis for instance). You could easily make this method an extension on `CharacterSet`. – Dávid Pásztor Dec 05 '18 at 12:26
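
The extension suggested in the last comment could be sketched roughly as follows. The method name `containsAllScalars(of:)` is a hypothetical helper, not part of Foundation, and treating a Character as "contained" only when every one of its scalars is in the set is one possible definition among several.

```swift
import Foundation

extension CharacterSet {
    /// Hypothetical helper (not a Foundation API): returns true only if
    /// every Unicode scalar that makes up `character` is in this set.
    func containsAllScalars(of character: Character) -> Bool {
        character.unicodeScalars.allSatisfy { contains($0) }
    }
}

// Usage with an illustrative input: keep only Characters whose
// scalars are all alphanumeric.
let filtered = "Jo-hn D0e!".filter {
    CharacterSet.alphanumerics.containsAllScalars(of: $0)
}

print(filtered) // JohnD0e
```

Unlike filtering the scalar view, this keeps or drops each Character as a whole.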
2

Without var or double-negatives:

let filteredName = String(name.unicodeScalars.filter {
    CharacterSet.alphanumerics.contains($0)
})
Chad Parker