16

I have a string composed of words, some of which contain punctuation, which I would like to remove, but I have been unable to figure out how to do this.

For example, if I have something like

var words = "Hello, this : is .. a  string?"

I would like to be able to create an array with

["Hello", "this", "is", "a", "string"]

My original thought was to use something like words.stringByTrimmingCharactersInSet() to remove any characters I didn't want but that would only take characters off the ends.

I thought maybe I could iterate through the string with something in the vein of

for letter in words {
    if NSCharacterSet.punctuationCharacterSet.characterIsMember(letter){
        //remove that character from the string
    }
}

but I'm unsure how to remove the character from the string. I'm sure there are some problems with the way that if statement is set up, as well, but it shows my thought process.

Leo Dabus
qmlowery

6 Answers

35

Xcode 11.4 • Swift 5.2 or later

extension StringProtocol {
    var words: [SubSequence] {
        split(whereSeparator: \.isLetter.negation)
    }
}

extension Bool {
    var negation: Bool { !self }
}

let sentence = "Hello, this : is .. a  string?"
let words = sentence.words  // ["Hello", "this", "is", "a", "string"]
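
If digits should also count as part of a word (the asker mentions ending up with an alphanumeric set), the same split can be adapted. This is only a sketch: the property name alphanumericWords and the sample string are made up for illustration and are not part of the original answer.

```swift
// Hypothetical variant of the extension above: letters and digits both
// count as word characters, everything else is a separator.
extension StringProtocol {
    var alphanumericWords: [SubSequence] {
        // split omits empty subsequences by default, so runs of
        // separators do not produce empty words.
        split { !($0.isLetter || $0.isNumber) }
    }
}

let mixed = "Swift 5, Xcode 11 : ok?"
let tokens = mixed.alphanumericWords.map { String($0) }
print(tokens) // ["Swift", "5", "Xcode", "11", "ok"]
```

Mapping to String is optional; it only matters when the substrings need to outlive the original string.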

 
Leo Dabus
  • I ended up using a mixture of a few of these solutions to get it to work, but this one helped me get the last error I had fixed. I ended up creating an NSMutableCharacterSet which was alphanumericCharacterSet plus " ". Then I used the solution here to get what I needed without having extra spaces. – qmlowery Apr 16 '15 at 08:24
7

String has an enumerateSubstringsInRange() method (enumerateSubstrings(in:options:_:) in Swift 3 and later). With the .ByWords option, it detects word boundaries and punctuation automatically:

Swift 3/4:

import Foundation

let string = "Hello, this : is .. a \"string\"!"
var words: [String] = []
string.enumerateSubstrings(in: string.startIndex..<string.endIndex,
                           options: .byWords) { (substring, _, _, _) in
    // Called once per detected word; punctuation and whitespace are skipped.
    words.append(substring!)
}
print(words) // ["Hello", "this", "is", "a", "string"]

Swift 2:

let string = "Hello, this : is .. a \"string\"!"
var words : [String] = []
string.enumerateSubstringsInRange(string.characters.indices,
    options: .ByWords) {
        (substring, _, _, _) -> () in
        words.append(substring!)
}
print(words) // [Hello, this, is, a, string]
Martin R
  • 2
    Thanks for enumerateSubstringsInRange & .ByWords. Very interesting. – Duyen-Hoa Apr 16 '15 at 07:52
  • This code needs to be updated for swift 4 or may be swift 3 – Inder Kumar Rathore Jul 26 '17 at 13:15
  • @InderKumarRathore: Updated for Swift 3. (It should work in Swift 4 as well, I'll check that later) – Thanks for the notice! – Martin R Jul 26 '17 at 13:21
  • @MartinR Lol! I was about to update your answer by just having information from your another answer here : https://stackoverflow.com/a/39534217/468724, but you updated it before me. Btw this works for swift 4. Thanks mate for the quick response :) – Inder Kumar Rathore Jul 26 '17 at 13:24
5

This works with Xcode 8.1, Swift 3:

First define a general-purpose extension for filtering by CharacterSet:

import Foundation

extension String {
  // Returns a copy of the string with every character in the given set removed.
  func removingCharacters(inCharacterSet forbiddenCharacters: CharacterSet) -> String {
    var filteredString = self
    // Remove the first forbidden character found, and repeat until none remain.
    while let forbiddenCharRange = filteredString.rangeOfCharacter(from: forbiddenCharacters) {
      filteredString.removeSubrange(forbiddenCharRange)
    }
    return filteredString
  }
}

Then filter using punctuation:

let s:String = "Hello, world!"
s.removingCharacters(inCharacterSet: CharacterSet.punctuationCharacters) // => "Hello world"
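
Since the question asks for an array of words, the filtered string still has to be split. The sketch below is one possible way to do both steps; it filters Unicode scalars in a single pass rather than searching repeatedly. The function name strippingScalars is hypothetical and not part of the answer above.

```swift
import Foundation

// Hypothetical helper: drops every Unicode scalar that belongs to `forbidden`.
func strippingScalars(in forbidden: CharacterSet, from text: String) -> String {
    var scalars = String.UnicodeScalarView()
    scalars.append(contentsOf: text.unicodeScalars.filter { !forbidden.contains($0) })
    return String(scalars)
}

let stripped = strippingScalars(in: .punctuationCharacters,
                                from: "Hello, this : is .. a  string?")
// split(separator:) omits empty pieces, so repeated spaces are harmless.
let strippedWords = stripped.split(separator: " ").map { String($0) }
print(strippedWords) // ["Hello", "this", "is", "a", "string"]
```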
algal
0

NSScanner way:

let words = "Hello, this : is .. a  string?"

//
let scanner = NSScanner(string: words)
var wordArray:[String] = []
var word:NSString? = ""

while(!scanner.atEnd) {
  // Scan a run of ASCII letters (the uppercase half previously omitted "L").
  let sr = scanner.scanCharactersFromSet(NSCharacterSet(charactersInString: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"), intoString: &word)
  if !sr {
    scanner.scanLocation++
    continue
  }
  wordArray.append(String(word!))
}

println(wordArray)
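
The code above is written for an early Swift version. A rough equivalent for current Swift might look like the sketch below. It assumes the newer Scanner methods scanCharacters(from:) and scanUpToCharacters(from:), which require macOS 10.15 / iOS 13 or later, and it uses CharacterSet.letters instead of a hand-written alphabet; the variable names are illustrative.

```swift
import Foundation

let sentence = "Hello, this : is .. a  string?"
let scanner = Scanner(string: sentence)
// Do not skip whitespace implicitly; separators are consumed explicitly below.
scanner.charactersToBeSkipped = nil

var wordArray: [String] = []
while !scanner.isAtEnd {
    if let word = scanner.scanCharacters(from: .letters) {
        // A run of letters was found at the current position.
        wordArray.append(word)
    } else {
        // Not at a letter: consume everything up to the next letter (or the end).
        _ = scanner.scanUpToCharacters(from: .letters)
    }
}
print(wordArray) // ["Hello", "this", "is", "a", "string"]
```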
nickcheng
0
// Split on punctuation and rejoin the pieces; inverting the set here would keep only the punctuation.
let charactersToRemove = NSCharacterSet.punctuationCharacterSet()
let aWord = "".join(words.componentsSeparatedByCharactersInSet(charactersToRemove))
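
In current Swift the same split-and-rejoin idea could be sketched as follows, with an extra step to get from the cleaned string to an array of words. The names withoutPunctuation and wordList are illustrative only.

```swift
import Foundation

let sentence = "Hello, this : is .. a  string?"
// Splitting on punctuation and joining the pieces drops the punctuation.
let withoutPunctuation = sentence
    .components(separatedBy: .punctuationCharacters)
    .joined()
// Splitting on whitespace leaves empty strings where spaces repeat,
// so those are filtered out.
let wordList = withoutPunctuation
    .components(separatedBy: .whitespaces)
    .filter { !$0.isEmpty }
print(wordList) // ["Hello", "this", "is", "a", "string"]
```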
Savitha
0

An alternate way to filter characters from a set and obtain an array of words is by using the array's filter and reduce methods. It's not as compact as other answers, but it shows how the same result can be obtained in a different way.

First define a set of the characters to remove:

let charactersToRemove = Set(Array(".:?,"))

Next, convert the input string into an array of characters:

let arrayOfChars = Array(words)

Now we can use reduce to build a string, obtained by appending the elements from arrayOfChars, but skipping all the ones included in charactersToRemove:

let filteredString = arrayOfChars.reduce("") {
    let str = String($1)
    return $0 + (charactersToRemove.contains($1) ? "" : str)
}

This produces a string without the punctuation characters (as defined in charactersToRemove).

The last 2 steps:

Split the string into an array of words, using the blank character as separator:

let arrayOfWords = filteredString.componentsSeparatedByString(" ")

Last, remove all empty elements:

let finalArrayOfWords = arrayOfWords.filter { $0.isEmpty == false }
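
The steps above target an early Swift version. In Swift 4 or later, where String is itself a collection of characters, they might be condensed as in this sketch (the names input, unwanted, filtered, and finalWords are illustrative):

```swift
let input = "Hello, this : is .. a  string?"
let unwanted: Set<Character> = [".", ":", "?", ","]

// Filtering a String yields a String, so no intermediate array
// of characters or reduce step is needed.
let filtered = input.filter { !unwanted.contains($0) }

// split(separator:) omits empty pieces by default, which replaces
// the final "remove all empty elements" step.
let finalWords = filtered.split(separator: " ").map { String($0) }
print(finalWords) // ["Hello", "this", "is", "a", "string"]
```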
Antonio