So here is the string s:

"Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."

I want to split it into an array like this:

["Hi", "How are you", "I'm fine", "It is 6 p.m", "Thank you", "That's it"]

That is, the separators should be ". ", "? ", and "! ".

I've tried:

let charSet = NSCharacterSet(charactersInString: ".?!")
let array = s.componentsSeparatedByCharactersInSet(charSet)

But this also splits p.m. into two elements. Result:

["Hi", " How are you", " I'm fine", " It is 6 p", "m", " Thank you", " That's it"]

I've also tried:

let array = s.componentsSeparatedByString(". ")

It works well for splitting on ". ", but if I also want to split on "? " and "! ", it becomes messy.

Is there a way to do this? Thanks!

He Yifei 何一非

5 Answers

There is a built-in method that lets you enumerate the substrings of a string by words, sentences, or other units. No need for regular expressions.

let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
var sentences = [String]()
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex, options: .BySentences) { 
    substring, substringRange, enclosingRange, stop in
    sentences.append(substring!)
}
print(sentences)

The result is:

["Hi! ", "How are you? ", "I\'m fine. ", "It is 6 p.m. ", "Thank you! ", "That\'s it."]
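Note that each element above still carries its punctuation and trailing space, whereas the question asked for bare sentences. A minimal sketch of one way to get there, using the Swift 3+ spelling of the same method: trim each substring with a character set. The `trimSet` name and the choice of characters to strip are assumptions based on the question's desired output.

```swift
import Foundation

let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."

// Assumption: strip sentence-ending punctuation and whitespace from both ends.
let trimSet = CharacterSet(charactersIn: ".?!").union(.whitespacesAndNewlines)

var sentences = [String]()
s.enumerateSubstrings(in: s.startIndex..<s.endIndex, options: .bySentences) { substring, _, _, _ in
    guard let sentence = substring else { return }
    let trimmed = sentence.trimmingCharacters(in: trimSet)
    // Skip anything that was only punctuation or whitespace.
    if !trimmed.isEmpty {
        sentences.append(trimmed)
    }
}
print(sentences)
```

Because trimming only touches the ends of each substring, the inner period of "p.m" is kept while the final one is removed.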

rmaddy

rmaddy's answer is correct (+1). A Swift 3 implementation is:

var sentences = [String]()

string.enumerateSubstrings(in: string.startIndex ..< string.endIndex, options: .bySentences) { substring, substringRange, enclosingRange, stop in
    sentences.append(substring!)
}

You can also use a regular expression (NSRegularExpression), though it's much hairier than rmaddy's .bySentences solution. In Swift 3:

var sentences = [String]()

let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))")
// NSRange is measured in UTF-16 code units, so use utf16.count rather than characters.count
regex.enumerateMatches(in: string, range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in
    sentences.append((string as NSString).substring(with: match!.rangeAt(2)))
}

Or Swift 2:

let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))", options: [])
var sentences = [String]()
// NSRange is measured in UTF-16 code units, so use utf16.count rather than characters.count
regex.enumerateMatchesInString(string, options: [], range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in
    sentences.append((string as NSString).substringWithRange(match!.rangeAtIndex(2)))
}

The [.!?] syntax matches any of those three characters. The | means "or". The ^ matches the start of the string. The $ matches the end of the string. The \\s matches a whitespace character. The \\w matches a "word" character. The . matches any character. The * matches zero or more of the preceding character, and the ? after it makes that match lazy (as few characters as possible). The + matches one or more of the preceding character. The (?=) is a look-ahead assertion (i.e. see if there's something there, but don't advance through that match).

I've tried to simplify this a bit, and it's still pretty complicated. Regular expressions offer rich text pattern matching, but, admittedly, they are a little dense when you first use them. This rendition handles (a) repeated punctuation (e.g. "Thank you!!!"), (b) leading spaces, and (c) trailing spaces, too.
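For later Swift versions (4 and up), the same pattern might be applied roughly as follows. This is a sketch rather than part of the original answer: it assumes the renamed `range(at:)` method and the `NSRange(_:in:)` / `Range(_:in:)` conversions, and the sample string is the question's text with "Thank you!!!" substituted to exercise the repeated-punctuation case.

```swift
import Foundation

// Sample input (the question's string, with repeated punctuation added for illustration).
let string = "Hi! How are you? I'm fine. It is 6 p.m. Thank you!!! That's it."

let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))")

// Convert the full String range to an NSRange (UTF-16 based) safely.
let fullRange = NSRange(string.startIndex..<string.endIndex, in: string)

var sentences = [String]()
regex.enumerateMatches(in: string, range: fullRange) { match, _, _ in
    // Capture group 2 holds the sentence without the leading whitespace.
    guard let match = match,
          let range = Range(match.range(at: 2), in: string) else { return }
    sentences.append(String(string[range]))
}
print(sentences)
```

Unlike the .bySentences approach, each element here keeps its closing punctuation but has no leading or trailing whitespace.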

Rob

If the splitting basis is something a little more esoteric than sentences, this extension could work.

extension String {
    public func components(separatedBy separators: [String]) -> [String] {
        var output: [String] = [self]
        for separator in separators {
            output = output.flatMap { $0.components(separatedBy: separator) }
        }
        return output.map { $0.trimmingCharacters(in: .whitespaces)}
    }
}

let artists = "Rihanna, featuring Calvin Harris".components(separatedBy: [", with", ", featuring"])

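Applied to the question's string, the same extension could be used with the three separators the question lists. This is only a sketch: the extension is repeated so the snippet stands alone, and `separators` and `parts` are illustrative names.

```swift
import Foundation

extension String {
    // Splits on each separator in turn, then trims whitespace from every piece.
    public func components(separatedBy separators: [String]) -> [String] {
        var output: [String] = [self]
        for separator in separators {
            output = output.flatMap { $0.components(separatedBy: separator) }
        }
        return output.map { $0.trimmingCharacters(in: .whitespaces) }
    }
}

let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."

// Separators taken from the question: punctuation followed by a space.
let separators: [String] = [". ", "? ", "! "]
let parts = s.components(separatedBy: separators)
print(parts)
```

Since each separator includes the trailing space, the period inside "p.m" is left alone, but so is the final period of the last sentence, which would need a separate trim.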
Eric_WVGG

I tried to find a regex to solve this too: (([^.!?]+\s)*\S+(\.|!|\?)) Here is the explanation from regexper, and an example.

mt81

Well, I've found a regex too, from here:

var pattern = "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])"

let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."

let sReplaced = s.stringByReplacingOccurrencesOfString(pattern, withString:"[*-SENTENCE-*]" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)

let array = sReplaced.componentsSeparatedByString("[*-SENTENCE-*]")

Perhaps it's not a good way, as it has to first replace and then split the string. :)
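In Swift 3 and later naming, the same replace-then-split idea might look like the sketch below. It assumes the renamed `replacingOccurrences(of:with:options:)` and `components(separatedBy:)` methods; `marker` is just an illustrative name for the placeholder string.

```swift
import Foundation

// Whitespace that follows sentence-ending punctuation and precedes an uppercase letter or a digit.
let pattern = "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])"
let marker = "[*-SENTENCE-*]"

let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."

// Step 1: replace every sentence boundary with the marker.
let replaced = s.replacingOccurrences(of: pattern, with: marker, options: .regularExpression)

// Step 2: split on the marker.
let array = replaced.components(separatedBy: marker)
print(array)
```

Because the pattern only consumes the whitespace between sentences, each element keeps its closing punctuation.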

UPDATE:

For the regex part, if you also want to match Chinese/Japanese punctuation (where a space after each punctuation mark is not required), you can use the following one:

((?<=[.?!;…])\\s+|(?<=[。！？；…])\\s*)(?=[\\p{L}\\p{N}])

He Yifei 何一非