1

I'm wondering how I can split a string containing several sentences into an array of the sentences.

I know about the split function, but splitting by "." doesn't suit all cases.

Is there something like what is mentioned in this answer?

arnoapp
  • Take a look at the `enumerateTagsInRange:scheme:options:usingBlock:` method in the `NSLinguisticTagger` class to see if that suits your problem. – rdelmar Apr 27 '15 at 22:02

4 Answers

5

If you can use Apple's Foundation, the solution can be quite straightforward.

import Foundation

var text = """
    Let's split some text into sentences.
    The text might include dates like Jan.13, 2020, words like S.A.E and numbers like 2.2 or $9,999.99 as well as emojis like ‍‍‍! How do I split this?
"""
var sentences: [String] = []
// The first closure argument is the enumerated substring, here one sentence.
text.enumerateSubstrings(in: text.startIndex..., options: [.localized, .bySentences]) { (sentence, _, _, _) in
    sentences.append(sentence ?? "")
}

There are ways to do it with pure Swift, of course. Here is a quick and dirty split:

let simpleText = """
This is a very simple text.
It doesn't include dates, abbreviations, and numbers, but it includes emojis like ‍‍‍! How do I split this?
"""

let sentencesPureSwift = simpleText.split(omittingEmptySubsequences: true) { $0.isPunctuation && !Set("',").contains($0) }

It could be refined with reduce().
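As a rough sketch of what such a `reduce()` refinement might look like (the function name `naiveSentences(in:)` and the choice of `.`, `!` and `?` as terminators are assumptions for illustration, not part of the answer above), the following keeps the terminating punctuation and skips the whitespace between sentences:

```swift
/// Hypothetical helper: splits on '.', '!' and '?' and keeps the terminator
/// attached to its sentence. Pure Swift, no Foundation needed.
func naiveSentences(in text: String) -> [String] {
    // Assumed set of sentence terminators.
    let terminators: Set<Character> = [".", "!", "?"]
    var state = text.reduce(into: (done: [String](), current: "")) { state, character in
        if terminators.contains(character) {
            if state.current.isEmpty, !state.done.isEmpty {
                // Repeated terminator such as "?!": attach it to the previous sentence.
                state.done[state.done.count - 1].append(character)
            } else {
                // Close the current sentence, terminator included.
                state.current.append(character)
                state.done.append(state.current)
                state.current = ""
            }
        } else if !(state.current.isEmpty && character.isWhitespace) {
            // Skip whitespace between sentences, keep everything else.
            state.current.append(character)
        }
    }
    // Keep trailing text that has no terminator.
    if !state.current.isEmpty {
        state.done.append(state.current)
    }
    return state.done
}

let parts = naiveSentences(in: "Hello there. How are you?! Fine")
// parts is ["Hello there.", "How are you?!", "Fine"]
```

This is still naive: it has no notion of abbreviations or decimals, so "Version 2.2 is out." would be cut after "Version 2.".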

Paul B
4

You can use NSLinguisticTagger to identify SentenceTerminator tokens and then split the text into an array of strings from there.

I used this code and it worked great.

https://stackoverflow.com/a/57985302/10736184

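import Foundation

let text = "My paragraph with weird punctuation like Nov. 17th."
// Filled in by linguisticTags with one range per token.
var tokenRanges = [Range<String.Index>]()
let tags = text.linguisticTags(
    in: text.startIndex..<text.endIndex,
    scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
    tokenRanges: &tokenRanges)
var result = [String]()
// Start index of every token tagged as a sentence terminator.
let terminatorIndices = tags.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map { tokenRanges[$0.0].lowerBound }
var prev = text.startIndex
for ix in terminatorIndices {
    // Slice from the end of the previous sentence through the terminator.
    let sentenceRange = prev...ix
    result.append(
        text[sentenceRange].trimmingCharacters(in: .whitespaces))
    prev = text.index(after: ix)
}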

After this, result is an array of sentence strings. Note that a sentence has to be terminated with '?', '!', '.', etc. to count. If you want to split on newlines as well, or on other lexical classes, you can add

|| $0.1 == "ParagraphBreak"

after

$0.1 == "SentenceTerminator"

to do that.

Rolf Locher
0

Take a look at this link: How to create String split extension with regex in Swift?

It shows how to combine regex and componentsSeparatedByString.
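For illustration only, a regex-based split could look roughly like the sketch below. It is not the code from the linked question: the function name `sentencesByRegex(in:)` and the pattern (whitespace preceded by `.`, `!` or `?`) are assumptions chosen for this example.

```swift
import Foundation

/// Hypothetical helper: splits wherever whitespace follows '.', '!' or '?'.
func sentencesByRegex(in text: String) -> [String] {
    // Assumed pattern: a run of whitespace that comes right after a terminator.
    let pattern = "(?<=[.!?])\\s+"
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return [text]
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    var sentences: [String] = []
    var start = text.startIndex
    for match in regex.matches(in: text, range: fullRange) {
        // Convert the NSRange of the whitespace gap back to String indices.
        guard let gap = Range(match.range, in: text) else { continue }
        sentences.append(String(text[start..<gap.lowerBound]))
        start = gap.upperBound
    }
    // Whatever follows the last gap is the final sentence.
    if start < text.endIndex {
        sentences.append(String(text[start...]))
    }
    return sentences
}

let pieces = sentencesByRegex(in: "Costs $9.99 today. Is it Nov. 17th? Yes!")
// pieces is ["Costs $9.99 today.", "Is it Nov.", "17th?", "Yes!"]
```

Decimals such as "$9.99" survive because no whitespace follows the period, but abbreviations like "Nov." are still treated as sentence ends, which is the kind of case the Foundation-based answers handle better.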

jregnauld
-3

Try this:

    import Foundation

    let myString = "This is a test"
    let myWords = myString.components(separatedBy: " ")
    // myWords is now: ["This", "is", "a", "test"]
  • Not a solution to the problem as it splits the string into words, not sentences. Please add a description to your answer so it is more concise. – Unterbelichtet Jul 20 '21 at 16:02