1

I am using Swift. I am trying to convert a sentence to a string array. I have used map to separate full stops and commas from the words as follows:

extension String {

    func convertSentenceToArray() -> [String] {
        var sentence = String(self)

        sentence.index(of: ".").map {
            sentence.remove(at: $0)
            sentence.insert(".", at: $0)
            sentence.insert(" ", at: $0)
        }
        sentence.index(of: ",").map {
            sentence.remove(at: $0)
            sentence.insert(",", at: $0)
            sentence.insert(" ", at: $0)
        }
        return sentence.components(separatedBy: " ")
    }
}

let thisSentenceString = "I am trying to create an array from a sentence. But I don't understand, Why isn't the last fullstop removed, from the last word."

let thisSentenceArray = thisSentenceString.convertSentenceToArray()

print(thisSentenceArray)

results in:

["I", "am", "trying", "to", "create", "an", "array", "from", "a", "sentence", ".", "But", "I", "don\'t", "understand", ",", "Why", "isn\'t", "the", "last", "fullstop", "removed,", "from", "the", "last", "word."]

All the fullstops and commas are handled as I would expect except for the last.

I don't understand why the last full stop remains. While I can find a work around for this, I would like to understand what is wrong with the approach I have taken.

Rob
sapjjr

4 Answers

2

First, an explanation of what your code does:

sentence
   .index(of: ".") // find the first index of the dot character
   .map {  // Optional.map, if the index exists, do the following
      sentence.remove( at: $0) // remove dot
      sentence.insert(".", at: $0) // insert dot again
      sentence.insert(" ", at: $0) // insert space
   }

or rewritten:

if let firstDotIndex = sentence.index(of: ".") {
    sentence.insert(" ", at: firstDotIndex)
}

That means only the first dot character is found and gets a space inserted before it; every later dot is left untouched. The same applies to the comma.
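A minimal sketch of that behaviour, for illustration only. The function name `insertSpaceBeforeFirst` and the sample strings are made up here, and `firstIndex(of:)` is used as the current spelling of `index(of:)`:

```swift
import Foundation

// Hypothetical helper: applies the question's Optional.map pattern to one character.
func insertSpaceBeforeFirst(_ mark: Character, in input: String) -> String {
    var sentence = input
    // firstIndex(of:) yields at most one index, so the closure passed to
    // Optional.map runs once (or never, when the character is absent).
    _ = sentence.firstIndex(of: mark).map { sentence.insert(" ", at: $0) }
    return sentence
}

print(insertSpaceBeforeFirst(".", in: "One. Two. Three."))
// One . Two. Three.
```

Only the first dot is separated from its word; the second and third are left as they were.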

To do this algorithm correctly, you would need:

// helper method checking punctuation to avoid code duplication
let isPunctuation: (Character) -> Bool = {
    return [".", ","].contains($0)
}

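// initial range, we want to check the entire string
var range = sentence.startIndex...

// iterate while some punctuation exists
while let punctuationIndex = sentence[range].firstIndex(where: isPunctuation) {
    // remember the position as an offset, because mutating the string invalidates its indices
    let offset = sentence.distance(from: sentence.startIndex, to: punctuationIndex)
    // insert the separator
    sentence.insert(" ", at: punctuationIndex)
    // continue after the inserted space and the punctuation character itself,
    // otherwise the same punctuation would be found again on every pass
    range = sentence.index(sentence.startIndex, offsetBy: offset + 2)...
}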

However, there is actually already a method for String replacement:

sentence = sentence.replacingOccurrences(of: ".", with: " .")

Or even simpler, with a regular expression to cover all punctuation characters in one go:

return self
    .replacingOccurrences(of: "[,.]", with: " $0", options: .regularExpression)
    .components(separatedBy: " ")
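As a self-contained sketch of the regular-expression version, it might be wrapped up as follows. The method name `wordsAndPunctuation` and the sample sentence are assumptions made for this example, not part of the question:

```swift
import Foundation

extension String {
    // Hypothetical name, chosen only for this illustration.
    func wordsAndPunctuation() -> [String] {
        return self
            // "$0" in the template stands for the whole match, i.e. the comma or dot
            .replacingOccurrences(of: "[,.]", with: " $0", options: .regularExpression)
            .components(separatedBy: " ")
    }
}

let parts = "Stop, look, and listen.".wordsAndPunctuation()
print(parts)
// ["Stop", ",", "look", ",", "and", "listen", "."]
```

Every comma and dot is handled here, not just the first of each, because `replacingOccurrences` replaces all matches.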
Sulthan
2

This is slightly different than what you asked for, but depending upon why you’re doing this, you can consider the NaturalLanguage framework. E.g.

import NaturalLanguage

let text = "I am trying to create an array from a sentence. But I don't understand, Why isn't the last fullstop removed, from the last word."

var words: [String] = []

let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitWhitespace, .joinContractions]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, range in
    if let tag = tag {
        words.append(String(text[range]))
    }
    return true
}
print(words)

["I", "am", "trying", "to", "create", "an", "array", "from", "a", "sentence", ".", "But", "I", "don\'t", "understand", ",", "Why", "isn\'t", "the", "last", "fullstop", "removed", ",", "from", "the", "last", "word", "."]

What’s interesting about this is that the tag property will tell you the parts of speech, what’s a sentence terminator, etc., e.g.:

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, range in
    if let tag = tag {
        print(text[range], tag.rawValue)
    }
    return true
}

Producing:

I Pronoun
am Verb
trying Verb
to Particle
create Verb
an Determiner
array Noun
from Preposition
a Determiner
sentence Noun
. SentenceTerminator
But Conjunction
I Pronoun
don't Verb
understand Verb
, Punctuation
Why Pronoun
isn't Verb
the Determiner
last Adjective
fullstop Noun
removed Verb
, Punctuation
from Preposition
the Determiner
last Adjective
word Noun
. SentenceTerminator

Or, perhaps you don’t really care about the punctuation and simply want to have this broken up into sentences and the sentences broken up into words:

var sentences: [[String]] = []

let sentenceTokenizer = NLTokenizer(unit: .sentence)
sentenceTokenizer.string = text

sentenceTokenizer.enumerateTokens(in: text.startIndex ..< text.endIndex) { range, _ in
    let sentence = String(text[range])
    let wordTokenizer = NLTokenizer(unit: .word)
    wordTokenizer.string = sentence

    let words = wordTokenizer.tokens(for: sentence.startIndex ..< sentence.endIndex)
        .map { String(sentence[$0]) }

    sentences.append(words)
    return true
}
print(sentences)

[
  ["I", "am", "trying", "to", "create", "an", "array", "from", "a", "sentence"],
  ["But", "I", "don\'t", "understand", "Why", "isn\'t", "the", "last", "fullstop", "removed", "from", "the", "last", "word"]
]

There are lots of options here between NLTagger and NLTokenizer. Depending upon what problem you’re really trying to solve, these might be better than manipulating strings yourself.


As Sulthan said, you can obviously just insert spaces and then split the string, though I might suggest adding other punctuation symbols and including + to match one or more characters in the case of consecutive punctuation marks (notably ellipses, ...), e.g.

let words = text.replacingOccurrences(of: "[,.:;!?]+", with: " $0", options: .regularExpression)
    .split(separator: " ")
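To illustrate the effect of the `+`, here is a small sketch. The function name `tokens(in:)` and the sample text are invented for this example, and the result is mapped to `String` because `split(separator:)` returns `Substring` values:

```swift
import Foundation

// Hypothetical wrapper around the expression above.
func tokens(in text: String) -> [String] {
    return text
        .replacingOccurrences(of: "[,.:;!?]+", with: " $0", options: .regularExpression)
        .split(separator: " ")
        .map { String($0) }
}

print(tokens(in: "Wait... what?! Fine, go."))
// ["Wait", "...", "what", "?!", "Fine", ",", "go", "."]
```

Runs of punctuation such as the ellipsis and "?!" stay together as single elements instead of being split character by character.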
Rob
  • Rob - that is absolutely amazing I can see I am going to have to study the Natural Language framework!! Thanks again – sapjjr Feb 01 '19 at 09:03
0

Maybe you want it this way:

func convertSentenceToArray() -> [String] {
    var sentence = String(self)
    sentence = sentence.replacingOccurrences(of: ".", with: " .")
    sentence = sentence.replacingOccurrences(of: ",", with: " ,")
    return sentence.components(separatedBy: " ")
}
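One thing that may be worth watching with `components(separatedBy:)`: consecutive spaces produce empty strings in the result, which can happen with this approach if the input already has a space before a punctuation mark. A minimal sketch comparing it with `split(separator:)`, which omits empty pieces by default (the sample input is made up):

```swift
import Foundation

// The input already contains a space before the comma.
let spaced = "Hello , world".replacingOccurrences(of: ",", with: " ,")
// spaced is now "Hello  , world", with two spaces before the comma

let viaComponents = spaced.components(separatedBy: " ")
let viaSplit = spaced.split(separator: " ").map { String($0) }

print(viaComponents) // ["Hello", "", ",", "world"]
print(viaSplit)      // ["Hello", ",", "world"]
```

Whether the empty strings matter depends on what the array is used for; for clean input like the sentence in the question, both calls give the same result.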
E.Coms
  • Good solution but still a bit too complicated `self.replacingOccurrences(of: "([,.])", with: " $1", options: .regularExpression)`. Using `var sentence` copy is now not necessary. – Sulthan Jan 31 '19 at 20:16
  • Many thanks E.Coms solutions works for me, I think I must have been trying too hard to use .map!!! Thanks again to everyone who replied – sapjjr Jan 31 '19 at 20:54
  • @sapjjr Thank you. You taught me how to use Optional.map to make code a little neater, as here: https://stackoverflow.com/questions/54441006/create-multidimensional-array-based-on-value-of-key/54445417#54445417 – E.Coms Jan 31 '19 at 20:57
0

Here is a more traditional and general approach:

import Foundation

func separateString(string: String) -> [String] {
    var stringsArray: [String] = []

    let letterSet = CharacterSet.letters
    let punctuationSet = CharacterSet.punctuationCharacters

    var newWord = ""

    for char in string.unicodeScalars {
        if letterSet.contains(char) {
            // letters extend the current word
            newWord.unicodeScalars.append(char)
        } else {
            // any non-letter (whitespace, punctuation, ...) ends the current word
            if !newWord.isEmpty {
                stringsArray.append(newWord)
                newWord = ""
            }
            // punctuation becomes an element of its own;
            // note that an apostrophe counts as punctuation, so "don't" is split as well
            if punctuationSet.contains(char) {
                stringsArray.append(String(char))
            }
        }
    }

    // do not lose a trailing word that is not followed by punctuation
    if !newWord.isEmpty {
        stringsArray.append(newWord)
    }

    return stringsArray
}
infinite369