There is no off-the-shelf way to limit the number of words in a string.
If you look at this post, it documents using the method enumerateSubstrings(in:options:)
with the .byWords option. Rather than returning an array, it calls a closure once per word
and passes in that word's Range, so you can collect the ranges yourself.
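As a minimal sketch of how that method behaves (the sample string and the `words` array are my own illustration, not taken from the linked post), the closure runs once per word and you gather the results yourself:

```swift
import Foundation

let text = "Hello there, world"
var words: [String] = []

// The closure is called once for each word found in the given range.
text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { substring, _, _, _ in
    if let word = substring {
        words.append(word)
    }
}

// Spaces and punctuation are not part of any word.
print(words)
```

The second closure argument (ignored here) is the word's Range<String.Index>, which is what the extension below relies on.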
You could use that to create an extension on String that would return the first X words of that string:
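import Foundation

extension String {
    /// Returns the prefix of the string that ends after its first `wordCount` words.
    func firstXWords(_ wordCount: Int) -> Substring {
        // Asking for zero or fewer words yields an empty prefix instead of a crash.
        guard wordCount > 0 else { return self[startIndex..<startIndex] }
        // Collect the range of every word in the string.
        var ranges: [Range<String.Index>] = []
        self.enumerateSubstrings(in: self.startIndex..., options: .byWords) { _, range, _, _ in
            ranges.append(range)
        }
        if ranges.count > wordCount - 1 {
            return self[self.startIndex..<ranges[wordCount - 1].upperBound]
        } else {
            // Fewer words than requested: return the whole string.
            return self[self.startIndex..<self.endIndex]
        }
    }
}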
If we then run the code:
let sentence = "I want to an algorithm that could help find out how many words are there in a string separated by space or comma or some character. And then append each word separated by a character to an array which could be added up later I'm making an average calculator so I want the total count of data and then add up all the words. By words I mean the numbers separated by a character, preferably space Thanks in advance"
print(sentence.firstXWords(10))
The output is:
I want to an algorithm that could help find out
Using enumerateSubstrings(in:options:) is going to give much better results than splitting your string on spaces, since normal text has many more separators than just spaces (newlines, commas, colons, em spaces, etc.). It will also work for languages like Japanese and Chinese that often don't have spaces between words.
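To make the difference concrete, here is a small comparison, assuming a made-up sample string that mixes a comma and a newline (the names `bySpaces` and `byWords` are only for illustration):

```swift
import Foundation

let text = "apples, oranges\npears"

// Splitting on spaces leaves the comma attached and misses the newline.
let bySpaces = text.split(separator: " ")

// .byWords relies on the system's word-boundary rules instead.
var byWords: [String] = []
text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { word, _, _, _ in
    if let word = word { byWords.append(word) }
}

print(bySpaces.count) // 2: "apples," and "oranges\npears"
print(byWords.count)  // 3: "apples", "oranges", "pears"
```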
You might be able to rewrite the function to terminate the enumeration as soon as it reaches the desired number of words. If you only want a small fraction of the words in a very long string, that would make it significantly faster. (The code above should have O(n) performance, although I haven't dug deeply enough to be sure of that. I also couldn't figure out how to terminate the enumerateSubstrings() function early, although I didn't try that hard.)
Leo Dabus provided an improved version of my function. It extends StringProtocol rather than String, which means it can work on substrings. Plus, it stops once it hits your desired word count, so it will be much faster for finding the first few words of very long strings:
extension StringProtocol {
    func firstXWords(_ n: Int) -> SubSequence {
        // Defaults to the whole string if it has fewer than n words.
        var endIndex = self.endIndex
        var words = 0
        enumerateSubstrings(in: startIndex..., options: .byWords) { _, range, _, stop in
            words += 1
            if words == n {
                // Setting stop ends the enumeration after this word.
                stop = true
                endIndex = range.upperBound
            }
        }
        return self[..<endIndex]
    }
}
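If you want to convince yourself that the enumeration really stops early, and that the approach works on a Substring, here is a rough sketch. The free function `firstWords(_:of:)` and its `calls` counter are hypothetical names of mine, not part of Leo Dabus's version; it uses the same stop flag but also reports how many times the closure ran:

```swift
import Foundation

// Hypothetical helper for illustration: same early-exit idea as the extension,
// but it also returns how many times the closure was invoked.
func firstWords<S: StringProtocol>(_ n: Int, of text: S) -> (prefix: S.SubSequence, calls: Int) {
    var end = text.endIndex
    var calls = 0
    text.enumerateSubstrings(in: text.startIndex..., options: .byWords) { _, range, _, stop in
        calls += 1
        if calls == n {
            stop = true               // no further words are visited
            end = range.upperBound
        }
    }
    return (text[..<end], calls)
}

let line = "one two three four five six"
let tail = line.dropFirst(4)          // a Substring: "two three four five six"
let result = firstWords(2, of: tail)

print(result.prefix) // two three
print(result.calls)  // 2
```

The closure runs only twice even though the substring holds five words, which is where the speedup on very long strings comes from.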