I'm creating a reading list app, and I'd like to show the read time of a user-added link in a table cell in their reading list. The only way to get that number is from the page's word count. I've found a few solutions, namely Parsehub, Parse and Mercury, but they seem geared towards use cases that need more advanced things scraped from a URL. Is there a simpler way in Swift to calculate the word count of a URL?
1 Answer
First of all, you need to parse the HTML. HTML can only be parsed reliably with a dedicated HTML parser. Please don't use regular expressions or any other search method to parse HTML; you can read why at this link. If you are using Swift, you may try Fuzi or Kanna. After you get the body text with either library, you have to remove the extra whitespace and count the words. I have written some basic code with the Fuzi library to get you started.
import Foundation
import Fuzi

// Trim leading and trailing whitespace and newlines
func trim(src: String) -> String {
    return src.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
}

// Collapse runs of whitespace (spaces, tabs, newlines) into a single space
func clean(src: String) -> String {
    return src.replacingOccurrences(
        of: "\\s+",
        with: " ",
        options: .regularExpression)
}

// Load test.html from the directory that contains this source file
let htmlUrl = URL(fileURLWithPath: ((#file as NSString).deletingLastPathComponent as NSString).appendingPathComponent("test.html"))
do {
    let data = try Data(contentsOf: htmlUrl)
    let document = try HTMLDocument(data: data)
    // Get the text content of the body
    if let body = document.xpath("//body").first?.stringValue {
        let cleanBody = clean(src: body)
        let trimmedBody = trim(src: cleanBody)
        // split(separator:) drops empty pieces, so an empty body counts as 0 words
        print(trimmedBody.split(separator: " ").count)
    }
} catch {
    print(error)
}
If you prefer, you can turn these global functions into a String extension, or combine them into a single function. I kept them separate for clarity.
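As a rough sketch of the String extension idea, the trimming, cleaning, and counting steps could be folded into one computed property, with a small helper that turns the count into a read time. The names `wordCount` and `estimatedReadingMinutes` are made up for this example, and the default of 200 words per minute is only an assumed average reading speed that you would want to tune. It assumes Swift 5 for `Character.isWhitespace`.

```swift
import Foundation

extension String {
    /// Number of whitespace-separated words; empty or whitespace-only text yields 0.
    var wordCount: Int {
        // split(whereSeparator:) omits empty subsequences by default, so runs of
        // spaces, tabs, and newlines do not need to be collapsed or trimmed first.
        return split(whereSeparator: { $0.isWhitespace }).count
    }
}

/// Estimated reading time in whole minutes, rounded up.
/// `wordsPerMinute` is an assumed average reading speed, not a measured value.
func estimatedReadingMinutes(wordCount: Int, wordsPerMinute: Int = 200) -> Int {
    guard wordCount > 0, wordsPerMinute > 0 else { return 0 }
    // Integer ceiling division
    return (wordCount + wordsPerMinute - 1) / wordsPerMinute
}

let sample = "  Swift   makes\n\tword counting   simple.  "
print(sample.wordCount)                         // 5
print(estimatedReadingMinutes(wordCount: 450))  // 3
```

With something like this in place, the body text returned by the parser could be passed straight to `wordCount`, and the resulting minutes shown in the table cell.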

Meanteacher