Background
With Swift, I'm trying to fetch HTML via URLSession rather than by loading it into a WKWebView first, as I only need the HTML and none of the subresources. I'm running into a problem with certain pages that work when loaded into WKWebView but fail when loaded via URLSession (or even a simple NSString(contentsOf: url, encoding: String.Encoding.utf8.rawValue)): the UTF-8 conversion fails.
How to reproduce
This fails (prints "nil"):
print(try? NSString(contentsOf: URL(string: "http://www.huffingtonpost.jp/techcrunch-japan/amazon-is-gobbling-whole-foods-for-a-reported-13-7-billion_b_17171132.html?utm_hp_ref=japan&ir=Japan")!, encoding: String.Encoding.utf8.rawValue))
But if I change the URL to the site's homepage, it succeeds:
print(try? NSString(contentsOf: URL(string: "http://www.huffingtonpost.jp")!, encoding: String.Encoding.utf8.rawValue))
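One way to confirm that the article page really contains ill-formed bytes (rather than, say, a different encoding being declared) is to walk the raw bytes with the standard library's UTF-8 codec. The sketch below assumes you already have the response body as `Data`; the helper name `isValidUTF8` and the sample bytes are hypothetical, chosen only for illustration. A strict decoder such as the NSString initializer above rejects the entire buffer if this check fails, even when only one byte is bad.

```swift
import Foundation

/// Hypothetical helper: returns true when every byte sequence in `data`
/// is well-formed UTF-8, false as soon as an ill-formed sequence is found.
func isValidUTF8(_ data: Data) -> Bool {
    var decoder = UTF8()              // standard library UTF-8 codec (Unicode.UTF8)
    var bytes = data.makeIterator()   // yields UInt8 code units
    while true {
        switch decoder.decode(&bytes) {
        case .scalarValue:
            continue                  // well-formed scalar, keep scanning
        case .emptyInput:
            return true               // reached the end without any errors
        case .error:
            return false              // ill-formed sequence encountered
        }
    }
}

// Hypothetical sample: "Hi", then 0xFF (never valid in UTF-8), then "!".
let sample = Data([0x48, 0x69, 0xFF, 0x21])
print(isValidUTF8(sample))            // false
```

Running this on the `Data` returned for the article URL (for example, from a URLSession data task) should show whether the failure comes from malformed bytes in the response itself.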
Question
How can I "clean" the data returned by a URL when it contains malformed UTF-8? I'd like to either remove or replace any invalid sequences so that the rest of the content can be viewed. WKWebView renders the page just fine (and reports the content as UTF-8 as well), as you can see by visiting the URL: http://www.huffingtonpost.jp/techcrunch-japan/amazon-is-gobbling-whole-foods-for-a-reported-13-7-billion_b_17171132.html?utm_hp_ref=japan&ir=Japan