Unexpected characters in HTML string causing WKWebView issue

Question

I am grabbing a URL's content and extracting the HTML via

wkWebView.evaluateJavaScript("document.documentElement.outerHTML.toString()", completionHandler: { (html1: Any?, error: Error?) in /* ... */ })

Later, I load that HTML into a WKWebView with a base URL that matches the source URL above. With a nil baseURL, the web view displays properly, but the links are dead because there is no real URL behind them (about:blank#! is the prefix to every link). Adding the correct base URL results in almost-correct links, but with a "... .com/#!/... " prefix. If I copy one of those incorrect URLs and edit out the '/#!', it works.

The question: why is this getting added? I would guess I can intercept each request before it gets processed and rewrite the link, though I have never done this. I would prefer to find out why the #! gets added in the first place.

Thanks for any tips!
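
For reference, the interception workaround mentioned above might look roughly like the sketch below. It does not explain why the `#!` appears; it only rewrites the link at navigation time. The names `strippingHashBang(from:)` and `LinkFixingDelegate` are hypothetical, and the sketch assumes that removing `/#!` always yields the intended URL (as in the manual edit described above) and that the navigation delegate is consulted for these link taps.

```swift
import Foundation
#if canImport(WebKit)
import WebKit
#endif

// Hypothetical helper: removes every "/#!" from a URL,
// mirroring the manual edit described in the question.
func strippingHashBang(from url: URL) -> URL {
    let cleaned = url.absoluteString.replacingOccurrences(of: "/#!", with: "")
    // Fall back to the original URL if the cleaned string does not parse.
    return URL(string: cleaned) ?? url
}

#if canImport(WebKit)
// Hypothetical delegate: cancels tapped links that contain "/#!"
// and loads the cleaned URL instead.
final class LinkFixingDelegate: NSObject, WKNavigationDelegate {
    func webView(_ webView: WKWebView,
                 decidePolicyFor navigationAction: WKNavigationAction,
                 decisionHandler: @escaping (WKNavigationActionPolicy) -> Void) {
        guard navigationAction.navigationType == .linkActivated,
              let url = navigationAction.request.url else {
            decisionHandler(.allow)
            return
        }
        let fixed = strippingHashBang(from: url)
        if fixed != url {
            decisionHandler(.cancel)
            webView.load(URLRequest(url: fixed))
        } else {
            decisionHandler(.allow)
        }
    }
}
#endif
```

To try it, keep a strong reference to a `LinkFixingDelegate` instance and assign it to the web view's `navigationDelegate` before loading the cached HTML (the property is weak, so the web view will not retain it).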

use `document.body.innerHTML` rather than `document.documentElement.outerHTML.toString()` — Pranav Kasetti, May 17 '18 at 13:06
I just tried that, still no dice. Interestingly though, a long press on a tappable image (linking to Facebook) fires up a modal preview that asks for a Facebook login to see the content. The web view is definitely logged into Facebook, so perhaps this is a clue to the odd link? — drew.., May 17 '18 at 13:14
No errors printed from completionHandler? Also note this: [blank href attribute](https://stackoverflow.com/questions/25713069/why-is-wkwebview-not-opening-links-with-target-blank) — Pranav Kasetti, May 17 '18 at 14:48
No errors that I have uncovered as yet; it is just that any time there is a base URL, the unusual /#! is tacked onto the end of the base URL, preceding the rest of the link. Bizarre behaviour. I have tried a few variants of the base URL to see if one would work, no dice. My next attempt is to start a branch, remove the way the HTML is currently obtained (via loading in WKWebView), and use String(contentsOf: url) instead. — drew.., May 17 '18 at 16:58
And re the linked question: some interesting thoughts, but none seem to address this directly. I will re-read it. Maybe Facebook purposely obfuscates things to make this more challenging? — drew.., May 17 '18 at 17:05
Using String(contentsOf: url) loses the user agent, and that kills that path. Unless I can find a way to strip the errant characters, I don't think I will be able to cache the dictionary of HTML as I envisioned. That sucks, as I could be hitting hundreds of pages and the repeated refreshes are a pain. — drew.., May 17 '18 at 21:04
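
One possible way around the lost user agent, sketched here as an assumption rather than something tried in this thread, is to fetch the page with `URLSession` and set the `User-Agent` header explicitly. `makeRequest(for:userAgent:)` and `fetchHTML(from:userAgent:completion:)` are hypothetical helper names. Note that this path would not carry the web view's login session, would not include content rendered by JavaScript, and assumes the page is UTF-8.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession and URLRequest live here on Linux
#endif

// Hypothetical helper: builds a request that sends the given User-Agent string.
func makeRequest(for url: URL, userAgent: String) -> URLRequest {
    var request = URLRequest(url: url)
    request.setValue(userAgent, forHTTPHeaderField: "User-Agent")
    return request
}

// Hypothetical helper: fetches the page and passes the body to `completion`
// as a UTF-8 string, or nil if the request fails or the data is not UTF-8.
func fetchHTML(from url: URL,
               userAgent: String,
               completion: @escaping @Sendable (String?) -> Void) {
    let request = makeRequest(for: url, userAgent: userAgent)
    let task = URLSession.shared.dataTask(with: request) { data, _, _ in
        completion(data.flatMap { String(data: $0, encoding: .utf8) })
    }
    task.resume()
}
```

The user agent string itself would have to be copied from whatever the web view sends; the value is not specified anywhere in this thread.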

0 Answers