I'm trying to pull a webpage's source code out of a WebView in an Android app. I've managed it using this approach: http://lexandera.com/2009/01/extracting-html-from-a-webview/

plus this to make it work after KitKat:

    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.KITKAT) {
        // KitKat and later: run the script and receive the result in a callback
        webView.evaluateJavascript(
                "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
                new ValueCallback<String>() {
                    @Override
                    public void onReceiveValue(String html) {
                        outputViewer.setText(html);
                    }
                });
    } else {
        // Before KitKat: pass the HTML to the injected HTMLOUT JavaScript interface
        webView.loadUrl("javascript:window.HTMLOUT.showHTML" +
                "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }

The problem is that the pre-KitKat branch returns exactly what I want, while the KitKat branch returns an escaped version of the source, something like this:

"\u003Chtml>\u003Chead>\n\t\u003Cmeta charset=\"UTF-8\">\n\t\u003Cmeta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">\n\t\u003Clink rel=\"profile\" href=\"http://gmpg.org/xfn/11\">\n\t\u003Clink rel=\"pingback\" 

Is there a straightforward way to unescape that string on Android?

Mike

MikeCoverUps
  • I _did_ google it, and came across that answer, but neither of the top-ranked solutions worked. JSON.parse doesn't seem to work (it just returns null), and the solution done in JavaScript still passes x out escaped. I think the solution needs to be in Java, after the string has been passed out, rather than in the JavaScript, because the string seems to be escaped on the way out. Does that make any sense? – MikeCoverUps Jan 08 '16 at 12:44
  • Well, the solution is programmed in JavaScript, so you need to insert ` – PDKnight Jan 08 '16 at 16:05

1 Answer

I had the same problem. The string looks Java-escaped, and since I was already using Apache Commons Lang, this worked for me:

str = StringEscapeUtils.unescapeJava(str);

before

"\u003Chtml lang=\"en\">\u003Chead> \u003Cmeta content=\"width=device-width,minimum-scale=1.0\"...

after

"<html lang="en"><head> <meta content="width=device-width,minimum-scale=1.0"...

I took the code from:

Convert escaped Unicode character back to actual character
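If pulling in Apache Commons Lang just for this is not an option, the same unescaping can be sketched by hand. The sketch below assumes that the string passed to onReceiveValue is a JSON-encoded value (a quoted string literal, or the literal null), which matches the sample output in the question. The class name JsonStringUnescaper and its unescape method are hypothetical names made up for this illustration, not part of Android or any library. It does no validation beyond what is shown, so malformed \u sequences will throw a NumberFormatException.

```java
public final class JsonStringUnescaper {

    private JsonStringUnescaper() {}

    /**
     * Decodes a JSON string literal such as "\u003Chtml>\n" into plain text.
     * Returns null for a null input or the JSON literal null, and returns the
     * input unchanged when it is not wrapped in double quotes.
     */
    public static String unescape(String json) {
        if (json == null || json.equals("null")) {
            return null;
        }
        int end = json.length() - 1; // index of the closing quote
        if (json.length() < 2 || json.charAt(0) != '"' || json.charAt(end) != '"') {
            return json; // not a string literal (for example a number)
        }
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 1; i < end; i++) {
            char c = json.charAt(i);
            if (c != '\\' || i + 1 >= end) {
                out.append(c); // ordinary character, or a stray trailing backslash
                continue;
            }
            char next = json.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 < end) {
                        // Four hex digits follow the 'u'
                        String hex = json.substring(i + 1, i + 5);
                        out.append((char) Integer.parseInt(hex, 16));
                        i += 4;
                    } else {
                        out.append("\\u"); // truncated escape, keep it as-is
                    }
                    break;
                default:
                    out.append(next); // covers \" \\ and \/
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same shape as the escaped sample in the question
        String escaped = "\"\\u003Chtml lang=\\\"en\\\">\\u003Chead>\"";
        System.out.println(unescape(escaped));
    }
}
```

In the callback from the question this would be used as outputViewer.setText(JsonStringUnescaper.unescape(html)). Unlike the unescapeJava call above, this version also strips the surrounding double quotes. On Android, a JSON parser should be able to do the same job, but that is not something I tested here.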

carrizo