92

On Android, I have a WebView that is displaying a page.

How do I get the page source without requesting the page again?

It seems WebView should have some kind of getPageSource() method that returns a string, but alas it does not.

If I enable JavaScript, what is the appropriate JavaScript to put in this call to get the contents?

webview.loadUrl("javascript:(function() { " +  
    "document.getElementsByTagName('body')[0].style.color = 'red'; " +  
    "})()");  
Malte Schwerhoff
gregm
  • Use a jQuery script and a JS interface to get the HTML content from the WebView: `window.interface.processHTML($(\"body\").html());` – Dev.Sinto Jun 21 '13 at 12:17
  • http://stackoverflow.com/questions/8200945/how-to-get-html-content-from-a-webview?rq=1 – trante Dec 02 '13 at 08:46
  • You can obviously get the HTML response using HTTP requests, but if a page requires POST data to load (for example, user credentials), that approach simply fails. I think this is how it should be, because if you could do it, you could probably make your own Android app for any website, and that would suck! –  Jun 24 '14 at 11:57

7 Answers

170

I think I found the answer in this post on lexandera.com. The code below is basically a cut-and-paste from the site. It seems to do the trick.

final Context myApp = this;

/* An instance of this class will be registered as a JavaScript interface */
class MyJavaScriptInterface
{
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}

final WebView browser = (WebView)findViewById(R.id.browser);
/* JavaScript must be enabled if you want it to work, obviously */
browser.getSettings().setJavaScriptEnabled(true);

/* Register a new JavaScript interface called HTMLOUT */
browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

/* WebViewClient must be set BEFORE calling loadUrl! */
browser.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String url)
    {
        /* This call injects JavaScript into the page that just finished loading. */
        browser.loadUrl("javascript:window.HTMLOUT.processHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
    }
});

/* load a web page */
browser.loadUrl("http://lexandera.com/files/jsexamples/gethtml.html");
Dharman
jluckyiv
  • 6
    Beware that this might not be the raw HTML of the page; the page content may have changed dynamically through JavaScript before `onPageFinished()` was executed. – Paul Lammertsma Dec 13 '11 at 18:43
  • 3
    It's great, but calling the method `browser.loadUrl` in `onPageFinished` will cause `onPageFinished` to be called again. You might want to check whether it is the first call of `onPageFinished` or not before calling `browser.loadUrl`. – Yi H. Dec 30 '12 at 08:57
  • Thanks @Blundell, it worked for me. I'd like to know how this could be **implemented as a service**. Since a service has no layout and no WebView to store the results, is there a way to put the data in some object other than the WebView, so we can inject the JavaScript to get the resulting HTML code? – Totalys Feb 12 '14 at 03:19
  • @Totalys that's even easier `String html = new Scanner(new DefaultHttpClient().execute(new HttpGet("www.the url")).getEntity().getContent(), "UTF-8").useDelimiter("\\A").next();` (abbreviated to fit in a comment :-) ) – Blundell Feb 12 '14 at 09:31
  • Thanks @Blundell , but I put this line and then: `MyJavaScriptInterface mji = new MyJavaScriptInterface(); mji.showHTML(html);` and comment everything related to the webview (since I don't have this component) but this way, I just get the html without the javascript processed... Is anything missing? – Totalys Feb 13 '14 at 02:59
  • Ah my bad, i thought you just wanted html. Sorry above is only js solution. Can't you just do `new WebView(context);` – Blundell Feb 13 '14 at 07:14
  • for security reason we should use `@SuppressLint("SetJavaScriptEnabled")` – Choletski Oct 21 '15 at 09:59
  • 1
    Don't forget to insert runOnUiThread(new Runnable() { ... into public void processHTML. – CoolMind Apr 19 '16 at 16:43
  • This fires before React Native part of webpage has loaded. – c0dehunter Oct 13 '17 at 12:04
  • I need to use the HTML in the UI, but if I put code in the "process the html as needed by the app" part, my app crashes. How can I change my UI from the processHTML method? @jluckyiv – radin May 03 '18 at 04:54
  • @RohollahSaberi, I'm sorry I didn't respond earlier. The part that says `process the html as needed by the app` is a comment. You should substitute your app's code there. I don't know that I can help you with that. – jluckyiv May 19 '18 at 13:50
  • @jluckyiv Thanks, I'm using a Runnable class to fix this. – radin May 28 '18 at 06:26
  • I don't understand. It doesn't work. How to get into the processHTML() method? – Dyno Cris Jan 18 '22 at 09:52
  • @YiH. You are correct. The very common case of redirects needs to be well thought and handled. – WebViewer Dec 25 '22 at 13:35
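
The comments above point out that calling `loadUrl` inside `onPageFinished` can trigger `onPageFinished` again, and that redirects complicate this further. A minimal sketch of one way to guard against repeated extraction follows. The class name `ExtractionGuard` and its method are hypothetical, and whether the second callback actually occurs (and with which URL) is an assumption that may vary between WebView versions.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper: decides whether the page source still needs extracting for a URL. */
final class ExtractionGuard {
    // URL of the last page for which extraction was requested; null before the first request.
    private final AtomicReference<String> lastUrl = new AtomicReference<>();

    /** Returns true unless the URL is our own injected script or repeats the previous URL. */
    boolean shouldExtract(String url) {
        if (url == null || url.startsWith("javascript:")) {
            return false; // assumption: never react to the injected script itself
        }
        String previous = lastUrl.getAndSet(url);
        return !url.equals(previous);
    }
}
```

Inside `onPageFinished(WebView view, String url)` the injected call would then be wrapped as `if (guard.shouldExtract(url)) { ... }`. With redirects, each distinct target URL is still extracted once, so the receiving side may need to keep only the last result.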
35

Per issue 12987, Blundell's answer crashes (at least on my 2.3 VM). Instead, I intercept a call to console.log with a special prefix:

// intercept calls to console.log
web.setWebChromeClient(new WebChromeClient() {
    @Override
    public boolean onConsoleMessage(ConsoleMessage cmsg)
    {
        // check secret prefix
        if (cmsg.message().startsWith("MAGIC"))
        {
            String msg = cmsg.message().substring(5); // strip off prefix

            /* process HTML */

            return true;
        }

        return false;
    }
});

// inject the JavaScript on page load
web.setWebViewClient(new WebViewClient() {
    @Override
    public void onPageFinished(WebView view, String address)
    {
        // have the page spill its guts, with a secret prefix
        view.loadUrl("javascript:console.log('MAGIC'+document.getElementsByTagName('html')[0].innerHTML);");
    }
});

web.loadUrl("http://www.google.com");
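
The prefix check and the hard-coded `substring(5)` above can be pulled into a small helper so that the prefix and its length cannot drift apart. This is only a sketch: the class name `ConsolePayload` is hypothetical, and the prefix `MAGIC` is taken from the code above.

```java
/** Hypothetical helper for the console.log technique shown above. */
final class ConsolePayload {
    // Must match the prefix used in the injected JavaScript.
    static final String PREFIX = "MAGIC";

    /** Returns the text after PREFIX, or null if the message is an ordinary console line. */
    static String extract(String message) {
        if (message == null || !message.startsWith(PREFIX)) {
            return null;
        }
        return message.substring(PREFIX.length());
    }
}
```

In `onConsoleMessage`, a non-null result from `ConsolePayload.extract(cmsg.message())` would be the page HTML, and a null result means the message should be left for normal console handling.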
durka42
18

This answer is based on jluckyiv's, but I think it is better and simpler to change the JavaScript as follows.

browser.loadUrl("javascript:HTMLOUT.processHTML(document.documentElement.outerHTML);");
Community
nagoya0
6

Have you considered fetching the HTML separately, and then loading it into a webview?

// Call this from a background thread: network I/O is not allowed on the main thread.
String fetchContent(final WebView view, final String url) throws IOException {
    HttpClient httpClient = new DefaultHttpClient();
    HttpResponse response = httpClient.execute(new HttpGet(url));
    int statusCode = response.getStatusLine().getStatusCode();
    final String html = EntityUtils.toString(response.getEntity()); // assume html for simplicity
    if (statusCode != 200) {
        // handle fail
    }
    // WebView methods must be called on the UI thread, so post the load back to it.
    view.post(new Runnable() {
        @Override
        public void run() {
            view.loadDataWithBaseURL(url, html, "text/html", "utf-8", url); // todo: get mime, charset from entity
        }
    });
    return html;
}
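
The Apache HttpClient classes used above (`DefaultHttpClient`, `HttpGet`) were removed from the Android SDK in API 23, so on newer targets the same idea needs a different client. The sketch below uses `HttpURLConnection` and also addresses the "get charset from entity" todo by parsing the Content-Type header. The class name `HtmlFetcher` and its methods are hypothetical, and the UTF-8 fallback is an assumption. As with the original, it has to run off the main thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper: fetches a page as a String without Apache HttpClient. */
final class HtmlFetcher {
    /** Extracts the charset from a Content-Type value, falling back to UTF-8 (assumption). */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Reads the whole stream and decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    /** Performs a GET and returns the body; throws if the status is not 200. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in, charsetOf(conn.getContentType()));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned string can then be handed to `loadDataWithBaseURL` on the UI thread as in the answer above. Note that, like the original approach, this requests the page a second time and will not reflect cookies, POST data, or JavaScript changes from the WebView session.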
larham1
4

I got this working with the code from @jluckyiv's answer, but I had to add the @JavascriptInterface annotation to the processHTML method in MyJavaScriptInterface.

class MyJavaScriptInterface
{
    @SuppressWarnings("unused")
    @JavascriptInterface
    public void processHTML(String html)
    {
        // process the html as needed by the app
    }
}
dr_sulli
1

You also need to annotate the method with @JavascriptInterface if your targetSdkVersion is >= 17, because SDK 17 introduced new security requirements: every method exposed to JavaScript must be annotated with @JavascriptInterface. Otherwise you will see an error like: Uncaught TypeError: Object [object Object] has no method 'processHTML' at null:1

javauser71
-1

If you are working on KitKat or above, you can use the Chrome remote debugging tools to inspect all the requests and responses going in and out of your WebView, as well as the HTML source of the page being viewed.

https://developer.chrome.com/devtools/docs/remote-debugging

onusopus