I am trying to access content on a web page that is generated by JavaScript. The content is created after the page has loaded, so that chunk of HTML is nowhere to be found when I try to parse the page with Jsoup.

My code for getting the HTML source using HtmlUnit is as follows:

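import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;

import static java.lang.System.out;

// The class name is a placeholder; the original snippet showed only main().
public class Main {

    public static void main(String[] args) throws IOException {
        // Silence HtmlUnit's logging
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

        // Placeholder; a real URL needs a scheme such as http://
        String url = "myUrl.com";
        out.println("accessing " + url);

        HtmlPage page = webClient.getPage(url);

        out.println("waiting for js");
        webClient.waitForBackgroundJavaScriptStartingBefore(200);
        webClient.waitForBackgroundJavaScript(20000);

        out.println(page.asXml());

        webClient.close();
    }
}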

But when I run it, the HTML that the JavaScript is supposed to create is not printed. How do I get this JavaScript-generated HTML using HtmlUnit and then pass the result to Jsoup for parsing?
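A fixed wait can return before the script has finished building the table. One possible alternative is to poll until a condition holds or a timeout expires. The following is a minimal sketch of that idea using only the JDK; `PollUntil`, `waitFor`, and the stand-in condition are hypothetical names, not part of HtmlUnit. The comments indicate where HtmlUnit and Jsoup calls could slot in, assuming the generated content is a `<table>` element.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Re-checks the condition every intervalMillis until it is true or
     * timeoutMillis has elapsed. Returns true if the condition was met,
     * false on timeout or interruption.
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true on the third check.
        // With HtmlUnit, the condition could instead be something like
        //   () -> !page.getByXPath("//table").isEmpty()
        // with webClient.waitForBackgroundJavaScript(...) called between checks.
        int[] checks = {0};
        boolean found = waitFor(() -> ++checks[0] >= 3, 2000, 50);
        System.out.println("condition met: " + found + " after " + checks[0] + " checks");

        // Once the condition holds, the markup could be handed to Jsoup:
        //   Document doc = Jsoup.parse(page.asXml(), url);
    }
}
```

Polling does not help if the script never runs at all, for example because of a JavaScript error, so it is worth checking the log output first.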

THow
  • It looks OK to me. What happens when you don't set it to run JavaScript, i.e. output the page without running JavaScript? – Jonas Czech Feb 24 '16 at 08:05
  • Also, seeing your previous question: A much better, faster, and easier way to do this might be to load the page in your desktop browser, look at the network tab of the developer tools, and if it loads additional data from another URL (probably as JSON or so), you will see where the actual data is coming from and you can use that same URL in your code and process the JSON or other data it gives you. Just a thought. (Also: JavaScript support in HtmlUnit isn't great, you may find that some things don't work right) – Jonas Czech Feb 24 '16 at 09:03
  • Thank you for your response. I have already considered that. The problem is that the page in question formats the data into a **way** easier format to pull from than the source of the data (relayed by JSON), so I would really prefer just to get the HTML table created by the JavaScript. – THow Feb 24 '16 at 17:50
  • Also, when I don't set it to run the JavaScript, the same output is printed: the base HTML without the table that is supposed to be created. – THow Feb 24 '16 at 17:53
  • Hmm, it seems that HtmlUnit isn't running the JavaScript right, or isn't doing the Ajax. I'm not very familiar with HtmlUnit specifically, but perhaps you can set it to report and print JS errors? Otherwise, there's probably not much you can do, and HtmlUnit's Ajax support is not great. You could try Selenium instead, which actually uses a real browser and should work reliably, if that's an option in your case. – Jonas Czech Feb 24 '16 at 19:05
  • Please provide the URL you are using to give us a chance to reproduce your problem. And please do not disable logging, usually the log contains hints about the root of your problem. And make sure you are using the latest version or even better the latest SNAPSHOT build. – RBRi Nov 14 '17 at 14:28
  • See [this related post](https://stackoverflow.com/q/50189638/8583692). – Mahozad Nov 15 '21 at 13:27

1 Answer

Jsoup is a server-side processing framework. I am not sure what your final goal is; I assume you want to use the result in the same page, so I would go with Ajax. You can:

  • On document ready, capture the document's DOM
  • Send it to the server for processing
  • Display the results on the same page

Something like:

$(document).ready(function () {
    var allClientSideHtml = $("html").html();

    var dataToSend = JSON.stringify({ 'htmlSendToSever': allClientSideHtml });
    $.ajax({
        url: "your_Jsoup_server_url.jsp_or_php/YourJsoupParser",
        type: "POST",
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        data: dataToSend, // pass that text to the server as a JSON String
        success: function (msg) { alert(msg.d); },
        error: function (type) { alert("ERROR!!" + type.responseText); }
    });
});
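
The client code above posts to a server endpoint that is left unspecified. A minimal sketch of such an endpoint, using only the JDK's built-in `com.sun.net.httpserver` package, might look like the following. The class name `JsoupEndpointSketch`, the helper `buildReply`, and the reply text are hypothetical; the Jsoup call is left as a comment because it needs the Jsoup library on the classpath. `main` posts a sample payload to itself and then shuts down, so this is a demonstration rather than a deployable server.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class JsoupEndpointSketch {

    /** Builds the JSON reply; the client reads its "d" field via msg.d. */
    public static String buildReply(String requestBody) {
        // A real handler would decode the JSON body, extract "htmlSendToSever",
        // and pass it to Jsoup.parse(...). Here only the payload size is reported.
        int size = requestBody.getBytes(StandardCharsets.UTF_8).length;
        return "{\"d\":\"received " + size + " bytes\"}";
    }

    /** Accepts POST requests and answers with the JSON built by buildReply. */
    static void handle(HttpExchange exchange) throws IOException {
        if (!"POST".equals(exchange.getRequestMethod())) {
            exchange.sendResponseHeaders(405, -1); // method not allowed, no body
            exchange.close();
            return;
        }
        String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
        byte[] reply = buildReply(body).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
        exchange.sendResponseHeaders(200, reply.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(reply);
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS for any free port; a real deployment would use a fixed one.
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/YourJsoupParser", JsoupEndpointSketch::handle);
        server.start();
        try {
            // Simulate the browser's $.ajax POST with a small JSON payload.
            URI uri = URI.create("http://localhost:" + server.getAddress().getPort() + "/YourJsoupParser");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Content-Type", "application/json; charset=utf-8")
                    .POST(HttpRequest.BodyPublishers.ofString("{\"htmlSendToSever\":\"<p>hi</p>\"}"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } finally {
            server.stop(0);
        }
    }
}
```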
JavaSheriff