Hi, I want to scrape information from a website, so I tried to use Jsoup (and also HttpClient). It seems that neither of them can "see" certain content on the HTML page: when I print out the parsed HTML, the div I need comes back empty, while other divs print just fine.
Here's my code:
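import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class Main {
    public static void main(String[] args) throws IOException {
        String url = "https://chainlinklabs.com/jobs"; // assumed: the same page loaded in the HtmlUnit version below
        Document doc = Jsoup.connect(url).get();
        // "needed content" is the class of the div I'm after
        System.out.println(doc.getElementsByClass("needed content"));
    }
}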
The output in the terminal is:
<div class="needed content"></div>
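One way to check what is going on: download the raw HTML with the JDK's own `java.net.http.HttpClient` (Java 11+) and search it for a piece of text you can see inside that div in a real browser. If the text is missing from the raw response, the server never sent it and it is most likely inserted later by JavaScript, which no static HTML parser will see. This is a minimal sketch; the class name `RawHtmlCheck` is hypothetical, and the text to search for is a placeholder you pass on the command line.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RawHtmlCheck {

    // Plain substring search over the raw HTML. Characters such as & may
    // appear HTML-escaped (&amp;) in the source, so pick simple text.
    static boolean appearsInRawHtml(String html, String visibleText) {
        return html.contains(visibleText);
    }

    // Downloads the page exactly as the server sends it; no JavaScript is executed.
    static String fetchRawHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java RawHtmlCheck.java \"text visible in the browser\"");
            return;
        }
        String html = fetchRawHtml("https://chainlinklabs.com/jobs");
        if (appearsInRawHtml(html, args[0])) {
            System.out.println("Found in the raw HTML: a static parser such as Jsoup can reach it.");
        } else {
            System.out.println("Not in the raw HTML: it is probably added by JavaScript after load.");
        }
    }
}
```

Run it with something like `java RawHtmlCheck.java "some text from the div"`; single-file launching needs Java 11 or later.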
While searching Stack Overflow for answers, I found several suggestions:
some recommend using the Jackson library (Java - How do I access a child of Div using JSoup),
some recommend embedding a browser in Java (Is there a way to embed a browser in Java?),
and some recommend using HtmlUnit (Fail to get full content of page with JSoup).
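The Jackson suggestion usually goes with a different approach: a JavaScript-driven page often loads its data as JSON from a separate request after the HTML arrives, and that request can be found in the browser's developer tools (Network tab, filtered to Fetch/XHR). Requesting that URL directly skips the HTML entirely, and the JSON body can then be parsed with Jackson (for example `ObjectMapper.readTree`). The sketch below uses only the JDK's HttpClient; the endpoint URL and the class name are hypothetical placeholders, not something known to exist for this site.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class JsonEndpointFetch {

    // Hypothetical placeholder: replace with the request URL seen in the Network tab.
    static final String ENDPOINT = "https://example.com/api/jobs";

    // Builds a GET request that asks the server for JSON rather than HTML.
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(ENDPOINT), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        // The body is raw JSON text; a JSON library such as Jackson can navigate it.
        System.out.println(response.body());
    }
}
```

Some endpoints also expect the same cookies or headers the browser sends; the Network tab shows those too.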
I tried combining Jsoup with HtmlUnit, but got the same result. Here's the code:
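import java.io.IOException;
import com.gargoylesoftware.htmlunit.WebClient; // HtmlUnit 2.x package (3.x uses org.htmlunit)
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
class Main {
    public static void main(String[] args) throws IOException {
        String url = "https://chainlinklabs.com/jobs";
        try (WebClient wc = new WebClient()) {
            wc.getOptions().setJavaScriptEnabled(true);
            wc.getOptions().setCssEnabled(false);
            wc.getOptions().setThrowExceptionOnScriptError(false);
            wc.getOptions().setTimeout(10000);
            HtmlPage page = wc.getPage(url);
            // serialize the DOM as HtmlUnit sees it, then hand it to Jsoup
            String pageXml = page.asXml();
            Document doc2 = Jsoup.parse(pageXml, url);
            System.out.println(doc2.getElementsByClass("needed content"));
            System.out.println("Thank God!");
        }
    }
}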
My interpretation is that Jsoup isn't showing that part of the HTML content because it is generated by JavaScript. Am I heading in the right direction?
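For context, Jsoup only parses the HTML text the server returns and never runs the page's scripts, so anything a script inserts into the DOM afterwards is invisible to it. Some JavaScript-rendered sites (for example, ones built with Next.js) ship the data their scripts render as JSON inside a `<script>` tag such as `<script id="__NEXT_DATA__" type="application/json">`. If this page does that (an assumption to verify in the raw HTML), the data is already in what Jsoup downloads, just not inside the div; with Jsoup it could be read via `doc.getElementById("__NEXT_DATA__").data()`. The dependency-free sketch below does the same lookup with a regex; the class name is hypothetical and the regex assumes a double-quoted `id` attribute, so it is not a general HTML parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedJsonExtractor {

    // Returns the text between <script ... id="ID" ...> and </script>, if present.
    // Assumes a double-quoted id attribute preceded by whitespace.
    static Optional<String> scriptContentById(String html, String id) {
        Pattern p = Pattern.compile(
                "<script\\b[^>]*\\sid=\"" + Pattern.quote(id) + "\"[^>]*>(.*?)</script>",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Simplified stand-in for a server response: the div is empty,
        // while the data sits in a JSON script tag.
        String html = "<html><body><div class=\"needed content\"></div>"
                + "<script id=\"__NEXT_DATA__\" type=\"application/json\">"
                + "{\"jobs\":[{\"title\":\"Example\"}]}</script></body></html>";
        System.out.println(scriptContentById(html, "__NEXT_DATA__").orElse("(not found)"));
    }
}
```

If no such tag exists in the raw HTML, the content is fetched separately at runtime, and a JavaScript-capable client or the underlying JSON request is the remaining route.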