
I am using HtmlUnit to scrape a webpage, but it cannot find the elements in the main content. I suspect this is because the page is rendered with Vue.js.

This is the page in question. I want to get the contents inside `<div id="app">`. (Screenshot: webpage HTML)

This is the output when I print the page using `page.asXml()`. The `<div id="app">` is empty. (Screenshot: HtmlUnit `page.asXml()` output)

This is the WebClient setup I am using; JavaScript is enabled.

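import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;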

WebClient webClient = new WebClient();
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.setJavaScriptErrorListener(new SilenceJavaScriptErrorListner());
webClient.setCssErrorHandler(new SilentCssErrorHandler());

This is the code, inside a function, where I wait for a certain element inside `<div id="app">` to exist before returning. I also call `waitForBackgroundJavaScript()`.

HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
webClient.waitForBackgroundJavaScript(10000);

for (int i = 0; i < 10; i++) {
    page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
    webClient.waitForBackgroundJavaScript(10000);
    log.info("Current page \n" + page.asXml());
                
    List<Object> quoteNumberOptionList = page.getByXPath("someXPath");
                
    if (!quoteNumberOptionList.isEmpty()) {
        break;
    }
                
    Thread.sleep(5000);
}
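
For illustration, the waiting loop above could be factored into a small, library-independent helper. This is only a sketch: `Poller` and `pollUntilNonEmpty` are hypothetical names, not part of HtmlUnit, and the helper does not by itself make HtmlUnit render the Vue.js content. It only bounds the total wait with a deadline instead of a fixed iteration count.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not part of HtmlUnit): polls a query until it yields results or a deadline passes. */
public final class Poller {

    private Poller() {}

    /**
     * Calls {@code query} repeatedly until it returns a non-empty list or {@code timeout} elapses.
     * Returns the last result, which is empty if the deadline was reached or the thread was interrupted.
     */
    public static <T> List<T> pollUntilNonEmpty(Supplier<List<T>> query,
                                                Duration timeout,
                                                Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            List<T> result = query.get();
            // Stop as soon as something is found or the deadline has passed.
            if (!result.isEmpty() || System.nanoTime() - deadline >= 0) {
                return result;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return result;
            }
        }
    }
}
```

With HtmlUnit, the supplier would presumably wrap the same calls as in the loop above: `webClient.waitForBackgroundJavaScript(...)`, then `getEnclosedPage()`, then `page.getByXPath("someXPath")`.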
yingxuan
  • Did you try Selenium? It can render a website the way a normal user's browser does. The problem you are facing is similar to how search engine robots can't parse Vue or Angular sites: there simply isn't anything to see unless the JavaScript is executed (which happens in the client). – Fullslack Nov 03 '20 at 10:13
  • Yeah, the problem is that the existing code all uses HtmlUnit, so I would need to redo the whole project if I changed to Selenium. So I wanted to find out if there are any solutions first T.T – yingxuan Nov 03 '20 at 10:16
  • I guess you tried `webClient.setAjaxController(new NicelyResynchronizingAjaxController());` already? https://htmlunit.sourceforge.io/faq.html#AJAXDoesNotWork. **Edit:** Also check out this https://github.com/mpoehler/htmlunit-angular-test/blob/master/src/test/java/eu/tuxoo/integrationtest/AngularApp1Test.java, not sure why you use `getEnclosedPage()` and what it is doing compared to `webClient.getPage`. – Fullslack Nov 03 '20 at 10:33
  • Thanks for the suggestions. I have tried `webClient.setAjaxController(new NicelyResynchronizingAjaxController());` but it still doesn't work. I use `getEnclosedPage()` because this page was navigated to by clicking a link, not from a URL. Following your suggestion, I also tried `webClient.getPage(url);` to load that page directly, but it still doesn't work. – yingxuan Nov 03 '20 at 10:52
  • Then it will not work until the open issues in HtmlUnit for Vue.js are resolved. If possible, I suggest switching to Selenium, or writing a package to add Selenium to your current codebase. Using the Chrome driver seems to be the only working solution at this time. – Fullslack Nov 03 '20 at 11:15
  • There's more likely than not another HTTP request being made to the back end that you'd need to replicate to get the content you're after. If you can share the URL, I can take a look and provide some help. I'm usually able to pick apart a website fairly quickly. – Rob Evans Nov 05 '20 at 14:36
  • @RobEvans thanks for your help but I am unable to share the URL because it is my company's internal website – yingxuan Nov 06 '20 at 02:00

1 Answer


Since you mentioned in the comments above that you can't share the URL (and it likely isn't publicly accessible anyway), I've done a bit of a write-up that may help you: "Parsing web javascript content to string using android".
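
As a rough illustration of the idea from the comments (replicating the back-end request that the Vue.js app makes, rather than rendering the page), here is a minimal sketch using the JDK's `java.net.http` client. It is not code from the linked write-up. The endpoint path `/api/quotes`, the class and method names, and the cookie value are all placeholders; the real request would have to be read from the browser's network tab.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical sketch: fetch the data a Vue app would load, without rendering the page. */
public final class BackendRequestSketch {

    private BackendRequestSketch() {}

    /**
     * Builds a GET request that mimics an XHR observed in the browser's network tab.
     * The endpoint path and the cookie are assumptions, not taken from any real site.
     */
    public static HttpRequest buildRequest(String baseUrl, String sessionCookie) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/quotes")) // assumed endpoint
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .header("X-Requested-With", "XMLHttpRequest")
                .header("Cookie", sessionCookie) // session obtained elsewhere, e.g. after login
                .GET()
                .build();
    }

    /** Sends the request and returns the raw response body (typically JSON). */
    public static String fetchBody(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

In an existing HtmlUnit codebase, the same request could presumably be issued through the `WebClient` instead, so that the cookies from the logged-in session are reused.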

Rob Evans