
I am trying to fetch content from the following page with Jsoup:

http://www.etronics.com/appliances/cooking.html#!/limit=all

I'm requesting the page with Jsoup as follows:

Jsoup.connect(url).userAgent(USER_AGENT).timeout(timeoutInMs).data("limit","all").get().outerHtml();

Where

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 Safari/537.36";
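One detail worth checking here: `#!/limit=all` is a URL fragment, and HTTP clients never send the fragment (everything after `#`) to the server. In a browser, that part is most likely read by the page's JavaScript, which is something Jsoup does not run. The sketch below uses only `java.net.URI` to show which part of the URL reaches the server and roughly what a GET with `data("limit", "all")` ends up requesting (the parameter goes into the query string). It does not reproduce Jsoup's own URL handling, and whether the server honours `?limit=all` is an open assumption.

```java
import java.net.URI;

public class RequestUrl {
    // Returns the URL an HTTP client actually asks the server for: scheme, authority,
    // path and query. The fragment (after '#') stays in the client and is never sent.
    static String requestTarget(String url, String extraQuery) {
        URI uri = URI.create(url);
        StringBuilder sb = new StringBuilder()
                .append(uri.getScheme()).append("://")
                .append(uri.getRawAuthority())
                .append(uri.getRawPath());
        String query = uri.getRawQuery();
        if (extraQuery != null && !extraQuery.isEmpty()) {
            // Append the extra parameters to any query string already in the URL
            query = (query == null || query.isEmpty()) ? extraQuery : query + "&" + extraQuery;
        }
        if (query != null && !query.isEmpty()) {
            sb.append('?').append(query);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.etronics.com/appliances/cooking.html#!/limit=all";
        URI uri = URI.create(url);
        // "!/limit=all" is only available to code running in the page (JavaScript)
        System.out.println("fragment (client-side only): " + uri.getFragment());
        // Roughly what a GET with data("limit", "all") sends: the parameter is in the query string
        System.out.println("requested: " + requestTarget(url, "limit=all"));
    }
}
```

Run as-is, it prints the fragment `!/limit=all` and the request URL `http://www.etronics.com/appliances/cooking.html?limit=all`.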

I expect to get a page containing 990 products, but I only get 384. What I would like is the same page content a browser shows.

As explained here, it may be caused by Jsoup not executing JavaScript, but I'm not sure this is the cause of my problem, or at least I don't know how to check that.
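One way to check is to count products in the raw HTML, which is what Jsoup (and "view source") gets before any JavaScript runs, and compare that with the 990 the browser displays. If the raw count matches the 384 Jsoup returns, the remaining products are most likely added by scripts. This is a minimal sketch using the JDK's `java.net.http.HttpClient` (Java 11+) rather than Jsoup; the marker string you count (for example a hypothetical `class="item"` on each product element) is an assumption you have to pick by inspecting the page.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class RawHtmlCheck {
    // Counts non-overlapping occurrences of marker in text.
    static int countOccurrences(String text, String marker) {
        if (marker.isEmpty()) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int count = 0;
        for (int i = text.indexOf(marker); i >= 0; i = text.indexOf(marker, i + marker.length())) {
            count++;
        }
        return count;
    }

    // Usage: java RawHtmlCheck <url> <marker>
    // Fetches the raw HTML without running any JavaScript, then counts the marker.
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: RawHtmlCheck <url> <marker>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(30))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0]))
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(args[1] + " occurs " + countOccurrences(html, args[1]) + " times in the raw HTML");
    }
}
```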

How can I obtain every element visible via the page's view source?

Joel Costigliola

1 Answer


Try loading your web page inside HtmlUnit, which does execute JavaScript (you can read about how to do that in its documentation). It gives you access to the page's DOM.

You could also just open the developer tools in your browser while viewing the page and see how many HTTP requests it makes and to where. If it's loading extra products in separate requests, there's definitely some scripting involved.

radai
Thanks radai. HtmlUnit was an option I had in mind; I tried and succeeded using [HttpComponents](https://hc.apache.org/), although I had to programmatically retry downloading the page (the website's responsiveness seems erratic). – Joel Costigliola Mar 11 '14 at 16:41
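The comment mentions retrying the download programmatically with HttpComponents. The sketch below is not that code: it swaps in the JDK's own `java.net.http.HttpClient` (Java 11+) to show one way of wrapping a flaky download in a retry loop with a doubling pause between attempts. The attempt count, delays, timeout and User-Agent value are arbitrary assumptions, and treating every non-200 status as retryable is a simplification.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Callable;

public class RetryingFetch {
    // Runs task up to maxAttempts times, doubling the pause after each failure.
    // Throws IllegalStateException carrying the last failure if no attempt succeeds.
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not keep retrying if the thread was asked to stop
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while downloading", e);
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while retrying", ie);
                    }
                    delayMs *= 2;
                }
            }
        }
        throw new IllegalStateException("all " + maxAttempts + " attempts failed", last);
    }

    // One download attempt; non-200 responses count as failures so they are retried too.
    static String fetch(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Usage: java RetryingFetch <url>
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("usage: RetryingFetch <url>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        String html = withRetry(() -> fetch(client, args[0]), 5, 1000);
        System.out.println("downloaded " + html.length() + " characters");
    }
}
```

Keeping the retry loop separate from the HTTP call means the same helper could wrap an HttpComponents or Jsoup request instead.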