3

I have used JSoup for scraping, and it works perfectly until AJAX and JavaScript start playing their role in displaying the webpage content.

Does anyone have a clue how to scrape content that is displayed via AJAX or JavaScript after the page has loaded completely?

Thanks in advance!

Pankaj Wanjari
  • 1,275
  • 2
  • 9
  • 11

2 Answers

3

You can use a headless browser such as PhantomJS.

PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
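
As a rough illustration (the URL and the 3-second wait below are just placeholder assumptions for a page whose AJAX calls finish quickly), a minimal PhantomJS script that dumps the rendered HTML might look like this; you could then feed that HTML to JSoup for parsing:

```javascript
// Sketch: dump the DOM after scripts have run, assuming PhantomJS is
// installed and the page finishes its AJAX work within ~3 seconds.
var page = require('webpage').create();
var url = 'http://example.com/dynamic-page'; // placeholder URL

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
    }
    // Give asynchronous JavaScript / AJAX calls time to populate the DOM.
    window.setTimeout(function () {
        // page.content holds the rendered HTML, which you could save
        // to a file and parse with JSoup if you prefer.
        console.log(page.content);
        phantom.exit();
    }, 3000);
});
```

You would run it with something like `phantomjs dump.js > page.html` and then parse the saved file as usual.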

To ease your work, you could use CasperJS.

CasperJS is a companion for PhantomJS that brings a greatly improved API to ease the creation of scraping and automation workflows.

These tools are very useful when you have to scrape websites with dynamic content, for instance websites where the content is only displayed after JavaScript has run (sometimes including AJAX calls).
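
As a sketch of how CasperJS makes this easier (the URL and the #results selector are assumptions for illustration), you can simply wait for the element that the AJAX call fills in before reading it:

```javascript
// Sketch: wait for an element that only appears after the AJAX call,
// then grab its HTML. Run with: casperjs scrape.js
var casper = require('casper').create();

casper.start('http://example.com/dynamic-page'); // placeholder URL

casper.waitForSelector('#results', function () {
    // At this point the dynamic content exists in the DOM.
    this.echo(this.getHTML('#results'));
}, function onTimeout() {
    this.echo('The dynamic content never showed up.');
});

casper.run();
```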

You can see an example of how Casper works here:
CasperJs and Jquery with chained Selects

Hemerson Varela
  • 24,034
  • 16
  • 68
  • 69
1

You can't do it directly with JSoup. You'll need a headless browser, which is a much more complex thing. There are headless versions of Firefox, Safari, and others. Searches for "headless X" (where X is the browser engine you want to use) should turn up some useful projects.

T.J. Crowder
  • 1,031,962
  • 187
  • 1,923
  • 1,875