I want to use JavaScript to crawl a website, but the site uses AJAX to page through its content. At first you can only crawl the first page. To get more, you have to click a "next page" button, and the site then uses AJAX to update the content of the page.

Question: I can use JavaScript to find the button and click it, but how do I know when the page has finished updating so that I can read the new content?

Thanks.

pexeer
  • I think you can send a request directly to the next page's AJAX endpoint and work with the response. There must be a pattern to the "next page" request URL. – Chris Li May 31 '13 at 08:05
  • Good idea, but the click handler does a lot of complex work, so I would rather simulate the click itself. – pexeer May 31 '13 at 08:11

2 Answers


I would use a "headless" browser for such a task:

PhantomJS

Casper

In particular, the click function of CasperJS could do what you intend.

The example it gives for filling out forms is also dead simple:

var casper = require('casper').create();

casper.start('http://google.fr/', function() {
    // search for 'casperjs' from google form
    this.fill('form[action="/search"]', { q: 'casperjs' }, true);
});

casper.run(); // execute the queued steps

From the CasperJS Quickstart.
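To address the actual question (knowing when the AJAX update has finished), one possible approach is to remember the markup of the content container, click the button, and then let `waitFor` poll until that markup differs. This is only a sketch: the helper name `clickAndWaitForUpdate`, the selectors `a.next` and `#results`, and the URL in the usage comment are made up for illustration and would need to match the real page.

```javascript
// Sketch: click a "next page" button and wait for the AJAX-updated content.
// `casper` is a started CasperJS instance; both selectors are hypothetical.
function clickAndWaitForUpdate(casper, nextSelector, contentSelector, onUpdated) {
    var before; // markup of the content container before the click

    casper.then(function () {
        before = this.getHTML(contentSelector);
        this.click(nextSelector);
    });

    // waitFor polls the first function until it returns true or the timeout expires.
    casper.waitFor(function contentChanged() {
        return this.getHTML(contentSelector) !== before;
    }, function then() {
        onUpdated.call(this, this.getHTML(contentSelector));
    }, function onTimeout() {
        this.echo('Content did not change before the timeout.');
    });
}

// Usage inside a CasperJS script (run with `casperjs script.js`):
//   var casper = require('casper').create();
//   casper.start('http://example.com/list');
//   clickAndWaitForUpdate(casper, 'a.next', '#results', function (html) {
//       this.echo(html);
//   });
//   casper.run();
```

Comparing the container before and after the click assumes the new page really produces different markup. If the site signals completion some other way (a spinner disappearing, a page number changing), the check function can test for that instead.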

Thomas Junk

When the page you are trying to crawl loads dynamic content through AJAX, CasperJS is an excellent option if you want to do the job in JavaScript. You can use it to trigger events, add processing steps, and include functions that wait for and validate each AJAX call before moving on to the next step.
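As a rough illustration of chaining steps with a wait after each AJAX call, the sketch below keeps clicking a "next" button and collecting the text of a content container until the button is gone or a page limit is reached. The function name `crawlAllPages`, its parameters, and the selectors in the usage comment are assumptions for the example, not part of CasperJS; only `then`, `click`, `exists`, `fetchText`, `waitFor` and `echo` are CasperJS calls.

```javascript
// Sketch: collect the text of every AJAX-loaded page, one step at a time.
// `casper` is a started CasperJS instance; selectors and maxPages are hypothetical.
function crawlAllPages(casper, nextSelector, contentSelector, maxPages, pages) {
    pages = pages || [];

    casper.then(function () {
        var current = this.fetchText(contentSelector);
        pages.push(current);

        // Stop when the page limit is hit or there is no "next" button left.
        if (pages.length >= maxPages || !this.exists(nextSelector)) {
            return;
        }

        this.click(nextSelector);

        // Validate the AJAX call: wait until the container text has changed,
        // then queue the same step again for the following page.
        this.waitFor(function contentChanged() {
            return this.fetchText(contentSelector) !== current;
        }, function then() {
            crawlAllPages(this, nextSelector, contentSelector, maxPages, pages);
        }, function onTimeout() {
            this.echo('No new content arrived; stopping after ' + pages.length + ' page(s).');
        });
    });

    return pages; // filled in as the queued steps run
}

// Usage inside a CasperJS script:
//   var casper = require('casper').create();
//   casper.start('http://example.com/list');
//   var pages = crawlAllPages(casper, 'a.next', '#results', 20);
//   casper.run(function () {
//       this.echo(pages.length + ' page(s) collected');
//       this.exit();
//   });
```

CasperJS also ships a `waitForSelectorTextChange` helper that covers the same idea; the explicit `waitFor` is used here only so that the "before" text is visibly captured before the click.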

Here is an example of how to crawl a website with CasperJS and jQuery:
CasperJs and Jquery with chained Selects

Here is an example of how to crawl a website with CasperJS and plain JavaScript:
CasperJS dynamic selectlists

Community
Hemerson Varela