
I would like to crawl a website. The problem is that it's full of JavaScript elements, such as buttons that, when pressed, don't change the URL but do change the data on the page.

Usually I use LWP / Mechanize etc. to crawl sites, but neither supports JavaScript. Any ideas?

brian d foy
snoofkin

4 Answers


Another option might be Selenium, via the WWW::Selenium module.
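A minimal sketch of what that could look like, assuming a (legacy) Selenium RC server is listening on localhost:4444; the URL, path, and element locator are placeholders:

```perl
use strict;
use warnings;
use WWW::Selenium;

# Assumes a Selenium RC server is running on localhost:4444
my $sel = WWW::Selenium->new(
    host        => 'localhost',
    port        => 4444,
    browser     => '*firefox',
    browser_url => 'http://example.com/',   # placeholder URL
);

$sel->start;
$sel->open('/page-with-js');                # placeholder path
$sel->click('id=load-more');                # placeholder locator
$sel->wait_for_page_to_load(5000);          # give the JavaScript time to run
my $html = $sel->get_html_source;           # the post-JavaScript DOM
$sel->stop;
```

Because a real browser executes the page, the HTML you get back reflects whatever the buttons' JavaScript changed.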

erickb

The WWW::Scripter module has a JavaScript plugin that may be useful. I can't say I've used it myself, however.
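A rough sketch of the idea, hedged accordingly since I haven't used it either; the URL is a placeholder:

```perl
use strict;
use warnings;
use WWW::Scripter;

# WWW::Scripter subclasses WWW::Mechanize, so the usual Mech methods
# work; the JavaScript plugin runs the page's scripts in pure Perl.
my $w = WWW::Scripter->new;
$w->use_plugin('JavaScript');

$w->get('http://example.com/js-page');   # placeholder URL
# Scripts embedded in the page have now run; inspect the result:
print $w->content;
```

The appeal here is that no external browser is needed, though the bundled JavaScript engine handles far less than a real browser would.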

ishnid

WWW::Mechanize::Firefox might be of use. That way you can have Firefox handle the complex JavaScript and then extract the resulting HTML.
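A small sketch of that approach, assuming Firefox is running with the MozRepl extension listening on its default port; the URL and XPath are placeholders:

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

# Drives a live Firefox instance via MozRepl
my $mech = WWW::Mechanize::Firefox->new;
$mech->get('http://example.com/');                      # placeholder URL
$mech->click({ xpath => '//button[@id="load-more"]' }); # placeholder button
my $html = $mech->content;   # HTML after Firefox ran the JavaScript
```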

Eric Strom

I would suggest HtmlUnit with its Perl wrapper, WWW::HtmlUnit.
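A sketch of how the wrapper might be used, assuming a Java runtime is available (WWW::HtmlUnit drives the Java HtmlUnit library through Inline::Java); the URL and element id are placeholders:

```perl
use strict;
use warnings;
use WWW::HtmlUnit;   # needs a Java runtime via Inline::Java

my $client = WWW::HtmlUnit->new;
my $page   = $client->getPage('http://example.com/');  # placeholder URL
# Click a JavaScript-driven element; click() returns the updated page
my $button = $page->getElementById('load-more');       # placeholder id
$page = $button->click;
my $html = $page->asXml;   # serialized post-JavaScript DOM
```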

Minh Le