
I've just been introduced to the wonders of MozRepl used in conjunction with Perl's WWW::Mechanize::Firefox, and I'm trying to figure out how to use it to crawl GWT pages (e.g., https://www.google.com/offers/home#!details/4bc7fd6bd3feb311/XYW81TXGLA88TR42).

What I really want is the rendered HTML (the DOM after the page's JavaScript has run), not the raw HTML source. I would really appreciate an example of how to get this.

Vijay Boyapati
  • It looks like I can get the rendered parts of the page with, e.g., $mech->xpath('//*[@id="goh-content-container"]', one=>1)->{innerHTML}; Strangely, though, this doesn't work consistently: sometimes it outputs the HTML, and other times it outputs nothing. Any ideas why? – Vijay Boyapati Oct 23 '11 at 19:31
  • More info: when I run a single crawler, the output is consistent, but when multiple crawlers interact with MozRepl at once, it becomes less consistent. I'm running Ubuntu 11.04 with Firefox 7.0.1. – Vijay Boyapati Oct 23 '11 at 19:34

1 Answer


I decided to use the fantastic PhantomJS to get the job done. It's very easy to use PhantomJS as a server-side tool to get the rendered HTML of a dynamic web page.
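For anyone who wants a starting point, here is a minimal sketch of what such a script might look like. It is not the exact script I used: it assumes PhantomJS's standard `webpage` and `system` modules, and the file name `render.js`, the `waitFor` helper, and the 10-second timeout are illustrative choices. The default selector is the container id from the comment on the question. Because GWT fills in the page asynchronously, the sketch polls until the element has content instead of reading it as soon as the page loads.

```javascript
// render.js (hypothetical name): print the rendered HTML of one element.

// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// Calls onDone(null, result) on success, onDone(error) on timeout.
// `waitFor` is an illustrative helper, not part of the PhantomJS API.
function waitFor(check, onDone, timeoutMs, intervalMs) {
    var start = new Date().getTime();
    (function poll() {
        var result = check();
        if (result) {
            onDone(null, result);
        } else if (new Date().getTime() - start >= timeoutMs) {
            onDone(new Error('timed out after ' + timeoutMs + ' ms'));
        } else {
            setTimeout(poll, intervalMs);
        }
    })();
}

// The part below only runs when the script is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var url = system.args[1];
    // Assumed default: the container id mentioned in the question's comment.
    var selector = system.args[2] || '#goh-content-container';

    if (!url) {
        console.log('usage: phantomjs render.js <url> [css-selector]');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();
        page.open(url, function (status) {
            if (status !== 'success') {
                console.log('failed to load ' + url);
                phantom.exit(1);
                return;
            }
            waitFor(function () {
                // Runs inside the page; an empty string counts as "not ready".
                return page.evaluate(function (sel) {
                    var el = document.querySelector(sel);
                    return el ? el.innerHTML : '';
                }, selector);
            }, function (err, html) {
                console.log(err ? err.message : html);
                phantom.exit(err ? 1 : 0);
            }, 10000, 250);
        });
    }
}
```

Run it as `phantomjs render.js <url> [css-selector]`. If you want the whole rendered document rather than one element, `page.content` returns it as a string once the page has finished rendering.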

Vijay Boyapati