Getting rendered HTML with MozRepl and Mechanize::Firefox

Question

I've just been introduced to the wonders of MozRepl used in conjunction with Perl's WWW::Mechanize::Firefox, and I'm trying to figure out how to use it to crawl GWT pages (e.g., https://www.google.com/offers/home#!details/4bc7fd6bd3feb311/XYW81TXGLA88TR42).

What I really want is the rendered HTML (the DOM after the page's JavaScript has run), not the raw HTML source. I would really appreciate an example of how to get this.

It looks like I can get the rendered parts of the page with, e.g., $mech->xpath('//*[@id="goh-content-container"]', one=>1)->{innerHTML}; Strangely, though, this doesn't work consistently: sometimes it outputs nothing, and other times it outputs the HTML. Any ideas why the output is inconsistent? - Vijay Boyapati, Oct 23 '11 at 19:31
More info: when I run a single crawler, it seems to produce output consistently, but when I have multiple crawlers interacting with MozRepl, the output is less consistent. Running on Ubuntu 11.04 with Firefox 7.0.1. - Vijay Boyapati, Oct 23 '11 at 19:34

score 2 · Accepted Answer · answered Oct 10 '12 at 21:36

In the end I decided to use the fantastic PhantomJS to get the job done. It's incredibly easy to use Phantom as a server-side tool to get the rendered HTML of a dynamic webpage.
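
For anyone wondering what that might look like, here is a minimal sketch rather than the exact script I used. It assumes PhantomJS 1.6 or later (so that page.evaluate accepts extra arguments) and is written in ES5 for that reason. The file name render.js, the hasContent helper, the 10 second timeout, and the 250 ms polling interval are all illustrative choices, not part of PhantomJS. The idea is to load the page, poll until the element you care about is non-empty (GWT fills it in asynchronously, which is probably also why innerHTML sometimes came back empty), and then print the serialized DOM.

```javascript
// render.js (hypothetical name). Run as: phantomjs render.js <url> [css-selector]

// Hypothetical helper, not part of the PhantomJS API:
// true when the given markup contains at least one non-whitespace character.
function hasContent(html) {
  return typeof html === 'string' && /\S/.test(html);
}

// Only drive the browser when running inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();

  var url = system.args[1];
  var selector = system.args[2] || 'body';   // element to wait for
  var deadline = Date.now() + 10000;         // assumed timeout: 10 seconds

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load ' + url);
      phantom.exit(1);
      return;
    }

    // Poll every 250 ms (assumed interval) until the element has content
    // or the deadline passes.
    var timer = setInterval(function () {
      var html = page.evaluate(function (sel) {
        var node = document.querySelector(sel);
        return node ? node.innerHTML : '';
      }, selector);

      var ready = hasContent(html);
      if (ready || Date.now() > deadline) {
        clearInterval(timer);
        // page.content is the current DOM serialized as HTML.
        console.log(page.content);
        phantom.exit(ready ? 0 : 2);  // exit code 2 marks a timeout
      }
    }, 250);
  });
}
```

For the page in the question, something like phantomjs render.js '<url>' '#goh-content-container' > out.html should work, with the URL quoted so the shell leaves the #! fragment alone. Polling for a specific element is just one way to decide that the page is "done"; a fixed delay is simpler but less reliable.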

Vijay Boyapati
