
I have a number of web pages, obtained using curl, that I am attempting to parse information from. Each page uses jQuery to transform its content once the document has loaded in the browser (via a $(document).ready handler), mostly setting the classes/ids of divs. The information is much easier to parse after these JavaScript functions have run.

What are my options for (preferably from the command line) executing the Javascript content of the pages and dumping the transformed HTML?

mmccomb

1 Answer


To scrape dynamic web pages, don't use static download tools like curl.

Instead, use a headless web browser that you can control from your programming language. The most popular tool for this is Selenium:

http://code.google.com/p/selenium/

With Selenium you can export the modified DOM tree out of the browser as HTML.
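
Here is a minimal sketch using the Selenium Python bindings (assuming ChromeDriver is installed and on your PATH; the URL is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
try:
    # get() blocks until the page's load event fires, so by the time it
    # returns, the $(document).ready handlers have already run
    driver.get("https://example.com/page-with-jquery")  # placeholder URL
    # page_source dumps the current (transformed) DOM as HTML
    print(driver.page_source)
finally:
    driver.quit()
```

Redirect stdout to a file to capture the transformed HTML from the command line, e.g. `python dump.py > page.html`.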

An example use case:

https://stackoverflow.com/a/10053589/315168

Mikko Ohtamaa
  • Thanks Mikko, I ended up using Selenium with the Java and Chrome bindings to load each page and then dump the page source - it worked a treat! – mmccomb May 20 '12 at 12:59