As the title states, I am wondering if there is a method to obtain the generated HTML code of a page. Obviously I can inspect the page with web developer tools (built into the browser, or an external program) and get it that way, but I would really like to do it automatically. Could this perhaps be done using Fiddler's API?

Thanks!

Anders
  • Do you mean getting the source of the page from JavaScript on the page itself, or getting the source of a target URL using a script running on your computer (and if so, are you using a particular language)? – Jeremy Mar 09 '13 at 20:41
  • Or, are you searching for Fiddler's API specifically? – John Dvorak Mar 09 '13 at 20:44
  • Would http://phantomjs.org/ be of interest? – HellaMad Mar 09 '13 at 20:49
  • @JeremyBanksᐛ Basically, I want to be able to capture the source of the web page AFTER any dynamic content has been generated and loaded, be it via AJAX, simple JavaScript "replace X on load with Y", or any other method. – Anders Mar 09 '13 at 21:19
  • @DC_ I'll play around with this and see if it suits my needs. My program will be written in .NET (C#, more specifically), so maybe I can implement [something like this](http://stackoverflow.com/questions/172753/embedding-javascript-engine-into-net-c) – Anders Mar 09 '13 at 21:22

2 Answers

"Source" doesn't get altered by JavaScript after page load; it's the document object model (DOM) generated from the source that gets altered. It is this DOM that the browser renders to the screen, and it is updated with every change for as long as the page is not reloaded.

The DOM is not a string of HTML code; it is an in-memory, hierarchical object representation of the page. The browser does not maintain an up-to-date, flat-file representation of the DOM as it gets altered, which is why "view source" only ever shows you what was originally sent to the browser over HTTP.
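As a rough analogy (not how a browser works internally), Python's standard `xml.dom.minidom` module can illustrate the distinction: the source string never changes, the parsed tree is modified the way a script might modify it, and only serializing the tree again produces the "generated" HTML. The sample markup and the `render_after_script` function are hypothetical, and minidom requires well-formed XML, so it will not parse most real-world HTML.

```python
from xml.dom.minidom import parseString

# The "source" as originally sent over HTTP (must be well-formed XML for minidom).
SOURCE = '<html><body><div id="target">X</div></body></html>'


def render_after_script(source: str) -> str:
    """Parse `source` into a DOM tree, mutate it like a script might, and re-serialize it."""
    dom = parseString(source)
    # Simulate a script doing "replace X on load with Y".
    for div in dom.getElementsByTagName("div"):
        if div.getAttribute("id") == "target":
            div.firstChild.data = "Y"
    # Serialize the *current* tree; documentElement.toxml() omits the XML declaration.
    return dom.documentElement.toxml()


generated = render_after_script(SOURCE)
print(SOURCE)     # the original string still contains X
print(generated)  # <html><body><div id="target">Y</div></body></html>
```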

The node-by-node view of the page's DOM in developer tools such as Firebug is the closest you'll get (AFAIK) to regenerated source code without building some new tool yourself.

Faust

You may be able to write a Python script that takes the URL as a variable and passes it to a command that downloads the web page, such as wget.
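A minimal sketch of that idea, using Python's standard `urllib.request` instead of shelling out to wget (the `fetch_html` name and the 10-second timeout are my own choices, not anything standard). Like wget, it only retrieves what the server sends, i.e. the original source, not the DOM after JavaScript has run.

```python
import sys
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML at `url`, roughly like `wget -O - URL`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fetch.py https://example.com/
    print(fetch_html(sys.argv[1]))
```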

Googling it, I found this question about parsing HTML files; maybe you could wget the index.html and use one of the parsers mentioned there: How do you parse and process HTML/XML in PHP?
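The linked question is about PHP; if the script itself is written in Python, the standard library's `html.parser.HTMLParser` can handle simple parsing. This sketch (the `LinkCollector` class and `extract_links` helper are hypothetical names) collects the `href` of every `<a>` tag in a downloaded page:

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return all <a href> values found in `html`, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```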

Noah