
Question pretty much sums it up. I don't have access to a framework and I need to get the FULL HTML Source of the current page. Is there some command that is supported by JS natively?

For the record, I've tried:

document.getElementsByTagName('html')[0].innerHTML
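For comparison, `innerHTML` of the `<html>` element returns only what is inside it, so the `<html>` tag itself and the doctype are missing from that attempt. Below is a minimal sketch, assuming a browser `document`, that uses `document.documentElement.outerHTML` (which does include the `<html>` tag) and rebuilds the doctype line from `document.doctype`. `doctypeToString` and `fullSource` are hypothetical helper names. Like `innerHTML`, this is the browser's generated markup rather than the original bytes, and it drops anything outside `<html>` other than the doctype (for example, top-level comments).

```javascript
// Rebuild the doctype line from a DocumentType-like object
// ({ name, publicId, systemId }); in a browser this is document.doctype,
// which is null when the page has no doctype.
function doctypeToString(doctype) {
  if (!doctype) return "";
  let s = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) s += ' PUBLIC "' + doctype.publicId + '"';
  // A system id without a public id needs the SYSTEM keyword.
  if (!doctype.publicId && doctype.systemId) s += " SYSTEM";
  if (doctype.systemId) s += ' "' + doctype.systemId + '"';
  return s + ">";
}

// outerHTML of the root element includes the <html> tag itself,
// which innerHTML leaves out; prepend the rebuilt doctype.
function fullSource(doc) {
  return doctypeToString(doc.doctype) + doc.documentElement.outerHTML;
}
```

In a browser: `console.log(fullSource(document));`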
Naftali
christopher

2 Answers


This can be done in a one-liner using XMLSerializer.

var generatedSource = new XMLSerializer().serializeToString(document);

This returns a string such as:

<!DOCTYPE html><html><head>

<title>javascript - Source code of HTML page without Framework - Stack Overflow</title>
...
Paul S.
  • This is awesome, but it seems to be cutting out some spaces and carriage returns that were there before? – christopher Jul 03 '13 at 14:45
  • @Chris this is a **_generated_** source, not the _original_ source. The browser's parsing of the original source is where those spaces and new lines are removed (or even added). – Paul S. Jul 03 '13 at 14:46
  • And I'm guessing this is as close as I'm gonna get to the original source? – christopher Jul 03 '13 at 14:47
  • @Chris the only way to get closer I know of is to re-request the page using ajax. – Paul S. Jul 03 '13 at 14:48
  • Aye, which is what I've done to get the comparison page! The page changes an awful lot, dynamically, so the local copy was what I was going to use. Perhaps there's another way around this. I'll mark this as correct, because 99.9% of the time this is what someone will want. – christopher Jul 03 '13 at 14:50
  • @RichieHindle wrong, that is just Gecko's _serializeToStream_ method which never made it to [the spec](http://domparsing.spec.whatwg.org/#the-xmlserializer-interface). – Paul S. Jul 03 '13 at 14:51
  • @PaulS.: Sorry, you're quite right. Erroneous comment deleted. – RichieHindle Jul 03 '13 at 15:13
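The comments above note that the generated source loses some whitespace, while the original is only available by re-requesting the page. If the goal is to compare those two copies, one option is to normalize both strings before comparing. This is a minimal sketch, assuming whitespace-only differences should be ignored; `normalizeWhitespace` and `sourcesDiffer` are hypothetical names. It removes only whitespace noise: the serializer can still differ from the server's markup in other ways (attribute quoting, entity escaping, changes made by scripts), and collapsing whitespace also hides real changes inside `<pre>` blocks.

```javascript
// Hypothetical helper: collapse whitespace so that formatting differences
// (line breaks and indentation the parser drops) are not counted as changes.
function normalizeWhitespace(html) {
  return html
    .replace(/\s+/g, " ")    // any run of whitespace becomes one space
    .replace(/>\s+</g, "><") // drop whitespace between adjacent tags
    .trim();
}

// True when the two sources differ in something other than whitespace.
function sourcesDiffer(a, b) {
  return normalizeWhitespace(a) !== normalizeWhitespace(b);
}
```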

I don't think it's possible to get the complete, actual, live page source from the current page. (Edit: as it turns out, it is possible! See Paul S.'s answer.)

What you could do instead, to get the initial, unaltered HTML, is load the page again through AJAX and check the page source as the server returns it.

$.get(document.location.href, function(response) { 
    window.console.log(response);
});

This uses jQuery; without a framework, you could use a plain XMLHttpRequest instead:

var xmlhttp;
if (window.XMLHttpRequest) {
    // code for IE7+, Firefox, Chrome, Opera, Safari
    xmlhttp=new XMLHttpRequest();
} else {
    // code for IE6, IE5
    xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.onreadystatechange=function() {
    if (xmlhttp.readyState==4 && xmlhttp.status==200) {
        window.console.log(xmlhttp.responseText);
    }
};
xmlhttp.open("GET",document.location.href,true);
xmlhttp.send();
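In current browsers the native `fetch` API can replace the `XMLHttpRequest` boilerplate above without any framework. This is a minimal sketch, not part of the original answer; `fetchOriginalSource` is a hypothetical name, and the `fetchImpl` parameter exists only so the function can be exercised without a network. The `cache: "no-store"` option asks the browser to bypass its HTTP cache, which matters when the point is to see what the server returns right now.

```javascript
// Re-request a page and resolve with the raw HTML the server sends back.
// fetchImpl defaults to the global fetch; pass a stub to test offline.
async function fetchOriginalSource(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { cache: "no-store" });
  if (!response.ok) {
    throw new Error("HTTP " + response.status);
  }
  return response.text();
}
```

In a browser: `fetchOriginalSource(location.href).then(html => console.log(html));`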
Willem Mulder
  • Why would you use AJAX to get the source code of the page you're already on? – j08691 Jul 03 '13 at 14:31
  • Isn't it overly complicated to get the source of the page the JavaScript is actually running on? Good solution for getting the source of another page, though! – Laurent S. Jul 03 '13 at 14:31
  • @j08691 Because you'd get the original source code, not the current (possibly altered) source. – Wesley Murch Jul 03 '13 at 14:31
  • @j08691 and to get the DOCTYPE tags and the like, that you will miss if you just look at the tree under the HTML element. – Willem Mulder Jul 03 '13 at 14:35
  • The reason I'm doing this is to check for changes in the data, so I can reload the webpage automatically, when something in the backend changes. I am using AJAX, exactly in the way you've specified, to get the server's copy of the HTML. Now I need to get the local copy of the HTML to check if there is a difference. The logic being, if there IS a difference, something has changed in the back end and the page should refresh. – christopher Jul 03 '13 at 14:36
  • OP said the only tags he wasn't getting that he cared about were `` and ``. – j08691 Jul 03 '13 at 14:37
  • @Chris I don't see how changes on a front-end page imply changes at the back-end, but if that is the best way for you to do it, why not... – Willem Mulder Jul 03 '13 at 14:39
  • Because the Page itself is built by an XSL document, that is generated by XML. The XML changes -> the XSL changes -> The HTML changes. – christopher Jul 03 '13 at 14:44
  • @Chris Not too sure about your situation, but why not just do something like set a `lastUpdated` timestamp on the server side, poll for it, and refresh if it is later than the time the page was loaded? In any case, you're better off just asking how to solve your actual problem; the solution you chose seems kind of hacky. – Wesley Murch Jul 03 '13 at 14:46
  • I'm aware this is something of an X-Y situation, but I started three days ago and have absolutely no idea how half of this stuff works, so I'm trying to solve it in a way that I'm sure will work. – christopher Jul 03 '13 at 14:48
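The `lastUpdated` polling idea from the comments can be sketched as follows. This assumes a hypothetical server endpoint (written `/last-updated` below) whose response body is just the time the backend data last changed, as a number; nothing in the thread defines such an endpoint, so treat it as something you would have to add. One design choice differs slightly from the comment: the sketch compares each new timestamp against the first one it sees, not against the browser's clock, so a skew between server and client clocks cannot trigger or suppress a reload.

```javascript
// Sketch of the "poll a lastUpdated timestamp" idea from the comments.
// ASSUMPTION: `endpoint` is a hypothetical server URL whose response body is
// the time the backend data last changed, as a number (e.g. epoch millis).
function startPolling(endpoint, intervalMs, onChange, fetchImpl = fetch) {
  let baseline = null; // first timestamp seen; avoids comparing server and client clocks
  let done = false;    // guards against firing onChange more than once
  let timer = null;

  async function check() {
    try {
      const res = await fetchImpl(endpoint, { cache: "no-store" });
      const stamp = Number(await res.text());
      if (Number.isNaN(stamp)) return; // ignore malformed responses
      if (baseline === null) {
        baseline = stamp;
      } else if (!done && stamp > baseline) {
        done = true;
        clearInterval(timer); // stop polling once a change is seen
        onChange();
      }
    } catch (err) {
      // Transient network error: try again on the next tick.
    }
  }

  check();
  timer = setInterval(check, intervalMs);
  return () => clearInterval(timer); // call to stop polling manually
}
```

In a browser, with the hypothetical endpoint: `startPolling("/last-updated", 5000, () => location.reload());`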