Does anyone know a way to print the loaded page HTML after DOM is completed?

Question

I'm looking for a way to read the source code of a page after it has finished loading, and to inspect that code to see whether it contains a specific text.

I found this reference, but it only returns the text visible on the page, not the whole HTML code.

For instance, if the html source code is:

<html>
<head>
</head>
<body>
<p>This is a paragraph</p>
</body>
</html>

I want the script to print exactly the same thing.

Your help is appreciated.

What are you looking for in the source code exactly? Why do you want to "inspect the code" versus using jQuery to traverse the DOM? — gen_Eric, Jan 22 '16 at 20:49
You could take the innerHTML property of the tag, like it is proposed in your link. — ssc-hrep3, Jan 22 '16 at 20:50
You can get the page markup using document.documentElement.innerHTML (Source:http://stackoverflow.com/questions/817218/how-to-get-the-entire-document-html-as-a-string) — Shashank Karam, Jan 22 '16 at 20:50
@ShashankReddyKaram good link but based on OP's reference, it seems like he wants the markup from an XMLHttpRequest rather than from the current document. — Patrick Roberts, Jan 22 '16 at 20:52
Possible duplicate of [How to print HTML content on click of a button, but not the page?](http://stackoverflow.com/questions/16894683/how-to-print-html-content-on-click-of-a-button-but-not-the-page) — Asons, Jan 22 '16 at 20:54
Sorry for the confusion about the word "print". What I want to achieve is the same result that the "right click > inspect element" would give. What I'm trying to do is: 1) Open URL 2) Wait for the page to load 3) Check if page contains an iframe 4) Display a message if the iframe is found — Leo S., Jan 22 '16 at 21:07
@LeoS.: Why not just do something like `document.getElementsByTagName('iframe')` (or `$('iframe')`)? To do this after the page loads, you can use `window.addEventListener('load', function() {})` (or `$(function(){})`). — gen_Eric, Jan 22 '16 at 21:09
Possible duplicate of [Best Way to View Generated Source of Webpage?](https://stackoverflow.com/questions/1750865/best-way-to-view-generated-source-of-webpage) — Asons, Mar 11 '18 at 10:12

score 0 · Answer 1 · edited May 23 '17 at 10:27

Call this function on load, like this:

Fiddle Demo

function printBody() {
  // store original content
  var originalContents = document.body.innerHTML;

  // replace the body with the page source, shown as plain text
  document.body.innerText = document.documentElement.outerHTML;

  // call window.print if you want it on paper
  window.print();

  // or put it into an iframe
  // var ifr = document.createElement('iframe');
  // ifr.src = 'data:text/plain;charset=utf-8,' + encodeURIComponent(document.documentElement.outerHTML);
  // document.body.appendChild(ifr);

  // a small delay is needed so window.print does not get the original
  setTimeout(function(){
    document.body.innerHTML = originalContents;
  }, 2000);
}

Src: Print <div id=printarea></div> only?

Rogier Spieker · Answer 2 · 2016-01-23T12:03:01.467

Assuming that by 'print' you don't actually mean to transfer it to a paper copy, you can add some script like:

window.addEventListener('load', function() {
    var content = document.documentElement.innerHTML,
        pre = document.createElement('pre'),
        body = document.body;

    pre.innerText = content;

    body.insertBefore(pre, body.firstChild);
});

Step by step, this does the following:

window.addEventListener('load', function() > Wait for the page to be fully loaded and then execute the function
content = document.documentElement.innerHTML > store the actual page source in the content variable (document.documentElement refers to the 'root'-node, usually <html> in HTML documents)
pre = document.createElement('pre') > create a new <pre>-element
body = document.body > create a reference to the <body> element
pre.innerText = content > assign the HTML-structure we've stored earlier as text to the <pre>-element
body.insertBefore(pre, body.firstChild) > put the <pre>-element (now with contents) before any other element in the body (usually on top of the page).

This leaves you with the entire source (as it was before creating the <pre>-element containing the source) on top of your page.
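
Since the question was about checking whether the loaded source contains a specific text, the same string can also be searched rather than displayed. This is a minimal sketch, not part of the original answer: the helper name `sourceContains` and the search string `'<iframe'` are assumptions for illustration. The helper accepts any object shaped like `document`, so it can be tried outside a browser.

```javascript
// Hypothetical helper (not from the original answer): returns true when the
// serialized markup of `doc` contains `needle`.
function sourceContains(doc, needle) {
    // outerHTML includes the <html> tag itself; innerHTML would omit it.
    var source = doc.documentElement.outerHTML;
    return source.indexOf(needle) !== -1;
}

// Browser usage: run the check once the page has fully loaded.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    window.addEventListener('load', function() {
        // '<iframe' is only an example search string.
        if (sourceContains(document, '<iframe')) {
            console.log('Source contains the text');
        }
    });
}
```

Note that the string is the browser's serialization of the current DOM, so it may differ in whitespace and tag casing from the file the server sent.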

Edit: Added <iframe> workflow

It was not clear to me that you actually wanted to target an <iframe>, so here's how to do that (using a naive approach, more on that further on):

window.addEventListener('load', function() {
    var iframeList = document.getElementsByTagName('iframe'),
        body = document.body,
        content, pre, i;

    for (i = 0; i < iframeList.length; ++i) {
        // an <iframe> element exposes its document through contentDocument
        content = iframeList[i].contentDocument.documentElement.innerHTML;
        pre = document.createElement('pre');

        pre.innerText = content;
        body.insertBefore(pre, body.firstChild);
    }
});

Why is this approach naive?

There is a thing called Same-Origin-Policy in JavaScript, which prevents you from accessing <iframe>-content if that content does not originate from the same domain as the page containing the <iframe>.

There are several ways to take this into account. You could wrap the inside of the for-loop in a try/catch-block, though I prefer a more subtle approach: not even considering <iframe>-elements which do not satisfy the Same-Origin-Policy.
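
The try/catch variant mentioned above could look roughly like this. It is a sketch under assumptions, not the answer's own code: the helper names `readFrameHtml` and `readAllFrames` are hypothetical. In current browsers `contentDocument` is typically `null` for a cross-origin frame rather than throwing, so the sketch handles both cases.

```javascript
// Hypothetical helper: returns the markup of a readable frame, or null
// when the frame's document cannot be accessed.
function readFrameHtml(frame) {
    try {
        // For a cross-origin frame, contentDocument is typically null;
        // other access paths may throw, hence the try/catch.
        var doc = frame.contentDocument;
        return doc ? doc.documentElement.outerHTML : null;
    } catch (e) {
        return null;
    }
}

// Collects the markup of every readable frame in an array-like list.
function readAllFrames(frameList) {
    var result = [], html, i;
    for (i = 0; i < frameList.length; ++i) {
        html = readFrameHtml(frameList[i]);
        if (html !== null) {
            result.push(html);
        }
    }
    return result;
}
```

In a page, this would be called inside the load handler as `readAllFrames(document.getElementsByTagName('iframe'))`.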

In order to do this, you can swap the getElementsByTagName method with the querySelectorAll method (please note the compatibility table at the bottom of that page, see if it matches your requirements). The querySelectorAll accepts a valid CSS selector and will return a NodeList containing all matching elements.

A simple selector to use would be 'iframe[src]:not([src^="//"]):not([src^="http"])', which selects all iframes with a src attribute that does not start with either // or http.

Disclaimer: I never use a <base>-tag (which changes all relative paths within the HTML) or refer to the current website using a path containing the domain, so the example CSS-selector does not consider these aberrations.
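
For the cases the selector does not cover (a <base>-tag, or an absolute URL that points back to the current site), one alternative is to resolve each src and compare origins. This is a sketch, not the answer's method: the helper name `isSameOrigin` is hypothetical, and it assumes an environment where the `URL` constructor is available (not Internet Explorer).

```javascript
// Hypothetical helper: true when `src`, resolved against `baseUrl`, has the
// same origin (scheme, host and port) as `pageUrl`.
function isSameOrigin(src, baseUrl, pageUrl) {
    try {
        var frameOrigin = new URL(src, baseUrl).origin;
        // Opaque origins (e.g. data: URLs) serialize as the string 'null'
        // and are never treated as matching.
        return frameOrigin !== 'null' && frameOrigin === new URL(pageUrl).origin;
    } catch (e) {
        // Unparseable URLs are treated as not readable.
        return false;
    }
}
```

In a page, the arguments would be `frame.getAttribute('src')`, `document.baseURI` and `location.href`. Frames without a src attribute are not handled by this sketch.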

Can you use :not()

IE9 or better

Can you use document.querySelector(All)

IE8 or better (in order to use with :not(), IE9 or better)


Hi Rogier, your script works well. I made a small mod to search for the iframe as suggested by @Rocket Hazmat However, how do I search for another element inside that iframe now? Currently, my code looks like this: ` window.addEventListener('load', function() { var content = document.getElementsByTagName('iframe').contentDocument.documentElement.innerHTML, pre = document.createElement('pre'), body = document.body; pre.innerText = content; body.insertBefore(pre, body.firstChild); }); ` — Leo S., Jan 22 '16 at 21:52
You're on the right track, there is just a small mistake: `document.getElementsByTagName('iframe')` will return a `NodeList` containing zero or more elements, it does not return a single element. — Rogier Spieker, Jan 23 '16 at 11:18

score 0 · Answer 3 · answered Jan 22 '16 at 21:13

I think you are over-complicating this problem. You don't need to "print" the page's HTML or "inspect the code".

In a comment, you said:

Check if page contains an iframe [and] Display a message if the iframe is found

You can just use DOM traversal functions to examine the DOM.

Try something like this:

window.addEventListener('load', function() {
    if(document.getElementsByTagName('iframe').length){
        console.log('Found an iframe');
    }
});

Or with jQuery:

$(function() {
    if($('iframe').length){
        console.log('Found an iframe');
    }
});

score 0 · Answer 4 · edited May 23 '17 at 11:44

That's simple: to run a script after the page has fully loaded, assign a handler to window.onload.

function load(){
    console.log(document.getElementsByTagName('html')[0].innerHTML);
}
window.onload = load;

For further explanations, check this post
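
One caveat with assigning `window.onload`: the assignment replaces any handler set earlier, and it never fires if the script is added after the page has already loaded. A minimal sketch that covers both cases, assuming a hypothetical helper named `runWhenLoaded` that takes the document and window as parameters so it can be tried outside a browser:

```javascript
// Hypothetical helper: runs `callback` once the page has fully loaded, even
// if the script is added after the load event has already fired.
function runWhenLoaded(doc, win, callback) {
    if (doc.readyState === 'complete') {
        callback();
    } else {
        // addEventListener does not overwrite handlers set elsewhere,
        // unlike assigning to window.onload.
        win.addEventListener('load', callback);
    }
}

// Browser usage: log the full page markup after loading.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    runWhenLoaded(document, window, function() {
        console.log(document.documentElement.outerHTML);
    });
}
```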

answered Jan 22 '16 at 21:35 by Aymen Ben Tanfous
