Get HTML document as string before it has loaded

Question

I looked at this and this answer to that question, but they only get the HTML contents of the page up until the <script> that executes the code.

For example, in this snippet:

<!DOCTYPE html>
<html>

<head>
  <title>Test</title>

  <script type="text/javascript">
    console.log(new XMLSerializer().serializeToString(document));
  </script>

  <link type="text/css" rel="stylesheet" href="style.css">
</head>

<body>
  <script type="text/javascript" src="testscript1.js"></script>
  <script type="text/javascript" src="testscript2.js"></script>
  <script type="text/javascript" src="testscript3.js"></script>
</body>

</html>

if you look at the console.log() output and scroll past the Stack Overflow snippet boilerplate, you'll see:

<script type="text/javascript">
    console.log(new XMLSerializer().serializeToString(document));
</script></body></html>

The <script> with src="testscript1.js" and the other two <script> tags are not present, i.e. the logged string does not contain all the HTML.

If you put the logging script at the bottom like this:

<!DOCTYPE html>
<html>

<head>
  <title>Test</title>

  <link type="text/css" rel="stylesheet" href="style.css">
</head>

<body>
  <script type="text/javascript" src="testscript1.js"></script>
  <script type="text/javascript" src="testscript2.js"></script>
  <script type="text/javascript" src="testscript3.js"></script>
  
  <script type="text/javascript">
    console.log(new XMLSerializer().serializeToString(document));
  </script>
</body>

</html>

it logs the other <script> tags.

Question

My guess is that since my scripts are loaded synchronously, the log contains only what has been parsed up to that point. How can I avoid that? I want my logging <script> to be as close to the top of the HTML as possible, while still having access to all the HTML content.

What I've tried

If I put this script in the <head>:

var req = new XMLHttpRequest();
req.open("GET", document.location.href + "index.html", false);
req.onreadystatechange = function () {
    if (req.readyState === 4) {
        if (req.status === 200 || req.status == 0) {
            console.log(req.responseText);
        }
    }
}

req.send(null);

I get the desired result, but I don't like how easily it could fail. For example, if I paste this code as a snippet here on Stack Overflow, it doesn't work because the requested file doesn't exist. If the document were named notindex.html, it would fail too.

Are there any alternatives or a reliable way to request the opened HTML document via an XMLHttpRequest?

Edit

I want to have access to all the HTML content before all stylesheets, scripts and images have loaded. That's the reason I want the logging script to be at the top. The XMLHttpRequest does it, but is unreliable.

score 2 · Answer 1 · answered May 07 '17 at 09:22

You can use the DOMContentLoaded event to run the function once the HTML document has been completely parsed:

<!DOCTYPE html>
<html>

<head>
  <title>Test</title>

  <script type="text/javascript">
    document.addEventListener("DOMContentLoaded", function() {
        console.log(new XMLSerializer().serializeToString(document));
    });
  </script>

  <link type="text/css" rel="stylesheet" href="style.css">
</head>

<body>
  <script type="text/javascript" src="testscript1.js"></script>
  <script type="text/javascript" src="testscript2.js"></script>
  <script type="text/javascript" src="testscript3.js"></script>
</body>

</html>
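
If the logging script might also run after parsing has already finished (for example when it is injected later), the listener above would never fire. A minimal sketch of a guard for that case follows; `whenParsed` is a hypothetical helper name, and `doc` is passed as a parameter only so the function can be tried outside a browser.

```javascript
// Hypothetical helper: run `callback` once `doc` has been fully parsed.
// If parsing is already over, DOMContentLoaded will not fire again,
// so the callback is invoked immediately instead.
function whenParsed(doc, callback) {
  if (doc.readyState === "loading") {
    doc.addEventListener("DOMContentLoaded", callback, { once: true });
  } else {
    callback();
  }
}

// Browser usage; the guard keeps the snippet harmless outside a page.
if (typeof document !== "undefined") {
  whenParsed(document, function () {
    console.log(new XMLSerializer().serializeToString(document));
  });
}
```

Like the listener above, this still waits for the parser to reach the end of the document, so the synchronous scripts in the `<body>` will have run by then.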

Dekel

Excuse me, I didn't make it clear. I want to have access to all the HTML content before the whole page has _loaded_. That's the reason I want the script to be at the top. With your solution, it simply waits until the scripts have loaded. It's the same as putting the script at the bottom of the `<body>`. I want to know of the existence of those three `<script>`… – dodov May 07 '17 at 09:28

If you don't want the browser to run the scripts you will have to use an external request to this page (`XMLHttpRequest` is an option, or you could use some other server-side language for that) – Dekel May 07 '17 at 09:33
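
Following up on the external-request suggestion in the comment above, here is a minimal sketch that requests the page's own URL instead of a hard-coded file name, so it would not depend on the document being called index.html. It assumes the page is served over HTTP(S) (`fetch` generally refuses `file://` URLs) and that a second request returns the same markup. `documentUrl` and `fetchDocumentSource` are hypothetical helper names, not from the original posts, and the `fetchImpl` parameter exists only so the helper can be tried outside a browser.

```javascript
// Hypothetical helper: drop the "#fragment" so the request targets the document itself.
function documentUrl(href) {
  const url = new URL(href);
  url.hash = "";
  return url.href;
}

// Hypothetical helper: request the raw markup of the page at `href`.
// `fetchImpl` defaults to the global fetch and is injectable for testing.
async function fetchDocumentSource(href, fetchImpl = fetch) {
  const response = await fetchImpl(documentUrl(href));
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.text();
}

// Browser usage; the guard keeps the snippet harmless outside a page.
if (typeof document !== "undefined") {
  fetchDocumentSource(document.location.href).then(console.log, console.error);
}
```

Because the request is asynchronous it does not block parsing, so the script can sit at the top of the `<head>`. The logged text is the server's response, not the live DOM, so it may differ from the current page if the server generates the markup dynamically.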
