I'm using a content script (loaded with run_at: document_start) to grab the exact source of the page before any JavaScript modifies the DOM.
I want the pure HTML, exactly what you'd get from Right Click > View Source in the browser.
I've tried two methods; both nearly work, but not quite.
Here's the actual raw source of the page, from Right Click > View Source:
<!doctype html>
<html lang="en">
<head>
<title>Raw HTML title</title>
</head>
<body>
<p>Something here.</p>
<script>
document.title = 'Title injected by JS';
</script>
</body>
</html>
Things I've tried:
new XMLSerializer().serializeToString(document)
This produces the following:
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="en"><head>
<title>Raw HTML title</title>
</head>
<body>
<p>Something here.</p>
<script>
document.title = 'Title injected by JS';
</script></body></html>
It's close, but for some reason the formatting isn't correct: 'doctype' is capitalised and an xmlns attribute has been added to the <html> tag.
document.documentElement.outerHTML
Produces the following:
<html lang="en"><head>
<title>Raw HTML title</title>
</head>
<body>
<p>Something here.</p>
<script>
document.title = 'Title injected by JS';
</script></body></html>
It's missing the doctype, and the formatting also doesn't match the original.
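
For context, both attempts serialize the parsed DOM tree rather than returning the text the server sent, so the original whitespace and the casing of the doctype keyword can't be recovered from them. One possible workaround is to request the page URL again and read the response body as text. The sketch below assumes the page can be re-requested with a plain GET and that the second response contains the same markup as the first. `fetchRawSource` is a hypothetical helper name, and the `fetchImpl` parameter is only there so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: re-request a URL and return the response body as text.
// Assumption: a second GET returns the same markup the browser originally parsed.
async function fetchRawSource(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  // text() decodes the body as UTF-8; pages served in another encoding
  // may not round-trip exactly.
  return response.text();
}

// Possible use inside the content script (browser only):
// fetchRawSource(location.href).then((source) => console.log(source));
```

This costs an extra request, and pages that are generated per request or in response to a POST may not return identical markup the second time, so it is an approximation of View Source rather than a guarantee.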