36

Is there a way to access the page HTML source code using javascript?

I know that I can use document.body.innerHTML but it contains only the code inside the body. I want to get all the page source code including head and body tags with their content, and, if it's possible, also the html tag and the doctype. Is it possible?

gunr2171
mck89
  • 1
    Possible duplicate of [How to get the entire document HTML as a string?](https://stackoverflow.com/questions/817218/how-to-get-the-entire-document-html-as-a-string) – wesinat0r Oct 14 '19 at 00:24

5 Answers

48

Use

document.documentElement.outerHTML

or

document.documentElement.innerHTML
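
Neither property includes the doctype, and both reflect the current DOM rather than the bytes the server sent. Below is a minimal sketch of one way to prepend the doctype, assuming a browser that exposes `document.doctype`. The helper names `doctypeToString` and `getFullHtml` are made up for illustration and are not part of any API.

```javascript
// Hypothetical helper: rebuild the <!DOCTYPE ...> declaration from a
// DocumentType node. Returns "" when the document has no doctype.
function doctypeToString(doctype) {
  if (!doctype) return "";
  let out = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) {
    out += ' PUBLIC "' + doctype.publicId + '"';
  }
  if (doctype.systemId) {
    // Without a public identifier, the SYSTEM keyword is required.
    out += (doctype.publicId ? " " : " SYSTEM ") + '"' + doctype.systemId + '"';
  }
  return out + ">";
}

// Hypothetical helper: doctype (if any) followed by the serialized <html> element.
function getFullHtml(doc) {
  const doctype = doctypeToString(doc.doctype);
  return (doctype ? doctype + "\n" : "") + doc.documentElement.outerHTML;
}

// In a browser: const html = getFullHtml(document);
```

The document is passed in as a parameter so the helpers can also be used on other documents, such as an iframe's `contentDocument`.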
gunr2171
Eldar Djafarov
  • I don't know why, but in Firefox the document.documentElement object doesn't have the outerHTML property. With innerHTML I can get almost everything except the doctype, so thank you! – mck89 Sep 02 '09 at 13:14
  • 8
    @mck89: no browser but IE will have `outerHTML`. – Crescent Fresh Sep 02 '09 at 13:21
  • 6
    Be aware that the source you get with Firefox/most browsers is the "true" source you served up. In IE you will get the "live" HTML of the page including any changes the user has made to forms, any new DOM content etc. In IE it will also be the mixed case invalid tag soup that IE provides when requesting the .innerHTML of elements. – scunliffe Sep 02 '09 at 13:35
  • 2
    In case anyone else is still looking into this, the situation has changed somewhat. @Crescent Fresh was correct 2 years ago, however more recent versions of Chrome and Safari also implement HTMLElement.outerHTML - though at the time of writing, Firefox does not. – Liam Newmarch Aug 19 '11 at 10:32
  • 3
    @LiamNewmarch 2 years after your comment, which was 2 years after the initial post, and it seems that now Firefox also implements outerHTML. :) – Kip Aug 12 '13 at 14:50
  • 12
    This is the current state of the DOM not the source code. – Lothar May 10 '15 at 08:37
19

This can be done in a one-liner using XMLSerializer.

var generatedSource = new XMLSerializer().serializeToString(document);

This returns a string such as:

<!DOCTYPE html><html><head>

<title>html - javascript page source code - Stack Overflow</title>
...
Paul S.
  • Unfortunately you will get garbage if the document content has any character that requires escaping in XML. Also you will not get the real original string but something slightly different (e.g. including an XML schema link). – 6502 Mar 07 '21 at 07:27
11

One way to do this would be to re-request the page using XMLHttpRequest, then you'll get the entire page verbatim from the web server.
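
A minimal sketch of that approach, assuming a browser environment where `XMLHttpRequest` is available and the page can be re-requested with a plain GET. The function name `requestPageSource` and its Node-style callback signature are made up for illustration.

```javascript
// Hypothetical helper: re-request `url` and pass the raw response body
// to onDone(error, text). Exactly one of the two arguments is non-null.
function requestPageSource(url, onDone) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onDone(null, xhr.responseText);
    } else {
      onDone(new Error("Request failed with status " + xhr.status), null);
    }
  };
  xhr.onerror = function () {
    onDone(new Error("Network error while requesting " + url), null);
  };
  xhr.send();
}

// In a browser:
// requestPageSource(window.location.href, function (err, source) {
//   if (err) { console.error(err); return; }
//   console.log(source);
// });
```

The response is whatever the server returns for this second request, which may differ from what was served when the page first loaded.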

Paul Dixon
2

For IE you can also use:

document.all[0].outerHTML
L8R
DmitryK
  • Surprised this isn't marked as the answer. This works perfectly! The only thing is it only gets static HTML (doesn't retrieve anything javascript-related). – L8R Oct 26 '22 at 21:34
2

Provided that

  • the true HTML source code is wanted (not a serialization of the current DOM)
  • and that the page was loaded using GET method,

the page source can be re-downloaded:

fetch(document.location.href)
    .then(response => response.text())
    .then(pageSource => { /* ... */ });
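
A slightly more defensive variant, sketched under the assumption that `fetch` is available and the page is reachable with a plain GET. The function name `refetchPageSource` and the exact `Accept` value are illustrative choices, not something the browser guarantees to match its own navigation request.

```javascript
// Hypothetical helper: re-download `url` and resolve with the response body.
async function refetchPageSource(url) {
  const response = await fetch(url, {
    // Assumed value: roughly what browsers send when navigating to a page.
    headers: { Accept: "text/html,application/xhtml+xml" }
  });
  if (!response.ok) {
    // fetch only rejects on network failure, so HTTP errors are checked here.
    throw new Error("Request failed with status " + response.status);
  }
  return response.text();
}

// In a browser:
// refetchPageSource(document.location.href).then(source => console.log(source));
```

Even with matching headers, the server may return different content on the second request, so the result should be treated as a fresh copy rather than the exact bytes the browser originally received.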
czerny
  • 1
    That is unreliable because there is no guarantee that the server will serve the same content next time. – Szczepan Hołyszewski Sep 23 '17 at 02:43
  • @SzczepanHołyszewski Given that the REST protocol is defined as [stateless](https://stackoverflow.com/q/34130036/9063935), as long as you send the same headers in the ajax request as the browser did, then I would be confident the server would send the same response. – dwb Sep 19 '20 at 20:42
  • 1
    @dantechguy What are you talking about? There is nothing in the OP about REST. Whether an endpoint is a REST one depends on the server. The `fetch` API is typically used by client-side JS to talk to REST endpoints, but using the `fetch` API on a non-REST endpoint doesn't magically turn it into a REST one. But even if we talk REST, statelessness is irrelevant. Two identical REST GET requests can return different data if the resource was actually modified between the requests, or your permission to access the resource was revoked, or for a number of other reasons. – Szczepan Hołyszewski Sep 23 '20 at 13:24
  • You can make this a bit more reliable by at least adding an `Accept` header similar to that of the browser. But yeah, this approach is not generally reliable. – mindplay.dk Sep 01 '21 at 11:14
  • This worked for me! this youtube url has timedtext (transcription) in 'view page source' and could only retrieve this by fetching the url again. https://www.youtube.com/watch?v=LA-LMRFhzaw&ab_channel=jordifieke – Wim den Herder May 28 '22 at 19:05