Basically, that's the question: how is one supposed to construct a Document object from a string of HTML dynamically in JavaScript?

-
It's a real Document object. Not sure what you mean by "mere xml/html structure"... – Šime Vidas Nov 22 '11 at 14:22
-
I mean text, sequence of tags. – jayarjo Nov 22 '11 at 14:24
-
If you look at the [specification of the `send()` method](http://www.w3.org/TR/XMLHttpRequest2/#the-send-method), you'll notice that strings are covered by the "DOMString" case. Therefore, the "Document" case cannot mean a string; it's an object that implements the `Document` interface specified in the DOM standard. – Šime Vidas Nov 22 '11 at 14:27
-
Yes, I read it, just wanted to clarify, maybe you know any examples of XMLHttpRequest used to send Document object? Side-question still stays though - maybe it's not that wide-spread usage, but how do you construct Document object? – jayarjo Nov 22 '11 at 14:39
-
Now that we cleared that up, I recommend that you edit your question so that the side-question becomes the main question - "How to create Document objects with JavaScript". – Šime Vidas Nov 22 '11 at 14:54
-
If you don't get a good answer in the next 45 hours, remind me to set a bounty on this... – Šime Vidas Nov 22 '11 at 16:46
5 Answers
There are two methods defined in specifications: createDocument from DOM Core Level 2 and createHTMLDocument from HTML5. The former creates an XML document (including XHTML), the latter creates an HTML document. Both reside, as functions, on the DOMImplementation interface.
var impl = document.implementation,
    xmlDoc = impl.createDocument(namespaceURI, qualifiedNameStr, documentType),
    htmlDoc = impl.createHTMLDocument(title);
In reality, these methods are rather young and only implemented in recent browser releases. According to http://quirksmode.org and MDN, the following browsers support createHTMLDocument:
- Chrome 4
- Opera 10
- Firefox 4
- Internet Explorer 9
- Safari 4
Interestingly enough, you can (kind of) create an HTML document in older versions of Internet Explorer, using ActiveXObject:
var htmlDoc = new ActiveXObject("htmlfile");
The resulting object will be a new document, which can be manipulated just like any other document.
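The two routes described above (createHTMLDocument where it exists, the "htmlfile" ActiveXObject in older Internet Explorer) can be combined into one feature-detecting helper. This is only a sketch: `createBlankHtmlDocument` and its two parameters are hypothetical names, not something from the specifications, and the host objects are passed in as arguments purely so the fallback order is easy to see.

```javascript
// Sketch only: createBlankHtmlDocument is a hypothetical helper name.
// hostDocument: normally window.document.
// ActiveXCtor:  normally window.ActiveXObject (undefined outside old IE).
function createBlankHtmlDocument(hostDocument, ActiveXCtor) {
    var impl = hostDocument && hostDocument.implementation;

    // Preferred: the HTML5 method on DOMImplementation
    if (impl && impl.createHTMLDocument) {
        return impl.createHTMLDocument("");
    }

    // Fallback for older Internet Explorer, as described in the answer
    if (ActiveXCtor) {
        return new ActiveXCtor("htmlfile");
    }

    // Neither route is available
    return null;
}

// Browser usage, guarded so the snippet does nothing outside a browser
if (typeof document !== "undefined") {
    var blankDoc = createBlankHtmlDocument(
        document,
        typeof ActiveXObject === "undefined" ? undefined : ActiveXObject
    );
}
```

Returning null when neither route exists leaves it to the caller to decide what to do on unsupported browsers.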

-
I note you don't include working code. The only methods I've seen (e.g. on [MDN](https://developer.mozilla.org/en-US/docs/DOM/DOMParser)) rely on either built-in browser support for `DOMParser.prototype.parseFromString` or on setting the `innerHTML` property of the new document's HTML element. However, [according to MSDN](http://msdn.microsoft.com/en-us/library/ie/ms533897%28v=vs.85%29.aspx) (and shown in testing) the innerHTML property of HTML elements (and a number of others) is **read only** and can't be set in IE up to and including IE 9. – RobG Aug 29 '12 at 03:59
-
@RobG: there's `document.open()`, `document.write()` and `document.close()`. They all work in any browser including IE6 and lower. The only downside is that all the methods for IE6-8 parse and execute JavaScript, which is probably not desirable (it wasn't in [my case](http://stackoverflow.com/questions/7474710/can-i-load-an-entire-html-document-into-a-document-fragment-in-internet-explorer)). – Andy E Aug 29 '12 at 09:46
-
All good information, but it's handy to see code. An approach using `document.open` etc. seems to work back to IE 6 and is a useful alternative; however, it doesn't work in older Firefox and others, whereas the `stringToXMLDoc` function I posted does, as does the `stringToHTMLDoc` function. – RobG Aug 29 '12 at 13:18
-
@RobG: Sure, it's always handy to see code, but the question wasn't *"How to create document objects with JavaScript and populate them with HTML?"*, so code like that may be considered unnecessary bloat for an answer, especially considering the OP might have intended to construct the entire document using DOM creation methods. Otherwise, I would genuinely consider adding the code to my answer. I'd also like to confidently guess that the number of worldwide users with a browser that doesn't support `document.open` is minuscule (Firefox supported it in 1.5), probably not enough to worry about. – Andy E Aug 29 '12 at 13:37
Assuming you are trying to create a fully parsed Document object from a string of markup and a content type you also happen to know (maybe because you got the HTML from an XMLHttpRequest, and thus got the content type in its Content-Type HTTP header; usually text/html), it should be this easy:
var doc = (new DOMParser).parseFromString(markup, mime_type);
in an ideal future world where browser DOMParser implementations are as strong and competent as their document rendering is (maybe that's a good pipe-dream requirement for future HTML6 standards efforts). It turns out that no current browser's implementation is, though.
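One practical detail about the mime_type argument: a Content-Type header usually carries parameters (for example a charset), while parseFromString expects a bare type such as text/html. The following sketch strips the parameters first. `mimeFromContentType` is a hypothetical helper name introduced here for illustration.

```javascript
// Sketch only: mimeFromContentType is a hypothetical helper name.
// Reduces a Content-Type header value such as "text/html; charset=UTF-8"
// to the bare, lower-cased MIME type "text/html".
function mimeFromContentType(headerValue) {
    if (!headerValue) {
        return "";
    }
    // Everything before the first ";" is the type; trim surrounding whitespace
    return headerValue.split(";")[0].replace(/^\s+|\s+$/g, "").toLowerCase();
}

// Typical use with an XMLHttpRequest that has finished loading (xhr is assumed):
//   var type = mimeFromContentType(xhr.getResponseHeader("Content-Type"));
//   var doc = (new DOMParser).parseFromString(xhr.responseText, type);
```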
You probably have the easier (but still messy) problem of having a string of HTML that you want a fully parsed Document object for. Here is another take on how to do that, which also ought to work in all browsers. First you make an HTML Document object:
var doc = document.implementation.createHTMLDocument('');
and then populate it with your html fragment:
doc.open();
doc.write(html);
doc.close();
Now you should have a fully parsed DOM in doc, which you can run alert(doc.title) on, slice with CSS selectors like doc.querySelectorAll('p'), or ditto with XPath using doc.evaluate.
This actually works in modern WebKit browsers like Chrome and Safari (I just tested in Chrome 22 and Safari 6 respectively). Here is an example that takes the current page's source code, recreates it in a new document variable src, reads out its title, overwrites it with an HTML-quoted version of the same source code, and shows the result in an iframe: http://codepen.io/johan/full/KLIeE
Sadly, I don't think any other contemporary browsers have quite as solid implementations yet.
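For convenience, the createHTMLDocument plus open/write/close steps above can be wrapped in one function. This is a sketch under the same assumptions as the answer: `htmlToDocument` is a hypothetical name, and the host document is passed in as a parameter only to make the dependency explicit.

```javascript
// Sketch only: htmlToDocument is a hypothetical helper name.
// hostDocument is normally window.document; it is used only to reach
// document.implementation, and is not itself modified by this function.
function htmlToDocument(html, hostDocument) {
    var doc = hostDocument.implementation.createHTMLDocument("");
    doc.open();       // start a fresh parse in the new document
    doc.write(html);  // feed the markup to the HTML parser
    doc.close();      // signal end of input
    return doc;
}

// Browser usage, guarded so the snippet does nothing outside a browser
if (typeof document !== "undefined") {
    var parsed = htmlToDocument("<title>foo bar</title><p>one</p><p>two</p>", document);
    console.log(parsed.title, parsed.querySelectorAll("p").length);
}
```

Whether write() targets the new document or the current one depends on the browser, so it is worth checking the result on the browsers you care about.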

-
Unfortunately, the last method fails in Opera (12.12): `doc.open()` / `write` / `close` affect the current document instead of the `doc` document. IE6+, Firefox 4+, Chrome 1+ and Safari 3.2+ correctly support this method. I've already submitted a bug report to Opera. – Rob W Jan 06 '13 at 21:35
-
Thank you! I didn't check, and it seems an embarrassing enough bug to fix (or at least we can hope for the best) somewhat swiftly. :-) – ecmanaut Jan 06 '13 at 21:53
Per the spec (doc), one may use the createHTMLDocument method of DOMImplementation, accessible via document.implementation, as follows:
var doc = document.implementation.createHTMLDocument('My title');
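// createHTMLDocument already supplies html, head, title and body elements,
// so create new nodes from doc itself and append them to the existing body
var div = doc.createElementNS('http://www.w3.org/1999/xhtml', 'div');
doc.body.appendChild(div);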
// and so on
- jsFiddle: http://jsfiddle.net/9Fh7R/
- MDN document for DOMImplementation: https://developer.mozilla.org/en/DOM/document.implementation
- MDN document for DOMImplementation.createHTMLDocument: https://developer.mozilla.org/En/DOM/DOMImplementation.createHTMLDocument

-
Note that the MDN code fails in IE 9 and lower since it doesn't support setting the HTML element's innerHTML property (it's read only). If you insert a complete HTML document (including a head element) into a BODY element, you are depending on browser error correction of what becomes invalid markup at the moment it is assigned. – RobG Aug 29 '12 at 04:05
The following works in most common browsers, but not all. This is how simple it should be (but isn't):
// Fails if UA doesn't support parseFromString for text/html (e.g. IE)
function htmlToDoc(markup) {
    var parser = new DOMParser();
    return parser.parseFromString(markup, "text/html");
}
var htmlString = "<title>foo bar</title><div>a div</div>";
alert(htmlToDoc(htmlString).title);
To account for user agent vagaries, the following may be better (please note attribution):
/*
* DOMParser HTML extension
* 2012-02-02
*
* By Eli Grey, http://eligrey.com
* Public domain.
* NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
*
* Modified to work with IE 9 by RobG
* 2012-08-29
*
* Notes:
*
* 1. Supplied markup should be a valid HTML document with or without HTML tags and
* no DOCTYPE (DOCTYPE support can be added, I just didn't do it)
*
* 2. Host method used where host supports text/html
*/
/*! @source https://gist.github.com/1129031 */
/*! @source https://developer.mozilla.org/en-US/docs/DOM/DOMParser */
/*global document, DOMParser*/
(function(DOMParser) {
    "use strict";
    var DOMParser_proto;
    var real_parseFromString;
    var textHTML; // Flag for text/html support
    var textXML; // Flag for text/xml support
    var htmlElInnerHTML; // Flag for support for setting html element's innerHTML

    // Stop here if DOMParser not defined
    if (!DOMParser) return;

    // Firefox, Opera and IE throw errors on unsupported types
    try {
        // WebKit returns null on unsupported types
        textHTML = !!(new DOMParser).parseFromString('', 'text/html');
    } catch (er) {
        textHTML = false;
    }

    // If text/html supported, don't need to do anything.
    if (textHTML) return;

    // Next try setting innerHTML of a created document
    // IE 9 and lower will throw an error (can't set innerHTML of its HTML element)
    try {
        var doc = document.implementation.createHTMLDocument('');
        doc.documentElement.innerHTML = '<title></title><div></div>';
        htmlElInnerHTML = true;
    } catch (er) {
        htmlElInnerHTML = false;
    }

    // If that failed, try text/xml
    if (!htmlElInnerHTML) {
        try {
            textXML = !!(new DOMParser).parseFromString('', 'text/xml');
        } catch (er) {
            // Record the failure on the XML flag (the original reset textHTML here)
            textXML = false;
        }
    }

    // Mess with DOMParser.prototype (less than optimal...) if one of the above worked
    // Assume can write to the prototype, if not, make this a stand alone function
    if (DOMParser.prototype && (htmlElInnerHTML || textXML)) {
        DOMParser_proto = DOMParser.prototype;
        real_parseFromString = DOMParser_proto.parseFromString;
        DOMParser_proto.parseFromString = function (markup, type) {
            // Only do this if type is text/html
            if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
                var doc, doc_el, first_el;
                // Use innerHTML if supported
                if (htmlElInnerHTML) {
                    doc = document.implementation.createHTMLDocument("");
                    doc_el = doc.documentElement;
                    doc_el.innerHTML = markup;
                    first_el = doc_el.firstElementChild;
                // Otherwise use XML method
                } else if (textXML) {
                    // Make sure markup is wrapped in HTML tags
                    // Should probably allow for a DOCTYPE
                    if (!(/^<html.*html>$/i.test(markup))) {
                        markup = '<html>' + markup + '<\/html>';
                    }
                    doc = (new DOMParser).parseFromString(markup, 'text/xml');
                    doc_el = doc.documentElement;
                    first_el = doc_el.firstElementChild;
                }
                // RG: I don't understand the point of this, I'll leave it here though
                // In IE, doc_el is the HTML element and first_el is the HEAD.
                //
                // Is this an entire document or a fragment?
                if (doc_el.childElementCount == 1 && first_el.localName.toLowerCase() == 'html') {
                    doc.replaceChild(first_el, doc_el);
                }
                return doc;
            // If not text/html, send as-is to host method
            } else {
                return real_parseFromString.apply(this, arguments);
            }
        };
    }
// Pass window.DOMParser so that a missing DOMParser arrives as undefined
// instead of throwing a ReferenceError before the check above can run
}(window.DOMParser));
// Now some test code
var htmlString = '<html><head><title>foo bar</title></head><body><div>a div</div></body></html>';
var dp = new DOMParser();
var doc = dp.parseFromString(htmlString, 'text/html');
// Treat as an XML document and only use DOM Core methods
alert(doc.documentElement.getElementsByTagName('title')[0].childNodes[0].data);
Don't be put off by the amount of code; there are a lot of comments. It can be shortened quite a bit, but it becomes less readable.
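The patched parseFromString above decides whether to intervene by testing the type argument against a regular expression. Pulled out on its own, that check behaves as follows. `isTextHtmlType` is a hypothetical name used only for this illustration; the pattern is the one from the code above.

```javascript
// Sketch only: isTextHtmlType is a hypothetical name; the pattern is copied
// from the parseFromString override above.
function isTextHtmlType(type) {
    // Optional leading whitespace, "text/html" in any case, optional trailing
    // whitespace, then either a ";" (parameters follow) or the end of the string
    return /^\s*text\/html\s*(?:;|$)/i.test(type);
}

var samples = ["text/html", "text/html; charset=utf-8", " TEXT/HTML ", "text/xml", "text/htmlx"];
for (var i = 0; i < samples.length; i++) {
    console.log(JSON.stringify(samples[i]) + " -> " + isTextHtmlType(samples[i]));
}
```

The first three samples match and the last two do not, so only genuine text/html requests are rerouted and everything else goes to the host method unchanged.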
Oh, and if the markup is valid XML, life is much simpler:
var stringToXMLDoc = (function(global) {
    // W3C DOMParser support
    if (global.DOMParser) {
        return function (text) {
            var parser = new global.DOMParser();
            return parser.parseFromString(text, "application/xml");
        };
    // MS ActiveXObject support
    } else {
        return function (text) {
            var xmlDoc;
            // Can't assume support and can't test, so try..catch
            try {
                xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
                // Use a real boolean rather than the string "false"
                xmlDoc.async = false;
                xmlDoc.loadXML(text);
            } catch (e) {}
            return xmlDoc;
        };
    }
}(this));
var doc = stringToXMLDoc('<books><book title="foo"/><book title="bar"/><book title="baz"/></books>');
alert(
    doc.getElementsByTagName('book')[2].getAttribute('title')
);

-
Like I said in reply to your comment on my answer, IE6-9 can be coerced into parsing a document using a combination of the `open()`, `write()` and `close()` members of `document`, and this can be done in conjunction with `new ActiveXObject("htmlfile")`. If you're not worried about script execution in IE 6-8, this is a much shorter/easier approach. Your modification to Eli Grey's code could also use these methods and cut the size down significantly. – Andy E Aug 29 '12 at 10:04
An updated answer for 2014, as DOMParser has evolved. This works in all current browsers I can find, and should also work in earlier versions of IE, using ecmanaut's document.implementation.createHTMLDocument('') approach above.
Essentially, IE, Opera, Firefox can all parse as "text/html". Safari parses as "text/xml".
Beware of intolerant XML parsing, though. The Safari parse will break down at non-breaking spaces and other HTML character entities (such as French/German accented letters) written with ampersands. Rather than handle each entity separately, the code below replaces all ampersands with the meaningless character string "j!J!". This string can subsequently be re-rendered as an ampersand when displaying the results in a browser (simpler, I have found, than trying to handle ampersands in "false" XML parsing).
function parseHTML(sText) {
    try {
        console.log("DOMParser: " + typeof window.DOMParser);
        // typeof yields a string, so compare against "undefined" rather than null
        if (typeof window.DOMParser !== "undefined") {
            // modern IE, Firefox, Opera parse text/html
            var parser = new DOMParser();
            var doc = parser.parseFromString(sText, "text/html");
            if (doc != null) {
                console.log("parsed as HTML");
                return doc;
            }
            // replace ampersands with a harmless character string to avoid XML parsing issues
            sText = sText.replace(/&/g, "j!J!");
            // Safari parses as text/xml
            doc = parser.parseFromString(sText, "text/xml");
            console.log("parsed as XML");
            return doc;
        } else {
            // older IE
            var htmlDoc = document.implementation.createHTMLDocument('');
            htmlDoc.open();
            htmlDoc.write(sText);
            htmlDoc.close(); // close is a method, so it must be called
            return htmlDoc;
        }
    } catch (err) {
        alert("Error parsing html:\n" + err.message);
    }
}
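The ampersand substitution described above needs a matching step that restores the ampersands before the text is displayed. A minimal sketch of both directions follows. The placeholder "j!J!" is the one from the answer, while `maskAmpersands` and `unmaskAmpersands` are hypothetical helper names.

```javascript
// Sketch only: helper names are hypothetical; the placeholder is the one
// used in the answer above.
var AMP_PLACEHOLDER = "j!J!";

// Replace every "&" so a strict XML parser does not choke on HTML entities
function maskAmpersands(text) {
    return text.replace(/&/g, AMP_PLACEHOLDER);
}

// Restore the ampersands in text taken back out of the parsed document.
// Caveat: input that already contained the placeholder text cannot be
// told apart from masked ampersands, so it is restored as "&" as well.
function unmaskAmpersands(text) {
    return text.split(AMP_PLACEHOLDER).join("&");
}

var original = "caf&eacute;&nbsp;au lait";
var masked = maskAmpersands(original);
console.log(masked);
console.log(unmaskAmpersands(masked) === original);
```

Splitting and joining on the placeholder avoids having to escape the "!" characters in a regular expression.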
