How to create Document objects with JavaScript

Question

Basically that's the question: how is one supposed to construct a Document object dynamically from a string of HTML in JavaScript?

It's a real Document object. Not sure what you mean by "mere xml/html structure"... — Šime Vidas, Nov 22 '11 at 14:22
If you look at the [specification of the `send()` method](http://www.w3.org/TR/XMLHttpRequest2/#the-send-method), you'll notice that strings are covered by the "DOMString" case. Therefore, the "Document" case cannot mean a string, it's an object that implements the `Document` interface specified in the DOM standard. — Šime Vidas, Nov 22 '11 at 14:27
Yes, I read it, just wanted to clarify, maybe you know any examples of XMLHttpRequest used to send Document object? Side-question still stays though - maybe it's not that wide-spread usage, but how do you construct Document object? — jayarjo, Nov 22 '11 at 14:39
Now that we cleared that up, I recommend that you edit your question so that the side-question becomes the main question - "How to create Document objects with JavaScript". — Šime Vidas, Nov 22 '11 at 14:54
If you don't get a good answer in the next 45 hours, remind me to set a bounty on this... — Šime Vidas, Nov 22 '11 at 16:46

score 33 · Accepted Answer · answered Nov 22 '11 at 17:54


There are two methods defined in specifications: createDocument from DOM Core Level 2 and createHTMLDocument from HTML5. The former creates an XML document (including XHTML), the latter creates an HTML document. Both reside, as functions, on the DOMImplementation interface.

var impl    = document.implementation,
    xmlDoc  = impl.createDocument(namespaceURI, qualifiedNameStr, documentType),
    htmlDoc = impl.createHTMLDocument(title);

In reality, these methods are rather young and only implemented in recent browser releases. According to http://quirksmode.org and MDN, the following browsers support createHTMLDocument:

Chrome 4
Opera 10
Firefox 4
Internet Explorer 9
Safari 4
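Given the uneven support listed above, it may be worth feature-detecting createHTMLDocument before calling it. The following is only a minimal sketch: the helper name createBlankHtmlDocument is made up for illustration, and the host document is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper (not part of any specification): returns a new HTML
// document, or null when the host lacks DOMImplementation.createHTMLDocument.
function createBlankHtmlDocument(hostDocument, title) {
  var impl = hostDocument && hostDocument.implementation;
  if (!impl || typeof impl.createHTMLDocument !== "function") {
    return null;
  }
  return impl.createHTMLDocument(title);
}

// Browser usage; the guard keeps the snippet harmless where no DOM exists.
if (typeof document !== "undefined") {
  var blank = createBlankHtmlDocument(document, "Scratch");
  if (blank) {
    console.log(blank.title);
  }
}
```

A null return is the cue to fall back to another technique, such as the ActiveXObject route for older Internet Explorer.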

Interestingly enough, you can (kind of) create an HTML document in older versions of Internet Explorer, using ActiveXObject:

var htmlDoc = new ActiveXObject("htmlfile");

The resulting object will be a new document, which can be manipulated just like any other document.

answered Nov 22 '11 at 17:54

Andy E


I note you don't include working code. The only methods I've seen (e.g. on [MDN](https://developer.mozilla.org/en-US/docs/DOM/DOMParser)) rely on either built-in browser support for `DOMParser.prototype.parseFromString` or on setting the `innerHTML` property of the new document's HTML element. However, [according to MSDN](http://msdn.microsoft.com/en-us/library/ie/ms533897%28v=vs.85%29.aspx) (and shown in testing) the innerHTML property of HTML elements (and a number of others) is **read only** and can't be set in IE up to and including IE 9. – RobG Aug 29 '12 at 03:59
@RobG: there's `document.open()`, `document.write()` and `document.close()`. They all work in any browser including IE6 and lower. The only downside is that all the methods for IE6-8 parse and execute JavaScript, which is probably not desirable (it wasn't in [my case](http://stackoverflow.com/questions/7474710/can-i-load-an-entire-html-document-into-a-document-fragment-in-internet-explorer)). – Andy E Aug 29 '12 at 09:46

All good information, but handy to see code. An approach using `document.open` etc. seems to work back to IE 6 and is a useful alternative, however it doesn't work in older Firefox and others whereas the `stringToXMLDoc` function I posted does, as does the `stringToHTMLDoc` function. – RobG Aug 29 '12 at 13:18

@RobG: Sure, it's always handy to see code, but the question wasn't *"How to create document objects with JavaScript and populate them with HTML?"*, so code like that may be considered unnecessary bloat for an answer, especially considering the OP might have intended to construct the entire document using DOM creation methods. Otherwise, I would genuinely consider adding the code to my answer. I'd also like to confidently guess that the number of worldwide users with a browser that doesn't support `document.open` is minuscule (Firefox supported it in 1.5), probably not enough to worry about. – Andy E Aug 29 '12 at 13:37

ecmanaut · Answer 2 · 2012-11-04T22:01:13.100

Assuming you are trying to create a fully parsed Document object from a string of markup and a content type you also happen to know (maybe because you got the HTML from an XMLHttpRequest, and thus got the content type in its Content-Type HTTP header; probably usually text/html) – it should be this easy:

var doc = (new DOMParser).parseFromString(markup, mime_type);

in an ideal future world where browser DOMParser implementations are as strong and competent as their document rendering is – maybe that's a good pipe-dream requirement for future HTML6 standards efforts. As it turns out, no current browser handles this, though.

You probably have the easier (but still messy) problem of having a string of HTML you want to get a fully parsed Document object for. Here is another take on how to do that, which also ought to work in all browsers – first you make an HTML Document object:

var doc = document.implementation.createHTMLDocument('');

and then populate it with your html fragment:

doc.open();
doc.write(html);
doc.close();

Now you should have a fully parsed DOM in doc, which you can run alert(doc.title) on, slice with CSS selectors like doc.querySelectorAll('p'), or query with XPath using doc.evaluate.
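The two steps above can be folded into one function. This is only a sketch under assumptions: the name htmlToDocumentViaWrite is hypothetical, the host document is passed in as a parameter so that nothing depends on a global, and it presumes the host supports createHTMLDocument.

```javascript
// Hypothetical helper: creates an empty HTML document, then lets the
// browser's own parser fill it via open/write/close.
function htmlToDocumentViaWrite(hostDocument, html) {
  var doc = hostDocument.implementation.createHTMLDocument("");
  doc.open();
  doc.write(html);
  doc.close();
  return doc;
}

// Browser usage; skipped where no DOM is available.
if (typeof document !== "undefined" &&
    document.implementation &&
    typeof document.implementation.createHTMLDocument === "function") {
  var parsed = htmlToDocumentViaWrite(
    document,
    "<title>foo bar</title><p>first</p><p>second</p>"
  );
  console.log(parsed.title, parsed.querySelectorAll("p").length);
}
```

Because the markup goes through document.write, it is worth checking how a given browser treats scripts in the string before feeding it untrusted input.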

This actually works in modern WebKit browsers like Chrome and Safari (I just tested in Chrome 22 and Safari 6 respectively) – here is an example that takes the current page's source code, recreates it in a new document variable src, reads out its title, overwrites it with an HTML-quoted version of the same source code and shows the result in an iframe: http://codepen.io/johan/full/KLIeE

Sadly, I don't think any other contemporary browsers have quite as solid implementations yet.

Unfortunately, the last method fails in Opera (12.12): `doc.open()` / `write` / `close` affect the current document instead of the `doc` document. IE6+, Firefox 4+, Chrome 1+ and Safari 3.2+ correctly support this method. I've already submitted a bug report to Opera. – Rob W, Jan 06 '13 at 21:35
Thank you! I didn't check, and it seems an embarrassing enough bug to fix (or at least we can hope for the best) somewhat swiftly. :-) — ecmanaut, Jan 06 '13 at 21:53

Chris Baker · Answer 3 · 2011-11-22T18:00:52.663

5

Per the spec, one may use the createHTMLDocument method of DOMImplementation, accessible via document.implementation, as follows:

var doc = document.implementation.createHTMLDocument('My title');
// createHTMLDocument already builds the html, head, title and body elements,
// so new nodes are created with doc itself and appended to doc.body.
var p = doc.createElement('p');
p.appendChild(doc.createTextNode('Some text'));
doc.body.appendChild(p);
// and so on

jsFiddle: http://jsfiddle.net/9Fh7R/
MDN document for DOMImplementation: https://developer.mozilla.org/en/DOM/document.implementation
MDN document for DOMImplementation.createHTMLDocument: https://developer.mozilla.org/En/DOM/DOMImplementation.createHTMLDocument

edited Nov 22 '11 at 18:00

answered Nov 22 '11 at 17:52

Chris Baker


Note that the MDN code fails in IE 9 and lower since it doesn't support setting the HTML element's innerHTML property (it's read only). If you insert a complete HTML document (including a head element) into a BODY element, you are depending on browser error correction of what becomes invalid markup at the moment it is assigned. – RobG Aug 29 '12 at 04:05

RobG · Answer 4 · 2012-08-29T07:27:32.400

The following works in most common browsers, but not all. This is how simple it should be (but isn't):

// Fails if UA doesn't support parseFromString for text/html (e.g. IE)
function htmlToDoc(markup) {
  var parser = new DOMParser();
  return parser.parseFromString(markup, "text/html");
}

var htmlString = "<title>foo bar</title><div>a div</div>";
alert(htmlToDoc(htmlString).title);

To account for user agent vagaries, the following may be better (please note attribution):

/*
 * DOMParser HTML extension
 * 2012-02-02
 *
 * By Eli Grey, http://eligrey.com
 * Public domain.
 * NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
 *
 * Modified to work with IE 9 by RobG
 * 2012-08-29
 *
 * Notes:
 *
 *  1. Supplied markup should be a valid HTML document with or without HTML tags and
 *     no DOCTYPE (DOCTYPE support can be added, I just didn't do it)
 *
 *  2. Host method used where host supports text/html
 */

/*! @source https://gist.github.com/1129031 */
/*! @source https://developer.mozilla.org/en-US/docs/DOM/DOMParser */

/*global document, DOMParser*/

(function(DOMParser) {
    "use strict";

    var DOMParser_proto;
    var real_parseFromString;
    var textHTML;         // Flag for text/html support
    var textXML;          // Flag for text/xml support
    var htmlElInnerHTML;  // Flag for support for setting html element's innerHTML

    // Stop here if DOMParser not defined
    if (!DOMParser) return;

    // Firefox, Opera and IE throw errors on unsupported types
    try {
        // WebKit returns null on unsupported types
        textHTML = !!(new DOMParser).parseFromString('', 'text/html');

    } catch (er) {
      textHTML = false;
    }

    // If text/html supported, don't need to do anything.
    if (textHTML) return;

    // Next try setting innerHTML of a created document
    // IE 9 and lower will throw an error (can't set innerHTML of its HTML element)
    try {
      var doc = document.implementation.createHTMLDocument('');
      doc.documentElement.innerHTML = '<title></title><div></div>';
      htmlElInnerHTML = true;

    } catch (er) {
      htmlElInnerHTML = false;
    }

    // If that failed, try text/xml
    if (!htmlElInnerHTML) {

        try {
            textXML = !!(new DOMParser).parseFromString('', 'text/xml');

        } catch (er) {
            textXML = false;
        }
    }

    // Mess with DOMParser.prototype (less than optimal...) if one of the above worked
    // Assume can write to the prototype, if not, make this a stand alone function
    if (DOMParser.prototype && (htmlElInnerHTML || textXML)) { 
        DOMParser_proto = DOMParser.prototype;
        real_parseFromString = DOMParser_proto.parseFromString;

        DOMParser_proto.parseFromString = function (markup, type) {

            // Only do this if type is text/html
            if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
                var doc, doc_el, first_el;

                // Use innerHTML if supported
                if (htmlElInnerHTML) {
                    doc = document.implementation.createHTMLDocument("");
                    doc_el = doc.documentElement;
                    doc_el.innerHTML = markup;
                    first_el = doc_el.firstElementChild;

                // Otherwise use XML method
                } else if (textXML) {

                    // Make sure markup is wrapped in HTML tags
                    // Should probably allow for a DOCTYPE
                    if (!(/^<html.*html>$/i.test(markup))) {
                        markup = '<html>' + markup + '<\/html>'; 
                    }
                    doc = (new DOMParser).parseFromString(markup, 'text/xml');
                    doc_el = doc.documentElement;
                    first_el = doc_el.firstElementChild;
                }

                // RG: I don't understand the point of this, I'll leave it here though 
                //     In IE, doc_el is the HTML element and first_el is the HEAD.
                //
                // Is this an entire document or a fragment?
                if (doc_el.childElementCount == 1 && first_el.localName.toLowerCase() == 'html') {
                    doc.replaceChild(first_el, doc_el);
                }

                return doc;

            // If not text/html, send as-is to host method
            } else {
                return real_parseFromString.apply(this, arguments);
            }
        };
    }
}(DOMParser));

// Now some test code
var htmlString = '<html><head><title>foo bar</title></head><body><div>a div</div></body></html>';
var dp = new DOMParser();
var doc = dp.parseFromString(htmlString, 'text/html');

// Treat as an XML document and only use DOM Core methods
alert(doc.documentElement.getElementsByTagName('title')[0].childNodes[0].data);

Don't be put off by the amount of code: there are a lot of comments, and it can be shortened quite a bit at the cost of readability.

Oh, and if the markup is valid XML, life is much simpler:

var stringToXMLDoc = (function(global) {

  // W3C DOMParser support
  if (global.DOMParser) {
    return function (text) {
      var parser = new global.DOMParser();
      return parser.parseFromString(text,"application/xml");
    }

  // MS ActiveXObject support
  } else {
    return function (text) {
      var xmlDoc;

      // Can't assume support and can't test, so try..catch
      try {
        xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
        xmlDoc.async = false;
        xmlDoc.loadXML(text);
      } catch (e){}
      return xmlDoc;
    }
  }
}(this));


var doc = stringToXMLDoc('<books><book title="foo"/><book title="bar"/><book title="baz"/></books>');
alert(
  doc.getElementsByTagName('book')[2].getAttribute('title')
);
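One thing the XML route does not show is what happens when the markup is not well-formed. As far as I know, many DOMParser implementations report this by returning a document that contains a parsererror element, while MSXML documents expose a parseError object instead, and some browsers may simply throw. The check below is a rough sketch under those assumptions; hasXmlParseError is a made-up name.

```javascript
// Hypothetical helper: true when the parse result looks like a failure.
function hasXmlParseError(xmlDoc) {
  // Nothing usable came back at all (e.g. the ActiveX path swallowed an error)
  if (!xmlDoc) {
    return true;
  }
  // MSXML documents carry a parseError object with a non-zero errorCode on failure
  if (xmlDoc.parseError && xmlDoc.parseError.errorCode !== 0) {
    return true;
  }
  // DOMParser implementations commonly embed a <parsererror> element
  return xmlDoc.getElementsByTagName("parsererror").length > 0;
}

// Browser usage; wrapped in try..catch because some hosts throw instead.
if (typeof DOMParser !== "undefined") {
  try {
    var suspect = new DOMParser().parseFromString("<books><book></books>", "application/xml");
    console.log(hasXmlParseError(suspect));
  } catch (e) {
    console.log("parser threw: " + e.message);
  }
}
```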

Like I said in reply to your comment on my answer, IE6-9 can be coerced into parsing a document using a combination of the `open()`, `write()` and `close()` members of `document`, and this can be done in conjunction with `new ActiveXObject("htmlfile")`. If you're not worried about script execution in IE 6-8, this is a much shorter/easier approach. Your modification to Eli Grey's code could also use these methods and cut the size down significantly. — Andy E, Aug 29 '12 at 10:04

Neil F · Answer 5 · 2014-09-27T23:56:27.810

An updated answer for 2014, as DOMParser has evolved. This works in all current browsers I can find, and should also work in earlier versions of IE, using ecmanaut's document.implementation.createHTMLDocument('') approach above.

Essentially, IE, Opera and Firefox can all parse the markup as "text/html", while Safari parses it as "text/xml".

Beware of intolerant XML parsing, though. The Safari parse will break down at non-breaking spaces and other HTML characters (French/German accents) designated with ampersands. Rather than handle each character separately, the code below replaces all ampersands with the meaningless character string "j!J!". This string can subsequently be re-rendered as an ampersand when displaying the results in a browser (simpler, I have found, than trying to handle ampersands in "false" XML parsing).

function parseHTML(sText) {
    var parser, doc;

    try {
        console.log("Domparser: " + typeof window.DOMParser);

        // typeof yields a string, so compare against "undefined" rather than null
        if (typeof window.DOMParser !== "undefined") {
            parser = new DOMParser();

            // modern IE, Firefox, Opera parse text/html
            doc = parser.parseFromString(sText, "text/html");
            if (doc != null) {
                console.log("parsed as HTML");
                return doc;
            }

            // replace ampersands with harmless character string to avoid XML parsing issues
            sText = sText.replace(/&/g, "j!J!");
            // safari parses as text/xml
            doc = parser.parseFromString(sText, "text/xml");
            console.log("parsed as XML");
            return doc;
        }

        // older IE
        doc = document.implementation.createHTMLDocument('');
        doc.open();
        doc.write(sText);
        doc.close(); // close is a method, so it has to be called
        return doc;
    } catch (err) {
        alert("Error parsing html:\n" + err.message);
    }
}
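The ampersand substitution is only half of the round trip: the placeholder has to be turned back into an ampersand before the text is displayed. A small sketch of both directions follows; the function names are invented for illustration, and it assumes the source text never contains the literal string "j!J!" itself, since otherwise the restore step would be ambiguous.

```javascript
// Placeholder string taken from the answer above
var AMP_PLACEHOLDER = "j!J!";

// Hypothetical helper: hide ampersands from a strict XML parser
function maskAmpersands(text) {
  return text.split("&").join(AMP_PLACEHOLDER);
}

// Hypothetical helper: put the ampersands back for display
function unmaskAmpersands(text) {
  return text.split(AMP_PLACEHOLDER).join("&");
}

var masked = maskAmpersands("caf&eacute;&nbsp;au lait");
console.log(masked);                   // cafj!J!eacute;j!J!nbsp;au lait
console.log(unmaskAmpersands(masked)); // caf&eacute;&nbsp;au lait
```

Using split and join avoids having to escape the placeholder for use in a regular expression.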
