How to parse HTML from JavaScript in Firefox?

Question

What is the best way to parse (get a DOM tree of) an HTML result of XMLHttpRequest in Firefox?

EDIT:

I do not have the DOM tree, I want to acquire it.

XMLHttpRequest's "responseXML" works only when the result is actual XML, so I have only responseText to work with.

~~The innerHTML hack doesn't seem to work with a complete HTML document (in <html></html>).~~ - turns out it works fine.

Browsers have been parsing plain HTML for as long as they have existed. It's sad that there is no simple, standard way to invoke the browser's parser to make an HTMLDocument object from an HTML string... – Calmarius, Apr 25 '11 at 18:32

score 22 · Accepted Answer · answered May 20 '09 at 17:21


innerHTML should work just fine, e.g.

// This would be after the Ajax request:
var myHTML = XHR.responseText;
var tempDiv = document.createElement('div');
tempDiv.innerHTML = myHTML.replace(/<script(.|\s)*?\/script>/g, '');

// tempDiv now has a DOM structure:
tempDiv.childNodes;
tempDiv.getElementsByTagName('a'); // etc. etc.
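
For what it's worth, here is a slightly more defensive take on the script-stripping step, as a rough sketch only. The name stripScripts is made up for illustration, and a regex like this is a convenience, not an HTML sanitizer. It matches the tag case-insensitively and across newlines.

```javascript
// Hypothetical helper: remove <script>...</script> blocks from an HTML string
// before assigning it to innerHTML. This is a convenience, not a sanitizer.
function stripScripts(html) {
  // \b keeps tags such as <scripted> from matching; [\s\S] spans newlines;
  // the "i" flag also catches <SCRIPT> and mixed-case spellings.
  return html.replace(/<script\b[\s\S]*?<\/script\s*>/gi, '');
}

// Browser usage (assumes an XMLHttpRequest named xhr has completed):
//   var tempDiv = document.createElement('div');
//   tempDiv.innerHTML = stripScripts(xhr.responseText);
```

The string handling is kept separate from the DOM calls so the helper can be tried on its own, outside a page.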

answered May 20 '09 at 17:21

James


Looks like it's the best I can do. Thanks for the tip about – hmp May 20 '09 at 18:38
If you're worried about – thomasrutter Oct 12 '10 at 06:01

According to this page: http://bytes.com/topic/javascript/answers/513633-innerhtml-script-tag - you don't need to worry about script blocks being executed when added via innerHTML: "Script blocks inserted via innerHTML don't get executed in any browser other than NS6" - though that was written in 2006. – thomasrutter Oct 12 '10 at 06:03

What if the document to be parsed, myHTML, was a complete HTML document starting with ...? It wouldn't make much sense to have it as the innerHTML of a div, would it? Even if current browsers are able to ignore some of the markup to get a suitable replacement for the div's innerHTML, it doesn't sound like a clean solution to me. – Marc Jun 16 '11 at 07:02
This fails for "test" , it only does it correctly for Div – Akash Kava Mar 31 '12 at 12:12

score 3 · Answer 2 · answered Oct 10 '13 at 09:10

You can use the DOMParser to parse HTML - even tag soup:

var parser = new DOMParser();
// parseFromString returns a document; keep a reference so it can be queried.
var doc = parser.parseFromString('<!DOCTYPE html><html><head><title>hi</title></head><body><p>hello<b>world</b></p>', 'text/html');

I don't know if it handles partial table markup well, but it should create the same DOM the browser itself does for pretty much any markup.
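
To tie this back to the question, a minimal sketch of feeding responseText to DOMParser could look like the following. The function name parseHtmlDocument and its optional second parameter are hypothetical; the parameter is only there so the function can be exercised outside a browser. Older browsers may not accept 'text/html' here.

```javascript
// Hypothetical wrapper around DOMParser. The second parameter exists only so
// the function can be exercised outside a browser; in a page, omit it.
function parseHtmlDocument(html, ParserCtor) {
  var Ctor = ParserCtor || DOMParser; // DOMParser is a browser global
  return new Ctor().parseFromString(html, 'text/html');
}

// Browser usage (assumes xhr is a completed XMLHttpRequest):
//   var doc = parseHtmlDocument(xhr.responseText);
//   var links = doc.getElementsByTagName('a');
//   var title = doc.title;
```

Unlike the temporary-div approach, the result is a whole document, so a complete page with its own head and body should not need to be trimmed first.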

score 3 · Answer 3 · answered Jan 07 '12 at 08:27

At least for newer Firefox versions, an easier way is or will soon be available.

https://developer.mozilla.org/en/HTML_in_XMLHttpRequest indicates that starting from FF11 it will be possible to ask for a DOM directly from the XHR by setting the responseType attribute to "document". At that point, the HTML will be parsed and the DOM stuck into responseXML as for an XML document.
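
As a rough sketch of what that could look like in practice (the helper name fetchDocument, its callback, and the optional constructor parameter are all made up for illustration, and error handling is left out):

```javascript
// Hypothetical helper: request a URL and hand the parsed document to a
// callback. XhrCtor exists only so this can be exercised outside a browser.
function fetchDocument(url, onDocument, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest; // XMLHttpRequest is a browser global
  var xhr = new Ctor();
  xhr.open('GET', url, true); // responseType needs an asynchronous request
  xhr.responseType = 'document';
  xhr.onload = function () {
    // responseXML holds the parsed DOM, or null if parsing was not possible.
    onDocument(xhr.responseXML);
  };
  xhr.send();
  return xhr;
}

// Browser usage (the URL is a placeholder):
//   fetchDocument('/some/page.html', function (doc) {
//     if (doc) { console.log(doc.title); }
//   });
```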

score 1 · Answer 4 · answered May 20 '09 at 16:19

Look up the responseXML property of the XMLHttpRequest object. Furthermore, if you use innerHTML to append the responseText of an HTML-formatted response, the browser will parse the text and assemble it within the DOM before it is even appended into the document flow.

score 1 · Answer 5 · answered May 20 '09 at 16:42

If your data is XHTML, and therefore valid XML, then DOMParser (Mozilla) or loadXML (IE) may help. If not, I can't think of anything better than stripping the and and then passing it to a 's innerHTML.

See 21.1.3 in Flanagan's JavaScript guide (5th edition).

Colin
