4

I'd like to parse a string and build a DOM tree from it. I decided to use the DocumentFragment API, and this is what I have so far:

var htmlString = "Some really really complicated html string that only can be parsed by a real browser!";
var fragment = document.createDocumentFragment(); // takes no arguments
var tempDiv = document.createElement('div');
fragment.appendChild(tempDiv);
tempDiv.innerHTML = htmlString;
console.log(tempDiv);

The problem is that this script causes my browser (Chrome, specifically) to send actual HTTP requests. Take this as an example:

var htmlString = '<img src="somewhere/odd/on/the/internet" alt="alt?" />';
var fragment = document.createDocumentFragment(); // takes no arguments
var tempDiv = document.createElement('div');
fragment.appendChild(tempDiv);
tempDiv.innerHTML = htmlString;
console.log(tempDiv);

Which leads to:

[Image: Chrome error]

Are there any workarounds for this, or is there a better way to parse an HTML string?

Sepehr

4 Answers

3

The element is created by the live page's document, so the browser fetches the image as soon as innerHTML is set, even though the element is never attached to the page.

You can look into using DOMParser instead:

var htmlString ='<img src="somewhere/odd/on/the/internet" alt="alt?" />';
var parser = new DOMParser();
var doc = parser.parseFromString(htmlString , "text/html");

The MDN documentation page includes code to support browsers that do not support it natively.
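
As a rough illustration of that kind of fallback (this is not the MDN code itself), here is a minimal sketch. The name `parseHtml` is made up for this example, and the constructor and `document.implementation` are passed in as parameters so the function does not depend on browser globals. It assumes that an engine without `text/html` support either throws or returns nothing from `parseFromString`.

```javascript
// Sketch only: prefer DOMParser, fall back to createHTMLDocument.
// parseHtml is a hypothetical helper name, not a built-in API.
function parseHtml(htmlString, ParserCtor, implementation) {
  if (typeof ParserCtor === 'function') {
    try {
      const parsed = new ParserCtor().parseFromString(htmlString, 'text/html');
      if (parsed) return parsed; // assumption: unsupported engines may return null
    } catch (err) {
      // Assumption: other engines throw on unsupported types; use the fallback.
    }
  }
  // Fallback: a document created this way is not attached to any window.
  const doc = implementation.createHTMLDocument('');
  doc.body.innerHTML = htmlString;
  return doc;
}

// Browser usage; the guard keeps the snippet harmless outside a browser.
if (typeof DOMParser !== 'undefined' && typeof document !== 'undefined') {
  const doc = parseHtml('<img src="x.jpg" alt="alt?">', DOMParser, document.implementation);
  console.log(doc.images.length);
}
```

Either branch returns a separate document, so you can query it with `querySelectorAll` before deciding what, if anything, to move into the page.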

epascarello
  • Thanks for the answer. The reason I'm not using `DOMParser` is that it fails to parse complicated HTML strings, such as the source of Google's home page. Don't take my word for it; try it yourself and see how it fails. – Sepehr Oct 05 '12 at 13:56
1

I found the answer to my question in another answer here on Stack Overflow. It consists of a piece of code that parses HTML using native browser functionality, but in a semi-sandboxed environment that doesn't send HTTP requests. I hope it helps others as well.

Sepehr
  • Just a little warning. Don't rely on that code working in all cases. For example, if the html input is ``, the `x.jpg` file will be fetched. – Alohci Oct 07 '12 at 23:39
0

I took a modified approach to the accepted answer's linked answer, as I don't like the idea of creating an iframe, processing the string through a BUNCH of regular expressions, and then putting that into the DOM.

I needed to preprocess some HTML coming in from an ajax request (this particular HTML has images with relative paths, and the page making the ajax request is not in the same directory as the HTML) and make the path to resources an absolute path instead.

My code looks something like this:

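// Rename src to data-src so the browser does not fetch images while parsing.
var dataSrcStr = data.replace(/src=/g, 'data-src=');
var myContainer = document.getElementById('mycontainer');
myContainer.innerHTML = dataSrcStr;
var imgs = myContainer.querySelectorAll('img');
for (var i = 0, ii = imgs.length; i < ii; i++) {
  // Set the real src with the prefix, then drop the temporary attribute.
  imgs[i].src = 'prepended/path/to/img/' + imgs[i].getAttribute('data-src');
  imgs[i].removeAttribute('data-src');
}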

Obviously, if there is plain text containing src=, it will be replaced as well, but that is not the case for my content, since I control it too.

This offers me a quicker solution than the linked answer or using a DOMParser, while still adding elements to the DOM to be able to access the elements programmatically.
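
If the caveat about src= appearing in ordinary text matters for your content, one possible refinement is to rename the attribute only inside `<img` tags. This is a sketch, not a full HTML tokenizer: `deferImageSources` and `toAbsolute` are made-up helper names, and the regular expression assumes attribute values contain neither `>` nor the text ` src=`.

```javascript
// Hypothetical helper: rename src to data-src on <img> tags only,
// leaving any "src=" that appears in ordinary text untouched.
function deferImageSources(html) {
  return html.replace(/(<img\b[^>]*?\s)src=/gi, '$1data-src=');
}

// Hypothetical helper: join a path prefix and a relative path with one slash.
function toAbsolute(prefix, relativePath) {
  return prefix.replace(/\/+$/, '') + '/' + relativePath.replace(/^\/+/, '');
}

const sample = '<p>Set src= in your markup.</p><img alt="a" src="pics/a.png">';
console.log(deferImageSources(sample));
console.log(toAbsolute('prepended/path/to/img/', 'pics/a.png'));
```

The rest of the approach stays the same: assign the rewritten string to innerHTML, then read `data-src` with `getAttribute` and set the real `src`.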

gcochard
0

Try this. Works for complex html too. Anything your browser can display, this can parse.

var htmlString = "...";
var newDoc = document.implementation.createHTMLDocument('newDoc');
newDoc.documentElement.innerHTML = htmlString;
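
Once the string is parsed this way, you can inspect the result before touching the page. A minimal sketch, assuming you only want the image URLs: `listImageSources` is a made-up name, and `document.implementation` is passed in as a parameter so the function itself does not rely on browser globals.

```javascript
// Sketch: parse HTML into a document that is not attached to any window,
// then read image URLs from the attributes.
// listImageSources is a hypothetical helper name.
function listImageSources(htmlString, implementation) {
  const inertDoc = implementation.createHTMLDocument('scratch');
  inertDoc.documentElement.innerHTML = htmlString;
  const urls = [];
  const imgs = inertDoc.querySelectorAll('img');
  for (let i = 0; i < imgs.length; i++) {
    // getAttribute returns the raw attribute text, not a resolved URL.
    urls.push(imgs[i].getAttribute('src'));
  }
  return urls;
}

// Browser usage; the guard keeps the snippet harmless outside a browser.
if (typeof document !== 'undefined') {
  console.log(listImageSources('<img src="x.jpg" alt="">', document.implementation));
}
```

Reading `getAttribute('src')` rather than the `src` property gives you the path exactly as written in the markup, which is handy if you need to rewrite relative paths.
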

iPherian