As this answer indicates, a good way to parse HTML in JavaScript is to simply re-use the browser's HTML-parsing capabilities like so:

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
// process 'el' as desired

However, for certain HTML strings this causes the browser to load extra resources. For example:

var foo = document.createElement('div');
foo.innerHTML = '<img src="http://example.com/img.png">';

As soon as this example runs, the browser attempts to fetch the image.

How might I process HTML from JavaScript without this behavior?

Claudiu

2 Answers

I don't know whether there is a perfect solution for this, but since the HTML is only being processed, you can rename every src attribute before assigning innerHTML, for example to notSrc="xyz.com". The resources will then not be loaded, and if you need the original values later in processing you can read them from the renamed attribute. The browser mainly loads images, scripts, and CSS files; renaming src handles the first two, and CSS can be handled the same way by renaming the href attribute.
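
A minimal sketch of that renaming step, assuming a regex pass over the raw string is acceptable. The function name neutralizeResources and the placeholder attribute names data-src and data-href are made up for illustration (standing in for notSrc above), and a regex is not a full HTML parser, so unusual markup may slip through:

```javascript
// Rename resource-loading attributes in an HTML string so the browser does
// not fetch them when the string is later assigned to innerHTML.
// `neutralizeResources`, `data-src`, and `data-href` are hypothetical names.
function neutralizeResources(html) {
  return html
    // src on elements that fetch a resource (first src attribute per tag)
    .replace(/(<(?:img|script|iframe)\b[^>]*?\s)src(\s*=)/gi, '$1data-src$2')
    // href on <link> only, so ordinary <a href> links are left alone
    .replace(/(<link\b[^>]*?\s)href(\s*=)/gi, '$1data-href$2');
}

// Browser usage (not executed here, since it needs a DOM):
// var foo = document.createElement('div');
// foo.innerHTML = neutralizeResources('<img src="http://example.com/img.png">');
// foo.querySelector('img').getAttribute('data-src'); // the original URL
```

The original URLs stay available under the renamed attributes if later processing needs them.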

MoustafaS

If you want to parse an HTML response without loading unnecessary resources such as images or scripts, use DOMImplementation's createHTMLDocument() to create a new document. It is not connected to the document currently rendered by the browser, but otherwise behaves like a normal document.
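
A minimal sketch of this approach, assuming a browser environment. The helper name parseDetached is made up for illustration, and the optional doc parameter exists only so the function can be exercised outside a browser:

```javascript
// Parse an HTML string into a separate document created with
// DOMImplementation.createHTMLDocument(), so the markup is not attached
// to the page being rendered.
// `parseDetached` is a hypothetical helper; `doc` defaults to the page's
// own `document` when omitted.
function parseDetached(html, doc) {
  var owner = doc || document;
  // New, empty HTML document that is independent of the current page.
  var detached = owner.implementation.createHTMLDocument('');
  // Let the browser's parser build the tree inside the detached document.
  detached.documentElement.innerHTML = html;
  return detached;
}

// Browser usage (not executed here, since it needs a DOM):
// var parsed = parseDetached('<img src="http://example.com/img.png">');
// var imgs = parsed.getElementsByTagName('img'); // inspect as usual
```

The returned object supports the usual DOM queries, so the processing code from the question can run against it unchanged.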

Waqas Amjad