Make parseFromString() parse without validation

Question

I use parseFromString() to create elements. Each element is individual and should be inserted into the DOM later.

This works fine, except for this string:

<tr> <td>a</td> </tr> <div>X</div>

How to parse the <tr> and <div> so that I have a list of two elements?

Update

I know that I could parse this easily:

<table><tr> <td>a</td> </tr></table> <div>X</div>

But in this case I really would like to parse <tr>...</tr> <div>..</div>.

Related htmx issue: #469

In this very specific case, the only way I see is `const els = input.split(' ')`, then insert them as-is (i.e. don't parse at all). — , Apr 29 '21 at 08:16
@ChrisG Thank you for your comment. Unfortunately there can be spaces everywhere. — guettli, Apr 29 '21 at 08:23
Does this answer your question? [Creating a new DOM element from an HTML string using built-in DOM methods or Prototype](https://stackoverflow.com/questions/494143/creating-a-new-dom-element-from-an-html-string-using-built-in-dom-methods-or-pro) — Peter B, Apr 29 '21 at 08:30
@PeterB It's not that simple I'm afraid; the main issue is that the browser discards the orphaned `<tr>` because it's invalid outside a table. — , Apr 29 '21 at 08:31
^ That, and the use of `parseFromString` is invalid in this case. Otherwise, I'd have hammered it already :D — Cerbrus, Apr 29 '21 at 08:32
@guettli please take a look at my solution when you have a chance. Using both the `XMLDocument` and `HTMLDocument` object types together, we are able to build an `HTMLDocument` object that works exactly as you are looking for. I created this as a prototype method on the `DOMParser.prototype` object so you can use it as you normally would. In its current state, it does not require a second parameter and always returns an `HTMLDocument` for whatever string contents you pass in. — Brandon McConnell, May 04 '21 at 15:31

score 1 · Accepted Answer · edited May 07 '21 at 11:08

You can achieve this without having to use document.createElement(), as I've seen in some of the comments here, and without wrapping everything in some parent element like <template>, as I also see in some solutions.

To achieve this, we must first understand how the parseFromString() method works. From the first line of the docs, we can see that…

The parseFromString() method of the DOMParser interface parses a string containing either HTML or XML, returning an HTMLDocument or an XMLDocument.

Here are the requirements for both document types:

HTMLDocument (text/html) :: Must be valid HTML (where <tr> must be the descendant of a <table> element)
XMLDocument (text/xml) :: Must have one parent element; cannot have multiple top-level elements

The main issue here lies in the fact that a string parsed as text/html needs to read as valid HTML, which the top-level <tr> does not, since it requires a <table> ancestor.

Here's the good news: XML is more accepting of "improper" HTML tags since it deals largely with custom tags for data sources. The main downsides of XML would normally be that the elements do not exist in an HTML hierarchy and that you need a single parent element. However, we can take advantage of this by creating the DOM tree exactly the way you want it in XML first and then passing all those elements to a new HTMLDocument using appendChild() and a for...of loop.

Here it is in action. I've added the function as a method on DOMParser.prototype to make this cleaner:

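DOMParser.prototype.looseParseFromString = function(str) {
  // Turn XHTML-style closers (<br />) into plain ones, then give every void tag an explicit
  // closing tag so the string is well-formed XML. The \b keeps <col from matching <colgroup>.
  str = str.replace(/ \/>/g, '>').replace(/(<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b.*?>)/g, '$1</$2>');
  const xdom = this.parseFromString('<xml>'+str+'</xml>', 'text/xml');
  const hdom = this.parseFromString('', 'text/html');
  // Move each top-level element from the XML tree into the body of the HTML document.
  for (const elem of Array.from(xdom.documentElement.children)) {
    hdom.body.appendChild(elem);
  }
  // Strip the closing tags that were added to the void elements above.
  for (const elem of Array.from(hdom.querySelectorAll('area,base,br,col,command,embed,hr,img,input,keygen,link,meta,param,source,track,wbr'))) {
    elem.outerHTML = '<'+elem.outerHTML.slice(1).split('<')[0];
  }
  return hdom;
};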

const parser = new DOMParser();
const domString = '<tr> <td>a</td> </tr> <div>X</div>  <div><img src="" />Test<br /></div>';
const dom = parser.looseParseFromString(domString);

// I added the log() function below to make the test output easier to digest: it logs the output to the document *in addition to* the console, though I've hidden the console to save room. Any of these should work just the same in your local console.

const printHTML = htmlContent => (typeof htmlContent === "string" ? htmlContent : htmlContent.outerHTML).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;").replace(/'/g, "&#039;");

const log = (label, htmlContent, classStr) => (console.log(label, (typeof htmlContent === "string" ? htmlContent : htmlContent.outerHTML)), document.body.insertAdjacentHTML('beforeend', `<div class="log-entry${classStr?" "+classStr:""}" data-label="${label}"><pre>${printHTML(htmlContent)}</pre></div>`));

log("DOM string used for testing", domString, 'title')

log("dom.body", dom.body);
log("dom.querySelector('tr')", dom.querySelector('tr'));
log("dom.querySelector('div')", dom.querySelector('div'));
log("dom.querySelector('br')", dom.querySelector('br'));

@import url(https://fonts.googleapis.com/css2?family=Source+Code+Pro:wght@700&display=swap);body{display:flex;flex-direction:column;font-family:'Source Code Pro',monospace;font-size:13px;font-weight:700;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.log-entry{display:flex;flex-direction:column;box-sizing:border-box}.log-entry+.log-entry{margin-top:8px}.log-entry::before,.log-entry>pre{padding:8px 16px}.log-entry::before{display:block;width:100%;background-color:#37474f;border-radius:10px 10px 0 0;content:attr(data-label);color:#eceff1;box-sizing:border-box}.log-entry>pre{display:block;margin:0;background-color:#cfd8dc;border-radius:0 0 10px 10px;color:#263238;white-space:break-spaces;box-sizing:border-box}.log-entry.title:first-of-type{position:sticky;top:0;margin:-8px -8px 4px -8px;box-shadow:0 0 30px 0 #263238}.log-entry.title::before{background:#000;border-radius:0;color:#3f3;text-shadow:0 2px 6px rgba(51,255,51,.5)}.log-entry.title>pre{padding-top:0;background:#000;border-radius:0;color:#fff}

The end result should do exactly what you're looking for here: the elements should retain their HTML properties and methods, and the final DOM should work just like any other HTMLDocument object.

Moving forward, after initializing this prototype method, you would only need to use this one line to replace the line you mentioned in your original question:

parser.looseParseFromString('<tr> <td>a</td> </tr> <div>X</div>')

UPDATED (2021-05-04 20:22 GMT-0400)

UPDATES

1. I have updated my looseParseFromString() function to account for void HTML elements, which do not need a closing tag. I gathered this list of void tag names from this article by Lifewire. I worked around this issue by using a regex replacement to close any void tags and replace any XHTML-formatted void tag closures with simple HTML ones (e.g. <br /> ➞ <br></br>). Once the XMLDocument is successfully constructed, I loop through and create the HTMLDocument as I did before. After that, my function loops back through any of the void elements with closing tags in the new HTMLDocument, using the same list of void tag names from earlier, and removes the closing tags using the outerHTML property and the split() method.

2. I also implemented two helper functions log() and printHTML() which assist in simplifying the testing process by logging the results to the test window's document.body in addition to the console. I encourage you to test this code in your own console as well. It works the same across both for me.
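
The void-tag preprocessing described in update 1 can also be looked at in isolation. The sketch below is a standalone restatement of that idea, not the exact code from the answer: `VOID_TAGS` and `closeVoidTags` are names made up for this example, and it assumes that no attribute value contains a `>` character and that the input does not already contain closing tags for void elements.

```javascript
// Hypothetical helper names; the tag list mirrors the one used in the answer.
const VOID_TAGS = ['area', 'base', 'br', 'col', 'command', 'embed', 'hr', 'img',
  'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr'];

// Rewrites <br>, <br/> and <br /> style void tags as <br></br> so that an
// XML parser accepts them. Attributes are kept as they are.
function closeVoidTags(html) {
  const voidTag = new RegExp(`<(${VOID_TAGS.join('|')})\\b([^>]*?)\\s*/?>`, 'gi');
  return html.replace(voidTag, '<$1$2></$1>');
}

console.log(closeVoidTags('<div><img src="" />Test<br /></div>'));
// prints: <div><img src=""></img>Test<br></br></div>
```

The word boundary after the tag name is what stops `<colgroup>` from being treated as a `<col>` tag.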

Does this work for `

`? Although the HTML in above question is accidentally valid XML, the intention is to parse HTML. — guettli, May 04 '21 at 19:12
@guettli I guess that's the downside of using either function. One requires valid XML, and the other requires "valid" HTML and expects a `<table>` above the `<tr>`. I can think of one other quicker workaround which should work for all cases. I'll work on it now… — Brandon McConnell, May 04 '21 at 20:52
@guettli When you have a chance, please check my recent update which accounts for these void HTML tags (e.g. `img`, `br`, `input`, etc.). We've just about resolved any edge cases here. — Brandon McConnell, May 05 '21 at 00:23
@guettli Yes, I have. When I use `parseFromString()` and wrap the string in a ` — Brandon McConnell, May 05 '21 at 09:50

Cerbrus · Answer 2 · 2021-04-29T08:30:47.993

0

The parseFromString() method of the DOMParser interface parses a string containing either HTML or XML, returning an HTMLDocument or an XMLDocument.
(source, emphasis mine)

You're using parseFromString() to do something it's not meant to do.

What you probably want to do instead, is:

1. Create a temporary container element.
2. Dump your HTML into that element.
3. Get the container's children.

Either way, you're not gonna get a tr DOM node out of this, as the DOM parser seems to strip out invalid HTML (Orphaned tr nodes are invalid)
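
For reference, the three steps above might look roughly like the sketch below. `childrenViaContainer` is a hypothetical name, and using a `<div>` as the container is an assumption. As noted above, the orphaned `<tr>` is not expected to survive this approach.

```javascript
// Hypothetical helper; `doc` is the Document used to create the container.
function childrenViaContainer(html, doc) {
  const container = doc.createElement('div'); // 1. temporary container
  container.innerHTML = html;                 // 2. dump the HTML into it
  return Array.from(container.children);      // 3. collect the children
}

// Only runs where a DOM is available (for example a browser console).
if (typeof document !== 'undefined') {
  const nodes = childrenViaContainer('<tr> <td>a</td> </tr> <div>X</div>', document);
  console.log(nodes.map(node => node.tagName));
}
```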

edited Apr 29 '21 at 08:30

answered Apr 29 '21 at 08:17


In case you meant [like this](https://jsfiddle.net/kpnor50d/), that won't work either (I guess because it uses the same parser and again removes the invalid HTML) – Apr 29 '21 at 08:28
Oh, that's pretty neat! I've updated my answer to state the impossibility. Thanks, @ChrisG – Cerbrus Apr 29 '21 at 08:31
All the "What you probably want to do instead, is:" paragraph is moot, using an HTMLElement is just dirtier and slower, it doesn't offer anything good. – Kaiido Apr 29 '21 at 08:37
@Kaiido: It doesn't build a HTMLDocument around it. – Cerbrus Apr 29 '21 at 08:42

score 0 · Answer 3 · answered Apr 29 '21 at 12:43


Unfortunately table elements (and a few other elements) will not parse as top level elements.

We handle this in htmx by wrapping and then unwrapping them:

https://github.com/bigskysoftware/htmx/blob/665fc4bda76f97c0a023f96e65fba3527fec6a3b/src/htmx.js#L159
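
To illustrate the wrap-then-unwrap idea, here is a simplified sketch. It is not htmx's actual code: `TABLE_WRAPPERS`, `wrapForParsing` and `parseWrapped` are names invented for this example, and only a few table-related tags are covered.

```javascript
// Hypothetical mapping: first tag of the fragment -> [prefix, suffix, nesting depth].
const TABLE_WRAPPERS = new Map([
  ['thead', ['<table>', '</table>', 1]],
  ['tbody', ['<table>', '</table>', 1]],
  ['tfoot', ['<table>', '</table>', 1]],
  ['tr', ['<table><tbody>', '</tbody></table>', 2]],
  ['td', ['<table><tbody><tr>', '</tr></tbody></table>', 3]],
  ['th', ['<table><tbody><tr>', '</tr></tbody></table>', 3]],
]);

// Wraps the fragment in the ancestors its first tag needs.
function wrapForParsing(html) {
  const match = /^\s*<([a-zA-Z][\w-]*)/.exec(html);
  const tag = match ? match[1].toLowerCase() : '';
  const [open, close, depth] = TABLE_WRAPPERS.get(tag) || ['', '', 0];
  return { wrapped: open + html + close, depth };
}

// Parses the wrapped string, then walks back down `depth` levels to unwrap.
function parseWrapped(html, parser) {
  const { wrapped, depth } = wrapForParsing(html);
  let node = parser.parseFromString(wrapped, 'text/html').body;
  for (let i = 0; i < depth; i++) node = node.firstElementChild;
  return Array.from(node.children);
}

if (typeof DOMParser !== 'undefined') {
  const rows = parseWrapped('<tr> <td>a</td> </tr>', new DOMParser());
  console.log(rows.map(el => el.tagName));
}
```

The wrapper is chosen from the first tag only, so this sketch handles a fragment that consists of table parts. A `<div>` following the `<tr>`, as in the question, would still be moved out of the table by the HTML parser.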

answered Apr 29 '21 at 12:43 by 1cg

The way [htmx](//htmx.org) implements this makes sense. – guettli Apr 29 '21 at 19:31

score -1 · Answer 4 · answered Apr 30 '21 at 06:54


If you wrap the html fragment in a <template> tag before using parseFromString() it should work:
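
A minimal sketch of what this suggestion might look like, assuming the parsed elements are read back from the template's `content` fragment. `parseViaTemplate` is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper; pass a DOMParser instance as `parser`.
function parseViaTemplate(html, parser) {
  const doc = parser.parseFromString(`<template>${html}</template>`, 'text/html');
  const template = doc.querySelector('template');
  // The parsed nodes live in the template's content fragment, not in the document body.
  return Array.from(template.content.children);
}

// Only runs where DOMParser exists (for example a browser console).
if (typeof DOMParser !== 'undefined') {
  const els = parseViaTemplate('<tr> <td>a</td> </tr> <div>X</div>', new DOMParser());
  console.log(els.map(el => el.tagName));
}
```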

answered Apr 30 '21 at 06:54 by guettli
