Is there a better way to get just the tag information (tagName, classes, styles, other attributes, whether it is empty or not, etc.) without the innerHTML content, and with the start and end tags separated, than:

const outer = el.outerHTML
const inner = el.innerHTML
const tag_only = outer.replace(inner, '');
const MATCH_END = /^<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(<\/\1>)$/;
const match = MATCH_END.exec(tag_only);
if (match === null) {  // empty tag, like <input>
    return [tag_only, inner, ''];
} else {
    const end_tag = match[2];
    const start_tag = tag_only.replace(end_tag, '');
    return [start_tag, inner, end_tag];
}

This works, but it does not seem particularly efficient: it requires two DOM queries, two replace calls, and a regular expression search (ugh) to get back information that the browser/DOM already holds separately.

(FWIW, I'm working on an Element/Node processor that needs to walk all childNodes, changing some, before reconstructing mostly the original HTML, so I'm going to need to recursively call this function a lot and it would be good for speed to have a faster way)

  • Possible duplicate https://stackoverflow.com/questions/37751950/javascript-element-html-without-children – T J Mar 13 '21 at 21:08
  • "tagName [w/ original case]" HTML is case insensitive, so except for foreign objects (like svg), there is no way you can grab the "original case" from an already parsed document. The only way would be to fetch the file as text and parse it yourself. – Kaiido Mar 14 '21 at 05:39
  • getting the document tree from file could work in the Netscape era, nowadays the chances are close to `0`, in many cases there will be only a `
    ` in the `body`
    – n-- Mar 14 '21 at 11:15
  • The proposed duplicate only proposes to find the start tag not the end tag also. – Michael Scott Asato Cuthbert Mar 15 '21 at 10:37
  • Thanks @Kaiido. Case wasn't that important here, so removed from the question. – Michael Scott Asato Cuthbert Apr 02 '21 at 22:00

1 Answer

Methods like innerHTML and outerHTML are expensive because they serialize the whole element subtree on which they are called. Calling them at every level of a recursive walk repeats that work for each descendant, so they should be avoided in performance-sensitive code. Even a seemingly harmless childNodes has a cost, so for maximum performance you should build the tree node by node. Below is a possible solution for your case:

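const collect = function (el) {
    if (!el) return this;  // nothing to collect
    const inner = [];
    if (el.nodeType === Node.ELEMENT_NODE) {
        // Shallow clone: copies the tag and its attributes, not the children,
        // so outerHTML only has to serialize a single empty element.
        const clone = el.cloneNode(false);
        // Optional marker attribute showing the output came from a clone.
        clone.setAttribute('data-clone', clone.tagName);
        const tag_only = clone.outerHTML;
        let elm;
        const MATCH_END = /^<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(<\/\1>)$/;
        const match = MATCH_END.exec(tag_only);
        if (match === null) {  // empty tag, like <input>
            elm = [tag_only, inner, ''];
        } else {
            const end_tag = match[2];
            // The end tag is anchored at the end of tag_only, so slice it off
            // rather than replacing the first textual occurrence.
            const start_tag = tag_only.slice(0, -end_tag.length);
            elm = [start_tag, inner, end_tag];
        }
        this.push(elm);
    } else if (el.nodeType === Node.TEXT_NODE) {
        // Text nodes have no outerHTML; keep their text (not re-escaped here).
        this.push(el.data);
    }
    // Walk the children with firstChild/nextSibling instead of childNodes.
    el = el.firstChild;
    while (el) {
        collect.call(inner, el);
        el = el.nextSibling;
    }
    return this;
};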

console.log(collect.call([], document.body).flat(Infinity).join(''));
<div data-id="a" class="b">
  <input type="text">
  <div data-id="c" class="d">
    <input type="text"/>
    <div data-id="e" class="f">
      <input type="text"/>
    </div>
  </div>
</div>
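
If you want to avoid the regular expression and outerHTML altogether, one option is to build the start and end tags yourself from the element's name and attributes. The following is only a sketch under some assumptions: tagParts, escapeAttr and VOID_ELEMENTS are hypothetical names of mine, the void-element list is the one I know from the current HTML standard, and the escaping covers only & and ", so the result may not match the browser's own serialization byte for byte. It only reads localName/tagName and attributes, so it accepts real Elements as well as plain objects of the same shape.

```javascript
// Void elements (assumed list): elements that never have an end tag.
const VOID_ELEMENTS = new Set([
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Minimal escaping for a double-quoted attribute value.
function escapeAttr(value) {
    return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Hypothetical helper: returns [startTag, endTag] without touching children.
function tagParts(el) {
    // localName keeps the case of SVG names; fall back to a lowercased tagName.
    const name = el.localName ?? el.tagName.toLowerCase();
    const attrs = Array.from(
        el.attributes,
        (a) => ` ${a.name}="${escapeAttr(a.value)}"`
    ).join('');
    const start = `<${name}${attrs}>`;
    const end = VOID_ELEMENTS.has(name) ? '' : `</${name}>`;
    return [start, end];
}

// Usage with a plain object shaped like an Element:
const demo = {
    tagName: 'DIV',
    attributes: [{ name: 'data-id', value: 'a' }, { name: 'class', value: 'b' }],
};
console.log(tagParts(demo));  // [ '<div data-id="a" class="b">', '</div>' ]
```

In the collect function above you could then use the two parts directly as start_tag and end_tag, which removes the clone, the outerHTML call, and the regex from the per-node work.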
n--