RegEx/JS: Wrap innerHTML text elements in span including nested HTML elements

Question

Given the innerHTML of an element, I'm trying to wrap each word in a span, and if the word is already wrapped, wrap the element.

e.g. Such that Here is a paragraph <span class="red">which</span> <div id="typed-effect">might</div> have nested elements

Becomes <span>Here</span> <span>is</span> <span>a</span> <span>paragraph</span> <span class="red">which</span> <span><div id="typed-effect">might</div></span> <span>have</span> <span>nested</span> <span>elements</span>

The components I have are:

Extract the innerHTML and wrap in spans:

function wrapElementText(elementSelector) {
   const element = document.querySelector(elementSelector);
   const rawHTML = element.innerHTML;
   // Do magic to create list
   element.innerHTML = "";
   list.forEach((item) => {
     if (!item.includes(`<span`)) {
        element.innerHTML += `<span class="wrapped-text">${item}</span>`
      }
   });
}

My best attempt is this ((\<.*?\>)|((\s?).*?(\s))) but it's returning e.g. <div id="typed-effect"> and might</div> as separate groups when it should be one group <div id="typed-effect">might</div>.

Thanks so much in advance!

If you are using the DOM API anyway, why don't you check for child elements, avoiding the check for nested elements altogether? Regexes are ill-suited to handling the pure-text portions of semi-structured data from the perspective of robust, maintainable code, though technically a lot can be done with today's regex engines, as they are algorithmically more powerful than needed to handle the abstract notion (i.e. the formal language) of regular expressions. — collapsar, Jun 24 '22 at 09:41
[don't use regex for html](https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) — pilchard, Jun 24 '22 at 09:45
Also, you shouldn't put a div inside a p element: the HTML parser closes the p when it reaches the div, so the div and everything after it are not among the p's childNodes. It's simply bad practice. — eroironico, Jun 24 '22 at 10:03

eroironico · Accepted Answer · 2022-06-24T11:39:34.440

OK, so I came up with these solutions:

The regex approach (not recommended, but what you asked for):

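function wrapElementTextRegexp(elementSelector) {
    const element = document.querySelector(elementSelector);
    // Match either a run of letters or a (simplified) element. The lazy .+?
    // stops at the first closing tag, so two elements on one line stay separate.
    const chunks = element.innerHTML.match(/[a-zA-Z]+|(<[a-z][^>]*>.+?<\/[^>]+>)/g) || [];
    // Build the markup once instead of re-parsing innerHTML on every iteration,
    // and join with a space so adjacent words do not run together.
    element.innerHTML = chunks
        .map(chunk => chunk.startsWith('<span')
            ? chunk
            : `<span class="wrapped-text">${chunk}</span>`)
        .join(' ');
}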

Here, the regex matches either a word (a run of letters) or a simplified HTML element pattern. Any chunk that is not already a span gets wrapped in a span; an existing span is returned unchanged.
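
To see what this kind of pattern produces without needing a DOM, here is a minimal sketch that runs the same chunking on a plain string. The name wrapChunks is hypothetical, and the pattern uses a lazy .+? so that two elements on the same line are not swallowed into one chunk. It is still a simplification: an element nested inside the matched element will break it.

```javascript
// Hypothetical helper: the regex chunking idea applied to a plain string.
function wrapChunks(html) {
    // Alternative 1: a run of letters (one word).
    // Alternative 2: an opening tag, lazily up to the first closing tag.
    const chunks = html.match(/[a-zA-Z]+|<[a-z][^>]*>.+?<\/[^>]+>/g) || [];
    return chunks
        .map(chunk => chunk.startsWith('<span')
            ? chunk                                          // already a span
            : `<span class="wrapped-text">${chunk}</span>`)  // word or other element
        .join(' ');
}

const sample = 'Here is <span class="red">which</span> <div id="typed-effect">might</div> nested';
console.log(wrapChunks(sample));
```

The existing span passes through untouched, while the words and the div each end up in their own wrapper.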

The childNodes approach (IMO better):

function wrapElementTextNodes(elementSelector) {
    const element = document.querySelector(elementSelector);
    const parsedInnerHTML = [...element.childNodes]
        .flatMap(node => {
            if (node instanceof HTMLElement) {
                // Keep existing spans as they are; wrap any other element.
                return node instanceof HTMLSpanElement
                    ? [node.outerHTML]
                    : [`<span class="wrapped-text">${node.outerHTML}</span>`];
            }
            // Text node: wrap each word, skipping the empty strings that
            // whitespace-only nodes produce.
            return node.textContent
                .trim()
                .split(/\s+/)
                .filter(word => word !== '')
                .map(word => `<span class="wrapped-text">${word}</span>`);
        })
        // Join with a space so adjacent words do not run together.
        .join(' ');
    element.innerHTML = parsedInnerHTML;
}

Here we have access to the node instances, so we don't have to split the innerHTML ourselves because we're using childNodes. In the example I map each node by type: an existing HTMLSpanElement is kept as is, any other HTMLElement is wrapped in a span, and a text node is split into words that are each wrapped. Empty strings coming from whitespace-only text nodes are filtered out before wrapping, so nothing like <span class="wrapped-text"></span> is emitted. (Filtering the finished chunks on ></ instead would also throw away the wrapped div, since </div></span> contains that sequence.) Finally I join() the chunks.
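
One thing to watch in the text-node branch: textContent gives back decoded text, so a character such as < or & would go into the rebuilt HTML string raw. Below is a minimal sketch of a per-word helper that escapes before wrapping. Both escapeHtml and wrapWords are hypothetical names used for illustration, not part of the code above.

```javascript
// Hypothetical helper: escape the characters that matter inside element content.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Hypothetical helper: turn the text of one text node into wrapped words.
function wrapWords(text) {
    return text
        .trim()
        .split(/\s+/)
        .filter(word => word !== '') // whitespace-only input yields ['']
        .map(word => `<span class="wrapped-text">${escapeHtml(word)}</span>`);
}

console.log(wrapWords('  a < b  '));
```

A helper like this could stand in for the text-node branch of the map, which keeps the string handling free of DOM access.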

IMPORTANT NOTE: putting a <div> element inside a <p> is invalid markup. The HTML parser closes the <p> as soon as it reaches the <div>, so the div and everything after it are not child nodes of the paragraph. For example, if you have

<p>
    some
    <div>bad</div>
    example
</p>

the div and the text node example will be missing from element.innerHTML, which in this case will contain only some (plus surrounding whitespace). So make sure you correct your markup.

Thanks so much, great answer. The childNodes approach makes a lot more sense, only minor issue with it is that it is wrapping the block of text, rather than each individual word like the Regex does. `Here is a paragraph which have nested elements` - see https://jsfiddle.net/cpq4jryo/17/ Would you just `split(" ")` on `node.textContent` and loop through for each span? — o1n3n21, Jun 24 '22 at 11:23
yep sorry you're absolutely right, but instead of splitting with `.split(" ")` you can use a regex to split for one or more spaces so you don't get something like `['some', '', 'word']`, you can do this by calling `.split(/\s+/)` — eroironico, Jun 24 '22 at 11:36
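
The difference between the two splits can be checked on a plain string. This is a small sketch with made-up sample text. It also shows why the text is trimmed first, and that an empty string still yields one empty entry.

```javascript
const text = ' some  word ';

// A single-space separator keeps an empty string for every extra space.
const bySpace = text.split(' ');
// A whitespace-run separator collapses the gap, but the edges still leak.
const byRun = text.split(/\s+/);
// Trimming first removes the leading and trailing empty strings.
const trimmed = text.trim().split(/\s+/);
// An empty (or whitespace-only, once trimmed) string still gives one entry.
const empty = ''.split(/\s+/);

console.log(bySpace); // [ '', 'some', '', 'word', '' ]
console.log(byRun);   // [ '', 'some', 'word', '' ]
console.log(trimmed); // [ 'some', 'word' ]
console.log(empty);   // [ '' ]
```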
