Replace all instances of multiple words on the entire page

Question

I have several different words I need to replace with different words using Javascript. I thought I could achieve this using the replace function, but it seems to only work on the first call.

However, since the words I need to replace are similar to one another, this is causing some kind of issue and I'm not sure how to fix it. I also really need to do this with Javascript and not jQuery. Thanks.

Example:

function color() {
document.body.innerHTML = document.body.innerHTML.replace(/A/g, "<span style='color:#FF0000;'>A</span>");
document.body.innerHTML = document.body.innerHTML.replace(/A1/g, "<span style='color:#0000FF;'>A1</span>");
}

<html>
  <body>
    <h3>A</h3>
    <h3>A1</h3>
    <button onclick="color()">click</button>
  </body>
</html>

Welcome to SO! Can you clarify what your expected output is? Your goal is to wrap certain words in the document with a span? This approach of taking the whole `innerHTML` of the document and running it all through a regex is an antipattern. It's better to traverse the DOM and perform the replacements, but hard to show you how to do this without knowing what you want to achieve. For example, if you only want to perform this swap on a single element, that's a lot easier than the entire document. Thanks for the info. — ggorlen, Apr 13 '20 at 23:27

ggorlen · Accepted Answer · 2020-04-14T16:03:19.517

2

It's a mistake to do a body.innerHTML replacement under almost all circumstances. It may appear to work on some trivial examples, but for any non-trivial application, it will fail. See don't parse HTML with regex.

To get a taste of why this approach is a non-starter, consider if your document has the following element:

<div style="color: #AA144"></div>

How is your regex going to know not to slap a bunch of spans inside this attribute string for every A or A1? It won't.

Instead, use the document object model which is a tree structure representing the markup (all the HTML parsing was done for you!). Traverse the nodes in the tree and operate on each node's textContent, performing the replacements:

for (const parent of document.querySelectorAll("body *")) {
  for (const child of parent.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      const pattern = /(A1|A)/g;
      const replacement = "<span style='color:#FF0000;'>$1</span>";
      const subNode = document.createElement("span");
      subNode.innerHTML = child.textContent.replace(pattern, replacement);
      parent.insertBefore(subNode, child);
      parent.removeChild(child);
    }
  }
}

body {
  background: white;
}

<div>
  foobar
  <div>
    <div style="color: #AA144">
      foobazz A1
    </div>
    foo quuz AA
    <h4>coraaAge</h4>
    <p>
      A bA bAA ello world A1
    </p>
  </div>
</div>

If you want to do different colors for different patterns, you can use:

const swaps = {
  foo: "#f00",
  bar: "#b42"
};
const pattern = RegExp(Object.keys(swaps).join("|"), "g");
const sub = m => `<span style='color:${swaps[m]};'>${m}</span>`;

for (const parent of document.querySelectorAll("body *")) {
  for (const child of parent.childNodes) {
    if (child.nodeType === Node.TEXT_NODE) {
      const subNode = document.createElement("span");
      subNode.innerHTML = child.textContent.replace(pattern, sub);
      parent.insertBefore(subNode, child);
      parent.removeChild(child);
    }
  }
}

body {
  background: white;
}

<div>
  foobazbarfoooo
  <div>
    <div style="color: #AA144">
      foobazz ybarybar
    </div>
    foo quuz bar
    <h4>corge foo</h4>
    <p>
      foo bar baz
    </p>
  </div>
</div>

Note that the order of concatenation groups in the above regexes matter. (A|A1) will not work because the A option will be matched first and a 1 character after it will fail to highlight.

Also, it should go without saying that performing string replacement on a tree of nodes like this is an antipattern, even if it's much more workable than a giant .innerHTML call. This approach is still prone to serious performance and accuracy issues. In most cases, it's best to represent the data from the start using an in-memory, application-specific data structure, then format the data and generate HTML to avoid expensive and brittle re-parsing HTML.

I adapted code from another answer of mine for this. The question is not really a dupe, though.

edited Apr 14 '20 at 16:03

answered Apr 13 '20 at 23:48

ggorlen

44,755
7
76
106

That's a very good point I hadn't considered, and this solution works *very* well! Thank you! One last question, how would I format this to work with different colors? Say I want to make all instances of 'A' red, and 'B' blue? – leftbones Apr 13 '20 at 23:55
Good point regarding `Don't parse HTML with regex.` Funny enough this still has the same problem as the OP question. Please take a look into my answer (that was downvoted) and see the considerations after the code snippet. – a--m Apr 14 '20 at 00:03
@eprkr No problem. It's frowned upon to "move the goalpost"--it's best to open new questions for new problems, but I updated my answer with a possible solution anyway. – ggorlen Apr 14 '20 at 00:08
1

@a--m Yes, that's another issue. Luckily we can rely on a concatenated regex to handle all of that automatically since it won't overlap matches or backtrack after picking a match. The logic is sort of use-case dependent, but it's a different problem class than the parsing HTML with regex problem. – ggorlen Apr 14 '20 at 00:09
@ggorlen Good to know. I figured I didn't want to spam new questions just because of new issues but I will keep that in mind for the future. I was having an issue replacing both F2 and F but I realized the issue was that I was checking for F and then F2, rather than the other way around. Got it working now. Thanks so much! – leftbones Apr 14 '20 at 00:14
See my comment in the other thread. You probably shouldn't be using this as a solution to whatever problem you're trying to solve. This is given as a solution to your question as you posed it, not whatever problem motivated the question. I haven't looked at your page or actual use case, but basically it strikes me as a kludge to make up for a likely problematic design. So you can expect to be chasing after edge cases. Having said that, I didn't have any problems with `A1` and `A2`, so you'll have to point me to the failing example more clearly. – ggorlen Apr 14 '20 at 00:16
In other words, I hope it goes without saying that it's not a good pattern to traverse the entire document and do text replacement on every node. In the best case, the performance is awful, and in the worst case it causes all kinds of bugs. The better approach is to start with the data in memory in JS and then build the DOM from that structure. – ggorlen Apr 14 '20 at 00:23
As for `A1` and `A`, the ordering of the concatenated alternation group in the regex matters. Make sure it's `(A1|A)`, not the other way around, otherwise the `A` will be matched and `1` will fail to highlight. – ggorlen Apr 14 '20 at 00:25
I think the other thread got deleted. Yeah, I'm realizing this is a pretty ungraceful solution, but it was a good learning exercise. I can figure out something else from there I suppose. – leftbones Apr 14 '20 at 00:25
You can open up a new question showing a minimal example of what you're trying to achieve, with context, and mention that you tried a solution like this but it's failing on cases X and Y or what have you. – ggorlen Apr 14 '20 at 00:27

Replace all instances of multiple words on the entire page

1 Answers1

Linked

Related