How can I change all instances of a string within an HTML document via JS without disturbing its markup?

Question

I want to write a script that will cause a page to "decay", changing characters at random. Let's say I've got some HTML that looks like this:

<div class="eebee">
    Lorem Ipsum <a href="http://example.com">Anchor here</a>
</div>

And I want to replace every instance of "e" with "∑" so it'll be

<div class="eebee">
    Lor∑m Ipsum <a href="http://example.com">Anchor h∑r∑</a>
</div>

but, obviously, I don't want it to be

<div class="∑∑b∑∑">
    Lor∑m Ipsum <a hr∑f="http://∑xample.com">Anchor h∑r∑</a>
</div>

How to accomplish this? A DOM parser? Or just some regex, searching for content betw∑∑n ">" and "<"?

EDIT: Following Oriol's solution below, here it is wrapped in a function that accepts any find-and-replace strings:

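function decay(find_string, replace_string) {
    // Note: find_string is interpreted as a regular expression pattern.
    // Build the regex once, and declare it with var to avoid an implicit global.
    var re = new RegExp(find_string, "g");
    var treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
    while (treeWalker.nextNode()) {
        var node = treeWalker.currentNode;
        node.nodeValue = node.nodeValue.replace(re, replace_string);
    }
}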
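
One caveat with the function above: because find_string goes straight into new RegExp, characters like "." or "?" are treated as regex syntax, and "$&" in replace_string is treated as a replacement pattern. If you want both to be taken literally, something like the following might help. This is only a sketch; escapeRegExp and replaceAllLiteral are hypothetical helper names, not part of Oriol's answer.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace every literal occurrence of `find` in `text`.
function replaceAllLiteral(text, find, replacement) {
  if (find === '') {
    return text; // an empty pattern would match between every character
  }
  var re = new RegExp(escapeRegExp(find), 'g');
  // A function replacement is inserted as-is, so "$&" and friends stay literal.
  return text.replace(re, function () {
    return replacement;
  });
}

console.log(replaceAllLiteral('Lorem Ipsum here', 'e', '∑'));
```

Inside the tree walker loop you would then write node.nodeValue = replaceAllLiteral(node.nodeValue, find_string, replace_string).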

score 3 · Accepted Answer · edited May 23 '17 at 12:16

You can write a simple recursive function which iterates all text nodes:

function iterateTextNodes(root, callback) {
  for(var i=0; i<root.childNodes.length; ++i) {
    var child = root.childNodes[i];
    if(child.nodeType === 1) {           // element node
      iterateTextNodes(child, callback); // recursive call
    } else if(child.nodeType === 3) {    // text node
      callback(child);                   // pass it to callback
    }
  }
}
iterateTextNodes(document.body, function(node) {
  node.nodeValue = node.nodeValue.replace(/e/g, '∑');
});

<div class="eebee">Lorem Ipsum <a href="http://example.com">Anchor here</a></div>

Or, if you prefer a built-in way, you can use a tree walker:

var treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
while(treeWalker.nextNode()) {
  var node = treeWalker.currentNode;
  node.nodeValue = node.nodeValue.replace(/e/g, '∑');
}

<div class="eebee">Lorem Ipsum <a href="http://example.com">Anchor here</a></div>
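
Both approaches visit every text node under document.body, which can include the source text inside <script> and <style> elements. If that matters for your page, a filter passed to createTreeWalker can skip those nodes. A rough sketch, assuming you only want to skip SCRIPT, STYLE and NOSCRIPT parents; isReplaceableTextNode and replaceInVisibleText are hypothetical names:

```javascript
// Parent elements whose text content is source code rather than prose.
// This list is an assumption; extend it to suit your page.
var SKIPPED_PARENTS = { SCRIPT: true, STYLE: true, NOSCRIPT: true };

// Hypothetical helper: true if the text node is not inside a skipped element.
function isReplaceableTextNode(node) {
  var parent = node.parentNode;
  if (!parent) {
    return true;
  }
  return SKIPPED_PARENTS[String(parent.nodeName).toUpperCase()] !== true;
}

// Hypothetical helper: same tree walker as above, plus a node filter.
// Only call this in a browser, where document and NodeFilter exist.
function replaceInVisibleText(root, pattern, replacement) {
  var treeWalker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode: function (node) {
      return isReplaceableTextNode(node)
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_REJECT;
    }
  });
  while (treeWalker.nextNode()) {
    var node = treeWalker.currentNode;
    node.nodeValue = node.nodeValue.replace(pattern, replacement);
  }
}
```

Usage would be replaceInVisibleText(document.body, /e/g, '∑').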

Some notes:

Don't use an HTML parser. If you parse an HTML string and then replace the old DOM tree with the new one, you will lose all the internal data of the elements (event listeners, checkedness, ...).
In particular, never use a regex to parse HTML. You can't parse (X)HTML with regex.
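
To illustrate the regex point, here is a sketch of the "content between > and <" idea from the question, run on a plain string. naiveReplace is a hypothetical name and the sample markup is made up for the demonstration; the attribute value contains a ">", which is allowed in HTML.

```javascript
// Naive approach: treat whatever sits between ">" and "<" as text content.
function naiveReplace(html, find, replacement) {
  return html.replace(/>([^<]*)</g, function (match, text) {
    return '>' + text.split(find).join(replacement) + '<';
  });
}

var sample = '<a title="x > y e">here</a>';
var result = naiveReplace(sample, 'e', '∑');

// The ">" inside the title attribute is mistaken for the end of a tag,
// so the "e" in the attribute value gets replaced too.
console.log(result); // <a title="x > y ∑">h∑r∑</a>
```

The text node approaches above don't have this problem, because the browser has already separated attributes from text.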

Holy crap, that's so easy! And I thought I knew Javascript. – Richard Maneuv, May 02 '16 at 03:00
