How can I replace the text on a webpage, including text that is injected or modified by later JavaScript calls? All the answers in "replace words in the body text" only work on text that is on the page at the moment of execution.

Daniel Ting

1 Answer

It turns out that doing the above in a performant way and without breaking anything is nontrivial for the supposedly declarative markup language that is HTML. I have documented what I learned over a month of testing and experimenting below.

To do an initial round of replacement on existing text, we will use TreeWalker to go through every Text node in the document and process its contents. In this example, I will be censoring "heck" by replacing it with "h*ck".

const callback = text => text.replaceAll(/heck/gi, 'h*ck');

function processNodes(root) {
    // Walk only the Text nodes under root, skipping those that valid() rejects
    const nodes = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
        acceptNode: node =>
            valid(node) ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT,
    });
    while (nodes.nextNode()) {
        nodes.currentNode.nodeValue = callback(nodes.currentNode.nodeValue);
    }
}

function valid(node) {
    return (
        node.parentNode !== null
        && node.parentNode.tagName !== 'SCRIPT'
        && node.parentNode.tagName !== 'STYLE'
        && !node.parentNode.isContentEditable
    );
}

processNodes(document.body);
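
One side effect of the callback above is that the replacement string is always lowercase, so "Heck" becomes "h*ck". If keeping the original capitalization matters, a capture-group variant is one option. This is only a sketch; the name censorPreservingCase is made up for illustration and is not part of the code above.

```javascript
// Hypothetical alternative to `callback`: keep the original letters and
// replace only the "e", so "Heck" becomes "H*ck" rather than "h*ck".
const censorPreservingCase = text => text.replace(/(h)e(ck)/gi, '$1*$2');

console.log(censorPreservingCase('Heck, what the HECK?'));
// prints "H*ck, what the H*CK?"
```

Like the original, this also matches "heck" inside longer words such as "check". Adding \b word boundaries to the pattern would be one way to avoid that, if it is not wanted.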

Note the valid function. This is to handle three exceptional cases:

  1. We need to check that the parent node exists, because the node is sometimes removed from the document by the time we get around to it.
  2. Messing with <script> and <style> tags could break functionality or presentation.
  3. Editing a contenteditable element resets the cursor position, which is a terrible user experience.
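
To see the three cases in action without a browser, the check can be run against plain objects that mimic the few properties it reads. This is a rough sketch: fakeText is a hypothetical helper invented for the demonstration, and real Text nodes come from the DOM.

```javascript
// Same checks as `valid` above, repeated so this snippet runs on its own.
function valid(node) {
    return (
        node.parentNode !== null
        && node.parentNode.tagName !== 'SCRIPT'
        && node.parentNode.tagName !== 'STYLE'
        && !node.parentNode.isContentEditable
    );
}

// Hypothetical stand-in for a Text node: a plain object carrying only the
// properties that `valid` reads. Pass null to simulate a detached node.
const fakeText = (tagName, isContentEditable = false) => ({
    parentNode: tagName === null ? null : { tagName, isContentEditable },
});

const results = {
    paragraph: valid(fakeText('P')),        // ordinary text: processed
    detached: valid(fakeText(null)),        // case 1: parent is gone
    script: valid(fakeText('SCRIPT')),      // case 2: inside <script>
    style: valid(fakeText('STYLE')),        // case 2: inside <style>
    editable: valid(fakeText('DIV', true)), // case 3: contenteditable
};
console.log(results);
// paragraph is true; detached, script, style and editable are all false
```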

But that only takes care of text that was already on the page. To catch future changes, we can use a MutationObserver to watch for added or modified text nodes.

const IGNORED = [
    Node.CDATA_SECTION_NODE,
    Node.PROCESSING_INSTRUCTION_NODE,
    Node.COMMENT_NODE,
];
const CONFIG = {subtree: true, childList: true, characterData: true};

const observer = new MutationObserver((mutations, observer) => {
    // Stop observing so that our own edits below do not re-trigger this callback
    observer.disconnect();
    for (const mutation of mutations) {
        const target = mutation.target;
        switch (mutation.type) {
            case 'childList':
                for (const node of mutation.addedNodes) {
                    if (node.nodeType === Node.TEXT_NODE) {
                        if (valid(node)) {
                            node.nodeValue = callback(node.nodeValue);
                        }
                    } else if (!IGNORED.includes(node.nodeType)) {
                        processNodes(node);
                    }
                }
                break;
            case 'characterData':
                if (!IGNORED.includes(target.nodeType) && valid(target)) {
                    target.nodeValue = callback(target.nodeValue);
                }
                break;
        }
    }
    observer.observe(document.body, CONFIG);
});
observer.observe(document.body, CONFIG);

The observer's callback consists of two main parts: a case for childList that processes any new subtrees and text nodes, and a case for characterData that handles text nodes whose contents changed. We must turn off the observer before making any edits of our own to avoid triggering an infinite loop, then turn it back on once we are done. Also note the IGNORED array; this is necessary because certain nodes (CDATA sections, processing instructions, and comments) fall under the CharacterData interface but are not front-facing, user-visible content.
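
One gap in the disconnect-then-reconnect pattern as written: if the replacement callback throws, the final observe call is never reached and the page stops being watched. A possible way to guard against that is to wrap the edits in try/finally. The helper below is a sketch under that assumption; withObserverPaused and fakeObserver are hypothetical names, and the fake object only imitates the two MutationObserver methods used here so the call order can be shown outside a browser.

```javascript
// Hypothetical helper: pause `observer`, run `edit`, then resume observing
// even if `edit` throws, so an error does not leave the page unwatched.
function withObserverPaused(observer, target, config, edit) {
    observer.disconnect();
    try {
        return edit();
    } finally {
        observer.observe(target, config);
    }
}

// Stand-in exposing the same two methods as a MutationObserver. It only
// records the order in which they are called.
const calls = [];
const fakeObserver = {
    disconnect: () => calls.push('disconnect'),
    observe: () => calls.push('observe'),
};

withObserverPaused(fakeObserver, null, {}, () => calls.push('edit'));
console.log(calls.join(' -> '));
// prints "disconnect -> edit -> observe"
```

In the real observer callback, the loop over mutations would go where the edit function is, with document.body and CONFIG passed as the target and config.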

Putting those two pieces together should be enough 98% of the time. However, there are still many special cases we didn't consider.

A proper explanation of workarounds for those cases wouldn't fit in a Stack Overflow answer, but I have written a free library called TextObserver that handles them for you.

Daniel Ting