JavaScript - How to replace words inside page content

Question

I have a list of about 1,000 words; if any of them appears on the page, it gets replaced with something else. I tried doing this with regular expressions, running a replacement like this for each of the thousand words:

    var pattern = new RegExp("(.*?)([^A-Za-z_/\-]+)("+title+")([^A-Za-z_\./\-]+)(.*?)","ig");

    content = content.replace( pattern, function replacer(contents,start,before,value,after,end) {

    var key = value.toLowerCase();

but this method turns out to be really slow. Another method would be to split the page content into words and then check whether any of the parts equals any of the keywords. The problem there is that I have a thousand keywords, so on a page with 10,000 words I'd have to loop through 10,000 × 1,000 items, which would probably crash the browser.

Does anyone know of a good way to substitute words on a page?

At least part of it will relate to [this answer](http://stackoverflow.com/questions/5904914/javascript-regex-to-replace-text-not-in-html-attributes/5904945#5904945), the code of which was used to create the [Drumpfinator](http://drumpfinator.com/) Chrome plug-in. :-) – T.J. Crowder, Jun 19 '16 at 17:18
"The problem there is I have a thousand keywords" It is not a problem: if you can prebuild a `{keyword: 'value'}` hash, the whole operation would be `O(n)`, where `n` is the number of words in the text. – Yury Tarabanko, Jun 19 '16 at 17:19
Assuming not every word is unique, you can index the words, take the unique values, compare the first 3, 5, or 7 letters, replace the values, and then rebuild the string. – Nitin, Jun 19 '16 at 18:45

Martin Stone · Accepted Answer · 2016-06-19T20:35:01.273


This is slow because for each word, you're testing the whole content again. Better to make a regex for any word, then look that up in a hash:

    // Make your "dictionary" first:
    var replacements = {
        "replace": "R",
        "this": "T",
        "etc": "..."
    };

    var content = "Should replace this with letters.";

    var output = content.replace(/\w+/g, function replacer(word) {
        return replacements[word.toLowerCase()] || word;
    });

    console.log(output);

The output is:

    Should R T with letters.
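One caveat with a plain object as the dictionary: a page word such as "constructor" or "toString" finds an inherited property on the object, and a replacement that is the empty string is skipped by the `||` fallback. The following is a minimal sketch of the same single-pass idea using a `Map` instead, which avoids both cases. The helper name `replaceWords` and the sample dictionary are hypothetical, chosen only for illustration.

```javascript
// Hypothetical dictionary for illustration; keys are stored in lowercase.
const replacements = new Map([
  ["replace", "R"],
  ["this", "T"],
  ["etc", "..."],
]);

// Single pass over the text: each word is looked up once in the Map,
// so the cost grows with the number of words in the text, not with
// (words in text) x (keywords).
function replaceWords(content, dictionary) {
  return content.replace(/\w+/g, (word) => {
    const key = word.toLowerCase();
    // has() checks only the Map's own entries, so inherited names such as
    // "constructor" are never treated as keywords, and an empty-string
    // replacement is still applied.
    return dictionary.has(key) ? dictionary.get(key) : word;
  });
}

console.log(replaceWords("Should replace this with letters.", replacements));
// prints "Should R T with letters."
```

A plain object created with `Object.create(null)` would avoid the inherited-property problem as well; the `Map` is just one way to do it.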

edited Jun 19 '16 at 20:35

answered Jun 19 '16 at 18:40

Martin Stone

Edit: removed the `\b` stuff. It's not necessary, as `\w+` will match words from beginning to end anyway. Nor is the capture group. – Martin Stone Jun 19 '16 at 20:36
That works! One thing I need to figure out now is how can I match only words that don't appear inside things like or tags? – John Slotsky Jun 20 '16 at 21:30
To do that you probably have to traverse the DOM and process the text one node at a time (unless it's a node that shouldn't be processed). – Martin Stone Jun 21 '16 at 16:53
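The traversal described in the last comment could look roughly like the sketch below. It walks a node tree recursively, rewrites only text nodes, and skips the subtree of any element whose tag is on a skip list. The names `replaceInTextNodes` and `SKIP_TAGS` are hypothetical, and the tags in the list are an assumption, since the tags the asker had in mind are not stated here.

```javascript
// Numeric values of Node.ELEMENT_NODE and Node.TEXT_NODE from the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Assumed skip list; adjust it to whichever tags should be left untouched.
const SKIP_TAGS = new Set(["SCRIPT", "STYLE", "TEXTAREA", "CODE", "PRE"]);

// Recursively visit `node`; apply `replaceText` to the value of every text
// node that is not inside a skipped element.
function replaceInTextNodes(node, replaceText) {
  if (node.nodeType === TEXT_NODE) {
    node.nodeValue = replaceText(node.nodeValue);
    return;
  }
  if (node.nodeType === ELEMENT_NODE &&
      SKIP_TAGS.has(String(node.nodeName).toUpperCase())) {
    return; // leave this element and everything inside it alone
  }
  // Copy the child list first so the loop iterates over a stable snapshot.
  for (const child of Array.from(node.childNodes || [])) {
    replaceInTextNodes(child, replaceText);
  }
}

// In a browser this might be wired to the dictionary lookup like so:
//   replaceInTextNodes(document.body, function (text) {
//     return text.replace(/\w+/g, function (word) {
//       return replacements[word.toLowerCase()] || word;
//     });
//   });
```

Because only `nodeValue` of text nodes is changed, element attributes and the markup itself are never touched.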
