I have a list of about 1,000 words; if any of them appears on the page, it gets replaced with something. I tried doing this with regular expressions, so for each of the thousand words I replace the content like this:

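    // Backslashes are doubled so they survive the string literal and reach the regex.
    var pattern = new RegExp("(.*?)([^A-Za-z_/\\-]+)(" + title + ")([^A-Za-z_./\\-]+)(.*?)", "ig");

    content = content.replace(pattern, function replacer(contents, start, before, value, after, end) {
        var key = value.toLowerCase();
        // ... rest of the replacer is not shown
    });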

but this method turns out to be really slow. Another method would be to split the page content into words and then check whether each one equals any of the keywords. The problem there is that I have a thousand keywords, so on a page with 10,000 words I'd have to loop through 10,000 × 1,000 = 10,000,000 comparisons, which would probably crash the browser.

Does anyone know of a good way to substitute words on a page?

John Slotsky
  • At least part of it will relate to [this answer](http://stackoverflow.com/questions/5904914/javascript-regex-to-replace-text-not-in-html-attributes/5904945#5904945), the code of which was used to create the [Drumpfinator](http://drumpfinator.com/) Chrome plug-in. :-) – T.J. Crowder Jun 19 '16 at 17:18
  • "The problem there is I have a thousand keywords" It is not a problem if you can prebuild a `{keyword: 'value'}` hash; the whole operation would then be `O(n)`, where `n` is the number of words in the text. – Yury Tarabanko Jun 19 '16 at 17:19
  • Assuming not every word is unique, you can index the words, take the unique values, compare the first 3, 5, or 7 letters, replace the values, and then rebuild the string. – Nitin Jun 19 '16 at 18:45

1 Answer

This is slow because you're scanning the whole content again for each keyword. It's better to use a single regex that matches any word, then look each match up in a hash:

    // Make your "dictionary" first:
    var replacements = {
        "replace": "R",
        "this": "T",
        "etc": "..."
    };

    var content = "Should replace this with letters.";

    var output = content.replace(/\w+/g, function replacer(word) {
        return replacements[word.toLowerCase()] || word;
    });

    console.log(output);

The output is:

    Should R T with letters.
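
One caveat with a plain object as the dictionary: a page word such as "constructor" finds an inherited property rather than a keyword, and an empty-string replacement is ignored by `||`. Here is a minimal sketch of the same lookup using a `Map`, which avoids both. The names `replacementMap` and `substitute` and the sample sentence are assumptions made for illustration, not part of the code above.

```javascript
// Assumed sample data, mirroring the dictionary above.
var replacementMap = new Map([
    ["replace", "R"],
    ["this", "T"],
    ["etc", "..."]
]);

// Hypothetical helper: one pass over the text, one Map lookup per word.
function substitute(content, map) {
    return content.replace(/\w+/g, function (word) {
        var key = word.toLowerCase();
        // has() only sees keys that were explicitly added to the Map.
        return map.has(key) ? map.get(key) : word;
    });
}

console.log(substitute("The constructor should replace this.", replacementMap));
// prints "The constructor should R T."
```

The cost stays at one lookup per word on the page, regardless of how many keywords the map holds.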
Martin Stone
  • Edit: removed \b stuff: It's not necessary as \w+ will match words from beginning to end anyway. Nor is the capture group. – Martin Stone Jun 19 '16 at 20:36
  • That works! One thing I need to figure out now: how can I match only words that don't appear inside things like or tags? – John Slotsky Jun 20 '16 at 21:30
  • To do that you probably have to traverse the DOM and process the text one node at a time (unless it's a node that shouldn't be processed). – Martin Stone Jun 21 '16 at 16:53
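
The DOM traversal suggested in the last comment might look roughly like the sketch below. It is an illustration under assumptions, not code from the answer: the helper names `replaceWords` and `replaceInTextNodes` and the `SKIP_TAGS` list are hypothetical, and the list of tags to skip should be adapted to the page. It only relies on the standard `nodeType`, `nodeName`, `nodeValue`, and `childNodes` properties.

```javascript
// Assumed list of elements whose text should be left alone (uppercase, as in HTML documents).
var SKIP_TAGS = ["SCRIPT", "STYLE", "TEXTAREA", "CODE", "PRE"];

// Hypothetical helper: same single-pass word lookup as in the answer.
function replaceWords(text, replacements) {
    return text.replace(/\w+/g, function (word) {
        var key = word.toLowerCase();
        return Object.prototype.hasOwnProperty.call(replacements, key)
            ? replacements[key]
            : word;
    });
}

// Hypothetical helper: walk the tree and rewrite text nodes only.
function replaceInTextNodes(node, replacements) {
    if (node.nodeType === 3) {
        // Text node: replace words in place.
        node.nodeValue = replaceWords(node.nodeValue, replacements);
    } else if (node.nodeType === 1 &&
               SKIP_TAGS.indexOf(node.nodeName.toUpperCase()) === -1) {
        // Element that is not skipped: recurse into its children.
        for (var i = 0; i < node.childNodes.length; i++) {
            replaceInTextNodes(node.childNodes[i], replacements);
        }
    }
}

// In a browser, a call could look like:
// replaceInTextNodes(document.body, { "replace": "R", "this": "T" });
```

Because only text nodes are modified, tag names and attribute values are never touched, and the text inside skipped elements is left unchanged.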