
I'm trying to replace all occurrences of a given word in a string, but the word may contain a special character that needs to be escaped. Here's an example:

The ERA is the mean of earned runs given up by a pitcher per nine innings pitched. Meanwhile, the ERA+, the adjusted ERA, is a pitcher's earned run average (ERA) according to the pitcher's ballpark (in case the ballpark favors batters or pitchers) and the ERA of the pitcher's league.

I would like to be able to do the following:

string = "The ERA..." // from above
string = string.replaceAll("ERA", "<b>ERA</b>");
string = string.replaceAll("ERA+", "<u>ERA+</u>");

without ERA and ERA+ conflicting. I've been using the prototype replaceAll posted previously, along with a regular expression found somewhere else on SO (unfortunately I can't find the link in my history):

String.prototype.replaceAll = function (find, replace) {
    var str = this;
    return str.replace(new RegExp(find.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), 'g'), replace);
};

function loadfunc() {
    var markup = document.getElementById('thetext').innerHTML;
    var terms = Object.keys(acronyms);
    for (var i = 0; i < terms.length; i++) {
        markup = markup.replaceAll(terms[i], '<abbr title=\"' + acronyms[terms[i]] + '\">' + terms[i] + '</abbr>');
    }
    document.getElementById('thetext').innerHTML = markup;
}

Basically, the code wraps each abbreviation in an <abbr> tag so that its definition shows up on mouseover. The problem is that the current regular expression is way too loose. My previous attempts worked partially but failed to distinguish between things like ERA and ERA+, or would completely skip over something like "K/9" or "IP/GS" (which should be a match by itself and not for "IP" or "GS" individually).
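To make the conflict concrete, here is a minimal reproduction. It is only a sketch: `escapeRegExp` and `replaceAllLiteral` are hypothetical helper names that mirror the prototype method above, and the sample string is made up.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Same behaviour as the prototype replaceAll, written as a plain function.
function replaceAllLiteral(str, find, replacement) {
    return str.replace(new RegExp(escapeRegExp(find), 'g'), replacement);
}

var sample = 'ERA and ERA+';

// The first pass also wraps the "ERA" inside "ERA+".
var step1 = replaceAllLiteral(sample, 'ERA', '<b>ERA</b>');
// step1 is '<b>ERA</b> and <b>ERA</b>+'

// The second pass finds nothing, because the literal text "ERA+" is gone.
var step2 = replaceAllLiteral(step1, 'ERA+', '<u>ERA+</u>');
// step2 is identical to step1
```

So the escaping itself works; the trouble is that the shorter term is matched inside the longer one.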

I should mention that acronyms is an object that looks like:

var acronyms = {
    "ERA": "Earned Run Average: ...",
    "ERA+": "Earned Run Average adjusted to ..."
};

Also (although this is fairly obvious) 'thetext' is a dummy div containing some text. The loadfunc() function is executed from <body onload="loadfunc()">

Thanks!

mast
  • What criteria are you using to identify what you're searching for? ERA is one example, can you provide some others (other than K/9, IP/GS, etc.)? – brandonscript Dec 05 '13 at 23:36
  • @r3mus It's mostly 1, 2 or 3 letters, always in capitals (A, K, AB) with possibly a '/' (such as IP/GS) or a trailing '+' or '%' (such as ERA+ or SB%). I've uploaded the entire acronyms array on [jsfiddle](http://jsfiddle.net/KAYkj/) if you want to have a quick look. – mast Dec 05 '13 at 23:41

2 Answers


OK, this is a lot to work with -- after looking at your jsFiddle.

I think the best you're going to get is searching for whole words that begin with a capital letter and may contain / or %. Something like this: ([A-Z][\w/%]+)

Caveat: no matter how you do this, if you're doing it in the browser (i.e. you can't update the raw data), it's going to be processing-intensive.

And you can implement it like this:

var repl = str.replace(/([A-Z][\w\/%]+)/g, function(match) {
    //alert(match);
    if (match in acronyms)
        return "<abbr title='" + acronyms[match] + "'>" + match + "</abbr>";
    else
        return match;
});

Here's a working jsFiddle: http://jsfiddle.net/remus/9z6fg/

Note that jQuery isn't required; I just used it in this case for ease of updating the DOM in jsFiddle.
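For reference, here is a self-contained sketch of the same single-pass idea that runs without the DOM. The pattern is an assumption based on the description in the comments (one or more capitals or digits, an optional "/PART", an optional trailing + or %), not the exact expression used above. The `acronyms` entries, `ACRONYM_PATTERN`, and `markAcronyms` are illustrative names and sample data.

```javascript
// Sample data for illustration only; the real list lives in the question's jsFiddle.
var acronyms = {
    'ERA': 'Earned Run Average',
    'ERA+': 'Adjusted Earned Run Average',
    'K/9': 'Strikeouts per nine innings',
    '1B': 'Single'
};

// Assumed token shape: capitals/digits, an optional "/PART", an optional
// trailing + or %, and no letter or digit immediately after the token.
var ACRONYM_PATTERN = /\b[A-Z0-9]+(?:\/[A-Z0-9]+)?[+%]?(?![A-Za-z0-9])/g;

function markAcronyms(text, dictionary) {
    return text.replace(ACRONYM_PATTERN, function (match) {
        // hasOwnProperty avoids false hits on inherited properties.
        if (Object.prototype.hasOwnProperty.call(dictionary, match)) {
            // Assumes the definitions contain no quotes or other HTML-special characters.
            return '<abbr title="' + dictionary[match] + '">' + match + '</abbr>';
        }
        return match;
    });
}

var marked = markAcronyms('His ERA+ beat his ERA, and his K/9 and 1B totals rose.', acronyms);
```

Because every candidate token is matched once and then looked up, "ERA+" is consumed as a whole before "ERA" ever gets a chance to match inside it.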

brandonscript
  • Works great! It doesn't seem to pick up single letters or entries of the form 0A (such as 1B and so on). I'll try and fix that, Thanks! – mast Dec 06 '13 at 00:54
  • It's enforcing a search for capital A-Z at the beginning; replace `[A-Z]` with `[A-Z0-9]` if you need to capture a number too. – brandonscript Dec 06 '13 at 00:56

You want to use regular expressions with negative lookahead:

string.replace(/\bERA(?!\+)\b/g, "<b>ERA</b>");

and

string.replace(/\bERA\+/g, "<u>ERA+</u>");

The zero-width word boundary \b has been added for good measure, so you don't accidentally match strings like 'BERA', etc.
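A quick check of the two expressions chained together, on a made-up sample string:

```javascript
var text = 'ERA and ERA+ but not BERA';

var result = text
    // Plain ERA only: the lookahead rejects "ERA+", the \b rejects "BERA".
    .replace(/\bERA(?!\+)\b/g, '<b>ERA</b>')
    // ERA+ only: the + is escaped so it is matched literally.
    .replace(/\bERA\+/g, '<u>ERA+</u>');

// result is '<b>ERA</b> and <u>ERA+</u> but not BERA'
```

Note there is no \b after the escaped +, since + is not a word character and a trailing \b would require a letter or digit to follow it.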

Another idea is to sort the list of acronyms from longest key to shortest. This way you are sure to substitute all 'ERA+' before 'ERA', so there is no substring conflict.
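Here is one way the longest-first idea could be sketched when the expression has to be built from a list of strings. `escapeRegExp` and `buildAcronymRegExp` are hypothetical helper names, and the sketch assumes every term starts with a letter or digit. Two details matter when building from strings: the backslash has to be doubled inside a string literal ('\\b', not '\b'), and the flags go in the second argument of `new RegExp` rather than being appended as '/g'.

```javascript
// Escape regex metacharacters (same character class as in the question).
function escapeRegExp(text) {
    return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Build one alternation with longer terms first, so "ERA+" is tried before "ERA".
function buildAcronymRegExp(terms) {
    var alternatives = terms
        .slice() // copy, so the caller's array is not reordered
        .sort(function (a, b) { return b.length - a.length; })
        .map(escapeRegExp);
    // \b works at the start because terms are assumed to begin with a letter or digit.
    // A lookahead replaces the trailing \b because terms may end in + or %.
    return new RegExp('\\b(?:' + alternatives.join('|') + ')(?![A-Za-z0-9])', 'g');
}

var terms = ['ERA', 'ERA+', 'IP', 'GS', 'IP/GS'];
var pattern = buildAcronymRegExp(terms);

var tagged = 'ERA, ERA+ and IP/GS'.replace(pattern, function (match) {
    return '[' + match + ']';
});
// tagged is '[ERA], [ERA+] and [IP/GS]'
```

Doing all terms in a single pass also avoids re-scanning markup inserted by an earlier replacement.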

Matt
  • Thanks for the reply! Two questions. #1: it does not seem to work for the second case (ERA+): `"ERA is something and so is ERA+".replace(/\bERA+/g, "YES");` gives "YES is something and so is YES+". The first one works marvellously though! #2: I'm not sure how to build those expressions from the array: `string.replace('\b' + arr[i] + '/g', 'YES');` does not seem to work. Would I also have to check beforehand whether the token to replace contains a + or a /? Thanks a lot! – mast Dec 06 '13 at 00:00
  • @Alex Oops, forgot that `+` is a special character that needs to be escaped. Fixed. – Matt Dec 06 '13 at 00:03
  • Edited my comment before seeing your reply, that fixed #1. Thanks! – mast Dec 06 '13 at 00:05