Replacing abbreviations with regex

Question

I have strings in JavaScript that may contain abbreviations. I need a regular expression that will replace these abbreviations reliably. I am not very good at regular expressions, so I need some help. Here is a simple example:

var string1="Home in the USA";
var string2="SOME USABILITY...";
var string3="The USA is home";
string1.replace(/USA/,"United States of America")

With the three possible strings, I want to replace "USA" with "United States of America", but I don't want it to touch the second string, since "USABILITY" is obviously a different word. So I need a regex that replaces the matching abbreviation only if the following character is whitespace or the end of the string. Any help would be appreciated.

Check out [word boundaries](http://www.regular-expressions.info/wordboundaries.html) using `\b` — Stephen P, Jun 13 '17 at 20:37

score 1 · Accepted Answer · answered Jun 13 '17 at 20:41

You need to use word boundaries for this. The simple regex would be: /\bUSA\b/g

This says that there must be a word boundary both before and after USA. Another thing to note is that this is a global regex (the g flag), so it will replace every occurrence of "USA" that sits between word boundaries, not just the first. Check out this RegExr demo:

http://regexr.com/3g5hs
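
As a quick check, here is a minimal sketch that applies the word-boundary pattern to the three strings from the question, plus one extra string with a repeated match. The variable names `inputs` and `expanded` are only for illustration.

```javascript
// The strings from the question, plus one with two occurrences.
const inputs = [
  'Home in the USA',
  'SOME USABILITY...',
  'The USA is home',
  'USA, USA!'
];

// \b matches between a word character and a non-word character (or the
// string edge), so "USA" inside "USABILITY" does not match.
const expanded = inputs.map(s => s.replace(/\bUSA\b/g, 'United States of America'));

console.log(expanded);
```

The second string comes back unchanged, and the g flag makes sure both occurrences in the last string are replaced.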

samanime · Answer 2 · 2017-06-13T20:52:21.047

TL;DR: /(?:\s)USA(?:\s)/, but check out the more sophisticated function at the bottom.

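If you want to check whether the following character is a space, you can append a group that matches whitespace. Note that (?:\s) is a non-capturing group rather than a true lookahead (which would be (?=\s)), so the space becomes part of the match: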

const strs = [
  'USA is a country',
  'They say USA there',
  'We are in the USA',
  'SOME USABILITY'
];

const pattern = /USA(?:\s)/;
const replacement = 'United States of America ';

console.log(strs.map(str => str.replace(pattern, replacement)));

Note two things:

Checking only the following character won't work if the word is at the end of the string.
replace() replaces the whole match, including the trailing whitespace, so you'll need to add the space back in your replacement.
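
For comparison, a true lookahead addresses both notes: `(?=...)` asserts what follows without consuming it, so the space never has to be restored, and adding `$` as an alternative covers the end of the string. This is only a sketch; `lookaheadPattern`, `samples`, and `lookaheadResults` are illustrative names, and the pattern still does not check what comes before "USA".

```javascript
// Lookahead: "USA" must be followed by whitespace or the end of the string,
// but the following character is not part of the match.
const lookaheadPattern = /USA(?=\s|$)/g;

const samples = [
  'USA is a country',
  'They say USA there',
  'We are in the USA',
  'SOME USABILITY'
];

// No trailing space is needed in the replacement text.
const lookaheadResults = samples.map(s =>
  s.replace(lookaheadPattern, 'United States of America'));

console.log(lookaheadResults);
```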

If you want to look at both sides, it's pretty much the same thing:

const strs = [
  'USA is a country',
  'They say USA there',
  'We are in the USA',
  'SOME USABILITY'
];

const pattern = /(?:\s)USA(?:\s)/;
const replacement = ' United States of America ';

console.log(strs.map(str => str.replace(pattern, replacement)));

If you want to handle the word anywhere in the string, you'll also want to add a check for the beginning or end of the string:

const strs = [
  'USA is a country',
  'They say USA there',
  'We are in the USA',
  'SOME USABILITY'
];

const pattern = /(?:\s|^)USA(?:\s|$)/;
const replacement = ' United States of America ';

console.log(strs.map(str => str.replace(pattern, replacement).trim()));

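Note that in this case we also trim the result, since the replacement adds an extra leading or trailing space when the match is at the start or end of the string.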

A slightly cleaner method so you don't have to worry about the extra spaces would be to do things in a few steps:

const strs = [
  'USA is a country',
  'They say USA there',
  'We are in the USA',
  'SOME USABILITY'
];

const target = 'USA';
const replacement = 'United States of America';

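const replaceWord = (str, word, replacement) => {
  // Build the pattern from the `word` parameter (not the outer `target`).
  const pattern = new RegExp(`(?:[^a-zA-Z-]|^)(${word})(?:[^a-zA-Z-]|$)`, 'g');
  // For each full match (the word plus its delimiters), swap only the word part.
  return (str.match(pattern) || [])
    .reduce((result, match) => result.replace(match, match.replace(word, replacement)), str);
};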

console.log(strs.map(str => replaceWord(str, target, replacement)));

This is a slightly more sophisticated solution. First, I updated the pattern to look not just for spaces but for anything other than a letter or a hyphen (to account for words bumping up against things like commas and periods).

The actual replacement first gets all of the matches (with the extra checks). We then loop through them and, for each match, replace just the original target inside it, then use that result to replace the whole match in the string.

This is much more flexible.

I also build the pattern from a variable, so you'd be able to replace any word.
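
When the pattern is built from a variable, it is usually worth escaping the word first so that characters such as "." or "+" are not read as regex syntax. The following is a minimal sketch of that idea combined with `\b`; the helper names `escapeRegExp` and `replaceWholeWord` are hypothetical and not part of any library. It assumes the word starts and ends with a word character (letter, digit, or underscore), since `\b` would not behave as intended for something like "U.S.A.".

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: replace every whole-word occurrence of `word`.
const replaceWholeWord = (str, word, replacement) => {
  const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, 'g');
  // A replacer function keeps "$" in the replacement from being
  // interpreted as a special replacement pattern.
  return str.replace(pattern, () => replacement);
};

const wholeWordResults = [
  'USA is a country',
  'We are in the USA',
  'SOME USABILITY',
  'USA USA'
].map(s => replaceWholeWord(s, 'USA', 'United States of America'));

console.log(wholeWordResults);
```

Because `\b` is zero-width, adjacent occurrences such as "USA USA" are both replaced; no delimiter is consumed by the first match.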

You're missing the point of word boundaries. Your fairly ugly (?:\s|^) and (?:\s|$) non-capturing capture groups can simply be replaced with \b (word boundary). Also, you don't have your regex being global so your replace would only replace the first occurrence. Sorry, had to downvote. — jas7457, Jun 13 '17 at 20:44
