I have an array of phrases and am trying to detect whether a string of text contains a full phrase. I am currently using the following regex:

var arrOfWords = ['foo', 'bar', 'foo bar']
var regEx = new RegExp('\\b(' + arrOfWords.join('|') + ')\\b', 'gi')

console.log(regEx)
/\b(foo|bar|foo bar)\b/gi

I used \b because I didn't want to match substrings, only the complete word/phrase, i.e. "foo" should not match "foobar", but should match "I like foo".

This works great. However, the word boundary \b breaks down for phrases that begin with #, because \b only matches between a word character (a letter, digit, or underscore) and a non-word character, and # is a non-word character.

So if "#hashtag" is in the array, it will only match if the string being tested has "hashtag", not "#hashtag"
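
A minimal sketch of how \b treats a leading #, using a shortened phrase list (an assumption made for illustration). With this pattern, `#hashtag` is found only when a word character sits directly in front of the `#`, which is rarely what is wanted:

```javascript
// Shortened, illustrative phrase list (not the original array).
var arrOfWords = ['foo', 'bar', '#hashtag'];
var regEx = new RegExp('\\b(' + arrOfWords.join('|') + ')\\b', 'gi');

// "#" preceded by a space: both sides are non-word characters, so \b fails.
var spaced = 'I like #hashtag'.match(regEx);    // null

// "#" preceded by a letter: there IS a boundary between "c" and "#".
var glued = 'abc#hashtag'.match(regEx);         // ['#hashtag']

// Plain words still behave as intended.
var plainWord = 'I like foo'.match(regEx);      // ['foo']
var substring = 'foobar'.match(regEx);          // null

console.log(spaced, glued, plainWord, substring);
```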

What I'm really looking for would be a regex that matches the entire phrase as specified in the array, including symbols and hashes. Or maybe a solution that can work around this.

Can anyone point me in the right direction? Thanks.

Patrick E.

1 Answer

Unfortunately, JS doesn't have lookbehind (it only arrived later, with ES2018), so without it you cannot test a property of the previous character without including that character in the match (except with \b, which is, as you note, of very limited use). If this is acceptable to you, you can use:

/(?:^|\W)(foo|bar|foo bar|#hashtag)(?=$|\W)/

and only use the first capture group. If you only want full words/phrases, matches are guaranteed not to overlap, because there is always a non-word separator between them.
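
One way to collect just the first capture group is to loop with `exec`. This is a sketch only; the helper name `findPhrases` and the sample sentence are made up for illustration:

```javascript
// Hypothetical helper: returns every phrase found in `text`,
// taking only capture group 1 and ignoring the leading separator.
function findPhrases(text) {
  // Built per call, so the global regex carries no lastIndex state between calls.
  var re = /(?:^|\W)(foo|bar|foo bar|#hashtag)(?=$|\W)/gi;
  var found = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    found.push(m[1]);
  }
  return found;
}

var sample = findPhrases('I like #hashtag and foobar, but foo!');
console.log(sample); // ['#hashtag', 'foo']

var ordered = findPhrases('foo bar');
console.log(ordered); // ['foo', 'bar']
```

Note that alternatives are tried left to right, so with `foo` listed before `foo bar` the text "foo bar" yields two separate matches. Listing longer phrases first (for example by sorting the array by length, longest first) makes the longer phrase win.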

NB: if arrOfWords contains strings with characters that are meaningful in a regexp, they will be interpreted as such, so foo.bar will also match foosbar. To avoid this, escape those characters before joining the phrases.
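
A sketch of escaping each phrase before it goes into the pattern. The helper name `escapeRegExp` and the two-item phrase list are assumptions for illustration; the replacement expression is the commonly used escape idiom:

```javascript
// Hypothetical helper: backslash-escapes characters that are special in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Illustrative phrase list containing a regexp metacharacter.
var phrases = ['foo.bar', '#hashtag'];
var source = '(?:^|\\W)(' + phrases.map(escapeRegExp).join('|') + ')(?=$|\\W)';

// No "g" flag here, so test() keeps no state between calls.
var escapedRegEx = new RegExp(source, 'i');

console.log(escapeRegExp('foo.bar'));        // foo\.bar
console.log(escapedRegEx.test('foo.bar'));   // true
console.log(escapedRegEx.test('foosbar'));   // false
```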

Hey, this is 90% there for me, thanks. Just to nitpick, I noticed that if arrOfWords contains #hashtag, it will match with ##hashtag in the string. Is there a way of matching only if the number of hashes is exact?

Then you need to be explicit about what's a word and what's a non-word character, and replace \W with that.

/(?:^|[^\w#'-])(foo|bar|foo bar|#hashtag)(?=$|[^\w#'-])/
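
To illustrate the difference, here is a small comparison of the \W version and the explicit character class. The names `loose`, `strict`, and `firstPhrase` are made up for this sketch:

```javascript
// \W as separator: "#" counts as a separator, so the first "#" of "##hashtag" is skipped.
var loose = /(?:^|\W)(foo|bar|foo bar|#hashtag)(?=$|\W)/i;

// Explicit class: "#", "'" and "-" are treated as part of a word, not as separators.
var strict = /(?:^|[^\w#'-])(foo|bar|foo bar|#hashtag)(?=$|[^\w#'-])/i;

// Hypothetical helper: returns capture group 1 of the first match, or null.
function firstPhrase(re, text) {
  var m = re.exec(text);
  return m ? m[1] : null;
}

console.log(firstPhrase(loose, 'so ##hashtag'));   // #hashtag
console.log(firstPhrase(strict, 'so ##hashtag'));  // null
console.log(firstPhrase(strict, 'so #hashtag'));   // #hashtag
console.log(firstPhrase(loose, 'foo-bar'));        // foo
console.log(firstPhrase(strict, 'foo-bar'));       // null
```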
Amadan
  • Hey, this is 90% there for me, thanks. Just to nitpick, I noticed that if `arrOfWords` contains `#hashtag`, it will match with `##hashtag` in the string. Is there a way of matching only if the number of hashes is exact? – Patrick E. Oct 04 '16 at 02:52
  • Looks like you need `/(?:^|\s)(foo|bar|foo bar|#hashtag)(?=$|\s)/` – Wiktor Stribiżew Oct 04 '16 at 06:55
  • @WiktorStribiżew: That might or might not be too restrictive (and only OP can tell). For example, `Eleanor said "#hashtag is trending"` would not match with yours, but would with mine. – Amadan Oct 04 '16 at 06:57
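
The trade-off discussed in these comments can be checked directly. This is a sketch; the names `whitespaceOnly`, `customClass`, and `matchPhrase` are made up for illustration:

```javascript
// Whitespace-only separators, as suggested in the comment above.
var whitespaceOnly = /(?:^|\s)(foo|bar|foo bar|#hashtag)(?=$|\s)/;

// Explicit separator class from the answer.
var customClass = /(?:^|[^\w#'-])(foo|bar|foo bar|#hashtag)(?=$|[^\w#'-])/;

// Hypothetical helper: returns capture group 1 of the first match, or null.
function matchPhrase(re, text) {
  var m = re.exec(text);
  return m ? m[1] : null;
}

var quoted = 'Eleanor said "#hashtag is trending"';

console.log(matchPhrase(whitespaceOnly, quoted));            // null (a quote precedes "#")
console.log(matchPhrase(customClass, quoted));               // #hashtag
console.log(matchPhrase(whitespaceOnly, 'I like #hashtag')); // #hashtag
console.log(matchPhrase(whitespaceOnly, '##hashtag'));       // null
console.log(matchPhrase(customClass, '##hashtag'));          // null
```

Both patterns reject `##hashtag`; they differ only in whether punctuation such as a quotation mark may delimit a phrase.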