1

Hey there, I'm trying to write a multi-word RegEx that uses word boundaries. The string I'm searching is as follows (this is only a test string):

const regString = "/gamma/ truck?timer!doctor\\face"

Here is my regular expression:

const testReg2 = new RegExp("\\b\/gamma/|truck|\\?|face\\b", "gi");

For whatever reason, the console logs ['truck', '?', 'face'] but does not find '/gamma/', which is puzzling because const testReg = new RegExp("/gamma/"); does find '/gamma/'.

const regString = "/gamma/ truck?timer!doctor\\face"
const testReg2 = new RegExp("\\b\/gamma/|truck|\\?|face\\b", "gi");

console.log(regString.match(testReg2))
Roko C. Buljan
Slakemoth

3 Answers

0

The issue is using \b\/gamma and expecting it to match "/gamma" (same for the closing \b).

For the word boundary \b to match, the position must have a word character on one side and a non-word character (or the start/end of the string) on the other. That is not the case here: the \b sits between the start of the string and the non-word character /.

To visualize, let's spread those characters:

# Regex: \bgamma

/ g a m m a
 ^--------- \b points here. Match: `gamma`

# Regex: \b\/gamma

/ g a m m a
\b points nowhere. Match: none, since `/` is a non-word char
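
The boundary rule can be checked directly in the console. The following is a minimal sketch; the names `sample`, `withBoundary`, `withoutBoundary` and `bareWord` are made up for illustration and do not come from the question.

```javascript
// Illustrative sample: the string starts with the non-word character "/".
const sample = "/gamma/ truck";

// \b needs a word character on exactly one side of the position.
// At index 0 there is the start of the string on the left and "/" on the
// right, so \b cannot match there and this pattern finds nothing.
const withBoundary = sample.match(/\b\/gamma\//g);

// Without \b the literal slashes match as-is.
const withoutBoundary = sample.match(/\/gamma\//g);

// \b does match between "/" and "g", so \bgamma finds the bare word.
const bareWord = sample.match(/\bgamma/g);

console.log(withBoundary);    // null
console.log(withoutBoundary); // [ '/gamma/' ]
console.log(bareWord);        // [ 'gamma' ]
```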

Either remove the \/ and let \b alone delimit gamma, or...

If you really need to ensure the exact presence and sequence of the characters /gamma/, then don't use \b for the /gamma/ case:

\/gamma\/|\b(truck|\?|face)\b

Example on Regex101.com

const regString = "/gamma/ truck?timer!doctor\\face"

console.log(regString.match(/\/gamma\/|\b(truck|\?|face)\b/gi))
Roko C. Buljan
  • OP wanted to match `/gamma/` not `gamma`. They probably didn't know that `\b` doesn't always work on the beginning of the string. – Konrad Sep 18 '22 at 19:32
  • @KonradLinkowski your statement is wrong. I didn't suggest anything about `gamma` either; I was just focusing on the first `/` in `/gam...` to make a valid point. And why do you think `\b` *"doesn't always work on the beginning of a string"*? https://regex101.com/r/Lmd1qQ/1 – Roko C. Buljan Sep 18 '22 at 19:48
  • Ah, thank you. I actually just started reading up on \b and began to see the problem; I assumed it just captured a whole word or similar without any additional characters. But now I understand it captures \w up until the next \W. I see now it isn't even useful for what I needed, sorry. – Slakemoth Sep 18 '22 at 19:49
  • @Slakemoth the rule of thumb is simply: **Don't** use non-word characters right after `\b` - it makes no sense. :) There's no word boundary between a `\b` and a `/` (or any other non-word char) – Roko C. Buljan Sep 18 '22 at 19:53
  • Thank you, I understand that a lot better now; I completely misunderstood how `\b` worked. – Slakemoth Sep 18 '22 at 19:54
0

There are two ways to make this work: 1) use word boundaries only around words that start and end with word chars (the easiest way when the words are known beforehand), or 2) use adaptive dynamic word boundaries (a good fit if your "word" list is dynamic or user-defined).

Here is Option 1:

const regString = "/gamma/ truck?timer!doctor\\face"
const testReg2 = /\/gamma\/|\?|\b(?:truck|face)\b/gi;
console.log(regString.match(testReg2))

So, here, ? and /gamma/ are NOT checked if they are whole words or not, since they start/end with non-word chars.
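
To see what that means in practice, here is a small sketch that reuses the Option 1 pattern on two made-up test strings. The strings and the names `option1`, `wholeWords` and `unchecked` are assumptions for illustration only.

```javascript
// Same pattern as Option 1; only the test strings below are new (hypothetical).
const option1 = /\/gamma\/|\?|\b(?:truck|face)\b/gi;

// "truck" and "face" are whole-word checked, so "trucks" and "surface" are skipped.
const wholeWords = "trucks surface truck".match(option1);

// "/gamma/" and "?" carry no boundary check, so they match even when glued to letters.
const unchecked = "x/gamma/y why?".match(option1);

console.log(wholeWords); // [ 'truck' ]
console.log(unchecked);  // [ '/gamma/', '?' ]
```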

Now, Option 2:

const regString = "/gamma/ truck?timer!doctor\\face";
const words = ["/gamma/", "?", "truck", "face"];
const testReg2 = new RegExp("(?!\\B\\w)(?:" + words.map(x => x.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')).join("|") + ")(?<!\\w\\B)", "gi");
console.log(regString.match(testReg2));

This solution requires lookbehind support, but it ensures the whole-word check is applied only when there are word chars at the start/end of the word (all special chars are also escaped appropriately).
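
If the word list really is dynamic, the construction can be wrapped in a function. This is only a sketch: `buildAdaptiveRegex` is a hypothetical helper name, and the second test string is invented to show the whole-word check being applied selectively.

```javascript
// Hypothetical helper (not part of the answer above): escape regex
// metacharacters and wrap the alternatives in the adaptive lookarounds.
function buildAdaptiveRegex(words, flags = "gi") {
  const escaped = words.map(w => w.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&"));
  // (?!\B\w)  : if the match starts with a word char, require a boundary before it
  // (?<!\w\B) : if the match ends with a word char, require a boundary after it
  return new RegExp("(?!\\B\\w)(?:" + escaped.join("|") + ")(?<!\\w\\B)", flags);
}

const adaptive = buildAdaptiveRegex(["/gamma/", "?", "truck", "face"]);

// The string from the question.
const original = "/gamma/ truck?timer!doctor\\face".match(adaptive);

// Made-up string: "trucks" and "surface" fail the whole-word check,
// while "/gamma/" matches even though it is surrounded by letters.
const partial = "trucks surface x/gamma/y".match(adaptive);

console.log(original); // [ '/gamma/', 'truck', '?', 'face' ]
console.log(partial);  // [ '/gamma/' ]
```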

Wiktor Stribiżew
-1

According to this answer

[...] at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_])

Your string doesn't start with a character in [0-9A-Za-z_], hence no match.

Konrad