I have a regex that matches words fine unless they contain a special character, such as ~Query, which is the name of a member of a C++ class. I need the word boundaries because some member names are single characters:

$key =~ /\b$match\b/

I tried numerous expressions I thought would work, such as /[~]*\b$match\b/ or /\b[~]*$match\b/, but none of them matched.

Is it possible to put a word boundary on words that may contain a special character?

cdhowie
Jeff Cunningham
  • Can you post exactly what you want to match? Regexes are written for specific cases, not by assuming what your string is. – Rohit Jain Oct 03 '12 at 16:28
  • `/~\b$match\b/` should match `~Query`, assuming that the regex contained in `$match` would match `Query`. (I just tested, and `" ~foo " =~ /~\bfoo\b/` evaluates as true.) – cdhowie Oct 03 '12 at 16:29
  • The $match variable might contain ~Query, Query, or a single letter such as p, or possibly any other strange name developers use for their class methods. The regex is part of a subroutine that is doing a search. All works fine except when $match contains ~Query. – Jeff Cunningham Oct 03 '12 at 16:44

2 Answers

\b

is short for

(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))
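The expansion also shows why the original pattern misses ~Query: ~ is not a \w character, so a \b placed in front of it only succeeds when a word character comes immediately before the tilde. A minimal sketch, where the sample string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# \b spelled out as the expansion shown above.
my $wb = qr/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/;

# Hypothetical sample text; "~" follows ":" here, so neither side of
# the position before "~" is a word character.
my $key = 'Foo::~Query()';

my $with_b       = $key =~ /\b~Query\b/   ? 1 : 0;  # no boundary before "~"
my $with_spelled = $key =~ /$wb~Query$wb/ ? 1 : 0;  # same result as \b
my $after_tilde  = $key =~ /~\bQuery\b/   ? 1 : 0;  # boundary between "~" and "Q"

print "$with_b $with_spelled $after_tilde\n";   # prints "0 0 1"
```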

If you want to treat ~ as a word character, change \w to [\w~].

(?:(?<![\w~])(?=[\w~])|(?<=[\w~])(?![\w~]))

Example usage:

my $word_char = qr/[\w~]/;
my $boundary  = qr/(?<!$word_char)(?=$word_char)
                  |(?<=$word_char)(?!$word_char)/x;

$key =~ /$boundary$match$boundary/
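As a rough check of how this behaves, the sketch below wraps the match in a helper. The name has_word and the sample strings are hypothetical, and $match is assumed to hold a pattern with no unintended metacharacters:

```perl
use strict;
use warnings;

my $word_char = qr/[\w~]/;
my $boundary  = qr/(?<!$word_char)(?=$word_char)
                  |(?<=$word_char)(?!$word_char)/x;

# Hypothetical helper: true if $match occurs in $key delimited by the
# custom boundary on both sides.
sub has_word {
    my ($key, $match) = @_;
    return $key =~ /$boundary$match$boundary/ ? 1 : 0;
}

print has_word('Foo::~Query()', '~Query'), "\n";  # 1: "~" now counts as a word character
print has_word('Foo::~Query()', 'Query'),  "\n";  # 0: "~Q" is no longer a boundary
print has_word('int p;',        'p'),      "\n";  # 1: single-character name
print has_word('int ptr;',      'p'),      "\n";  # 0: "p" is only part of "ptr"
```

Note the second result: once ~ is a word character, a search for Query no longer finds it inside ~Query, which may or may not be what a given search needs.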

If we know $match can only match something that starts and ends with a $word_char, we can simplify as follows:

my $word_char   = qr/[\w~]/;
my $start_bound = qr/(?<!$word_char)/;
my $end_bound   = qr/(?!$word_char)/;

$key =~ /$start_bound$match$end_bound/

This is simple enough that we can inline it.

$key =~ /(?<![\w~])$match(?![\w~])/
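If $match holds a literal member name rather than a regex, it may be safer to quote it with \Q...\E so that any metacharacters in the name are matched literally. That is an assumption about how $match is used, not something the question states. A small sketch with a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper. \Q...\E is an addition here: it assumes $match is
# a literal name, not a pattern.
sub has_member {
    my ($key, $match) = @_;
    return $key =~ /(?<![\w~])\Q$match\E(?![\w~])/ ? 1 : 0;
}

print has_member('delete ~Query;', '~Query'), "\n";  # 1
print has_member('obj.~Query()',   'Query'),  "\n";  # 0: preceded by "~"
print has_member('int p = 0;',     'p'),      "\n";  # 1
print has_member('int ptr = 0;',   'p'),      "\n";  # 0: followed by "t"
```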
ikegami
  • Thanks, this does what I want, although it is lengthy: $key =~ /(?:(?<=[\w~])(?![\w~])|(?<![\w~])(?=[\w~]))$match(?:(?<=[\w~])(?![\w~])|(?<![\w~])(?=[\w~]))/ – Jeff Cunningham Oct 03 '12 at 17:01
  • If I need to include other special characters, would they be added like this: [\w~`]? – Jeff Cunningham Oct 03 '12 at 17:13
  • The same \b expansion worked for C#, and replacing \w also worked like a charm. – drizin Aug 22 '16 at 14:26
  • Is there a workaround for Firefox not supporting lookbehinds? – adam0101 Oct 30 '19 at 21:56
  • @ikegami, sorry, yes - JavaScript. I'm extracting multiple programming language names (C#, C++) from text. Your code ported over to JS works awesome except for browsers that don't support look behinds :( – adam0101 Oct 30 '19 at 22:13
  • @adam0101, You could use something like `(?:^|[^\w~])([\w~]+)(?![\w~])` and extract the first capture. But you're no longer matching the boundary itself, so it has limitations (e.g. it's harder to use in larger patterns). – ikegami Oct 30 '19 at 22:13
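
The capture-based pattern from the last comment uses no lookbehind. It is sketched here in Perl to match the rest of the thread, although the comment is about JavaScript. The helper name extract_words and the sample text are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns every run of [\w~] characters in $text.
# The leading (?:^|[^\w~]) consumes one separator instead of looking
# behind, so the word itself is taken from the first capture group.
sub extract_words {
    my ($text) = @_;
    my @words = $text =~ /(?:^|[^\w~])([\w~]+)(?![\w~])/g;
    return @words;
}

my @found = extract_words('a ~Query(b)');
print join(',', @found), "\n";   # prints "a,~Query,b"
```

Because the separator is part of the overall match, the match offsets no longer line up with the word itself, which is the limitation the comment mentions.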

Assuming you don't need to check the contents of $match (i.e. it always contains a valid identifier) you can write this

$key =~ /(?<![~\w])$match(?![~\w])/

which simply checks that the string in $match isn't preceded or followed by alphanumerics, underscores or tildes.

Borodin