regex word boundary excluding the hyphen

Question

I need a regex that matches an expression ending with a word boundary, but that does not treat the hyphen as a boundary. That is, get all expressions matched by

type ([a-z])\b

but do not match, e.g.,

type a-1

To rephrase: I want an equivalent of the word boundary operator \b that uses the extended class [A-Za-z0-9_-] instead of the word character class [A-Za-z0-9_].

What regex engine are you using -- is this .NET, javascript, etc.? — Jay, Apr 17 '12 at 18:00

Andrew Clark · Accepted Answer · 2012-04-17T18:08:22.383

38

You can use a lookahead for this. The shortest option is a negative lookahead:

type ([a-z])(?![\w-])

(?![\w-]) would mean "fail the match if the next character is in \w or is a -".

Here is an option that uses a normal lookahead:

type ([a-z])(?=[^\w-]|$)

You can read (?=[^\w-]|$) as "only match if the next character is not in the character class [\w-], or this is the end of the string".

See it working: http://www.rubular.com/r/NHYhv72znm
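As a quick check of the two lookahead patterns above, here is a minimal Python sketch. The sample strings and the helper name `matches` are made up for illustration, and Python's `re` module is only one engine; others may differ in details such as how `$` treats a trailing newline.

```python
import re

# The two patterns from the answer, compiled unchanged.
NEG_LOOKAHEAD = re.compile(r"type ([a-z])(?![\w-])")
POS_LOOKAHEAD = re.compile(r"type ([a-z])(?=[^\w-]|$)")

# Hypothetical sample inputs, chosen to cover the hyphen case from the question.
samples = ["type a", "type a-1", "type a.", "type ab", "type a b"]

def matches(pattern, texts):
    """Return the texts in which the pattern is found."""
    return [t for t in texts if pattern.search(t)]

neg_hits = matches(NEG_LOOKAHEAD, samples)
pos_hits = matches(POS_LOOKAHEAD, samples)

print(neg_hits)
print(pos_hits)
```

On these inputs both patterns should accept the same strings: "type a-1" and "type ab" are rejected because the character after the letter is in [\w-].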

edited Apr 17 '12 at 18:08

answered Apr 17 '12 at 18:03

Andrew Clark


In case you also want to match with a space instead of an end or beginning of a word, you have to add parentheses around the dollar, i.e. ([a-z])(?![\w-])|($|\s). In my case I want to exclude the hyphen from the word boundaries at the beginning and end of an 8-digit number. The regular expression looked like re.search(r"((?![-\w])|(\s|^))(\d{8})((?![-\w])|(\s|^))", "-12345678 ") – Eelco van Vliet Apr 08 '19 at 12:06
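One way to express "no word character or hyphen on either side" is to pair a negative lookbehind with the negative lookahead from the answer. The sketch below is not the commenter's pattern. It assumes the goal is that an 8-digit run touching a hyphen should not match, and the names `EIGHT_DIGITS` and `find_number` are hypothetical.

```python
import re

# Assumed intent: exactly 8 digits, with neither \w nor "-" directly
# before or after the run.
EIGHT_DIGITS = re.compile(r"(?<![\w-])\d{8}(?![\w-])")

def find_number(text):
    """Return the first qualifying 8-digit run, or None if there is none."""
    m = EIGHT_DIGITS.search(text)
    return m.group(0) if m else None

print(find_number(" 12345678 "))   # surrounded by spaces
print(find_number("-12345678 "))   # leading hyphen
print(find_number("12345678-"))    # trailing hyphen
```

Both assertions are zero-width, so start and end of string need no special casing: at the very start there is no preceding character for the lookbehind to reject.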

score 16 · Answer 2 · edited Apr 24 '18 at 11:39


I had a pretty similar problem except I didn't want to consider the '*' as a boundary character. Here's what I did:

\b(?<!\*)([^\s\*]+)\b(?!\*)

Basically, if you're at a word boundary, look back one character and don't match if the previous character was an '*'. If you're in the middle, don't match on a space or asterisk. If you're at the end, make sure the end isn't an asterisk. In your case, I think you could use \w instead of \s. For me, this worked in these situations:

*word
wo*rd
word*
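The three cases above can be checked with a short Python sketch. It assumes the trailing lookahead is meant to be `(?!\*)` with the asterisk escaped, since an unescaped `*` there is a syntax error in Python's `re`. It also assumes that "worked" means none of the three strings should match. The names `STAR_SAFE` and `first_match` are hypothetical.

```python
import re

# Asterisk-aware boundary: a word boundary that is not preceded by "*",
# a run of characters that are neither whitespace nor "*", then a word
# boundary that is not followed by "*".
STAR_SAFE = re.compile(r"\b(?<!\*)([^\s*]+)\b(?!\*)")

def first_match(text):
    """Return the first captured word, or None if nothing matches."""
    m = STAR_SAFE.search(text)
    return m.group(1) if m else None

for text in ["*word", "wo*rd", "word*", "word"]:
    print(repr(text), "->", first_match(text))
```

Only the plain "word" should produce a match here; the other three are rejected because an asterisk touches the word at its start, middle, or end.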

edited Apr 24 '18 at 11:39

fracz


answered Aug 28 '14 at 01:52

Jonathan


4

Your regex has invalid syntax – MaxZoom Mar 06 '15 at 15:44
