
Imagine you are trying to pattern match "stackoverflow".

You want the following:

 this is stackoverflow and it rocks [MATCH]

 stackoverflow is the best [MATCH]

 i love stackoverflow [MATCH]

 typostackoverflow rules [NO MATCH]

 i love stackoverflowtypo [NO MATCH]

I know how to parse out stackoverflow if it has spaces on both sides using:

/\s(stackoverflow)\s/

The same goes if it's at the start or end of a string:

/^(stackoverflow)\s/

/\s(stackoverflow)$/

But how do you specify "space or end of string" and "space or start of string" using a regular expression?

Patrick McDonald
anonymous-one

4 Answers


You can use any of the following:

\b      # A word boundary; works for spaces as well as the start and end of a line.
(^|\s)  # The | means "or"; () is a capturing group.


/\b(stackoverflow)\b/

Also, if you don't want to include the space in your match, you can use lookbehinds and lookaheads.

(?<=\s|^)         #to look behind the match
(stackoverflow)   #the string you want. () optional
(?=\s|$)          #to look ahead.
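
One way to check these patterns is to run them against the five sample strings from the question. The sketch below uses Python's `re` module; `SAMPLES` and `matches` are names made up for this illustration. Python's `re` rejects `(?<=\s|^)` because its lookbehinds must be fixed-width, so the lookbehind is split into two alternatives here. Other regex engines may accept the original form unchanged.

```python
import re

# Sample strings from the question, paired with the result the asker wants.
SAMPLES = [
    ("this is stackoverflow and it rocks", True),
    ("stackoverflow is the best", True),
    ("i love stackoverflow", True),
    ("typostackoverflow rules", False),
    ("i love stackoverflowtypo", False),
]

# Word-boundary version.
WORD_BOUNDARY = re.compile(r"\bstackoverflow\b")

# Lookaround version. Python's re module needs fixed-width lookbehinds,
# so (?<=\s|^) is written as two separate lookbehinds.
LOOKAROUND = re.compile(r"(?:(?<=\s)|(?<=^))stackoverflow(?=\s|$)")


def matches(pattern, text):
    """Return True if the compiled pattern is found anywhere in text."""
    return pattern.search(text) is not None


for text, expected in SAMPLES:
    print(text, expected, matches(WORD_BOUNDARY, text), matches(LOOKAROUND, text))
```

Both patterns agree on the five samples. They differ once punctuation is involved: `\b` also matches in "i love stackoverflow." because the period is a non-word character, while the lookaround version does not.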
Chuck Le Butt
Jacob Eggers
  • `\b` is a zero-width assertion; it never consumes any characters. There's no need to wrap it in a lookaround. – Alan Moore Jul 15 '11 at 21:41
  • good point. I was thinking about his original `\s`. I will adjust my answer. – Jacob Eggers Jul 15 '11 at 21:46
  • Note that in most regexp implementations, `\b` is **standard ASCII only**, that is to say, no unicode support. If you need to match unicode words you have no choice but to use this instead: http://stackoverflow.com/a/6713327/1329367 – Mahn Jan 27 '15 at 16:55
  • The easier way to exclude the group selection from the match is `(?:^|\s)` – sam-6174 Oct 22 '15 at 16:48
  • for python, replace `(?<=\s|^)` with `(?:(?<=\s)|(?<=^))`. Otherwise, you get `error: look-behind requires fixed-width pattern` – sam-6174 Aug 31 '16 at 20:06
  • Thanks for the look behind and look ahead solution. This makes results comparable to \b – Brian Risk Jun 15 '17 at 14:20
  • The `\b` would consider other characters -- such as "`.`" as word-breakers, whereas the asker specifically said "space". @gordy's solution seems better. – Mikhail T. Dec 01 '17 at 17:42
  • Beware: lookbehind is not implemented [in most browsers](https://caniuse.com/#feat=js-regexp-lookbehind) as of 2019. – user Apr 27 '19 at 22:43
  • See [this answer](https://stackoverflow.com/a/6713427/11069485) for a Python-friendly regex that's a bit neater than the one suggested by @user2426679 – Chris Wong May 26 '22 at 01:08

(^|\s) matches a space or the start of the string, and ($|\s) matches a space or the end of the string. Together it's:

(^|\s)stackoverflow($|\s)
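
As a rough illustration, here is how this pattern might be used for a replacement in Python. Because the two groups consume the surrounding whitespace, the replacement has to put it back. `replace_word` is a hypothetical helper name, not something from the answer.

```python
import re

# Group 1 is the start of the string or one whitespace character;
# group 2 is the end of the string or one whitespace character.
PATTERN = re.compile(r"(^|\s)stackoverflow($|\s)")


def replace_word(text, replacement):
    """Replace the standalone word while keeping the surrounding whitespace."""
    # A callable replacement avoids having to escape backslashes in `replacement`.
    return PATTERN.sub(lambda m: m.group(1) + replacement + m.group(2), text)


print(replace_word("i love stackoverflow", "SO"))         # i love SO
print(replace_word("typostackoverflow rules", "SO"))      # typostackoverflow rules
print(replace_word("stackoverflow stackoverflow", "SO"))  # SO stackoverflow
```

The last call shows one thing to watch for: the first match consumes the space between the two words, so the second occurrence no longer has a space or the start of the string in front of it and is skipped. Zero-width lookarounds do not have this limitation because they consume nothing.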
gordy
  • If you use this pattern to replace, remember to keep the spaces in the replaced result by replacing with the pattern `$1string$2`. – Mahn Jan 27 '15 at 16:57
  • This is the only one that works for me too. Word boundaries never seem to do what I want. For one, they match some characters besides whitespace (like dashes). This solved it for me because I'd been trying to put `$` and `^` into a character class, but this shows they can just be put into a regular pattern group. – felwithe Jan 02 '19 at 14:20
  • This works quite nicely but if you are not interested in capturing the spaces use this: `(?:^|\s)stackoverflow(?:$|\s)` – Vlax Apr 12 '21 at 21:03

Here's what I would use:

 (?<!\S)stackoverflow(?!\S)

In other words, match "stackoverflow" if it's not preceded by a non-whitespace character and not followed by a non-whitespace character.

This is neater (IMO) than the "space-or-anchor" approach, and it doesn't assume the string starts and ends with word characters like the \b approach does.
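
To make the difference from `\b` concrete, here is a small Python sketch. The helper names `whitespace_delimited` and `word_bounded` are invented for this example, and "c++" is just an arbitrary token that ends in non-word characters.

```python
import re


def whitespace_delimited(token):
    """Match token only when bounded by whitespace or the string edges."""
    return re.compile(r"(?<!\S)" + re.escape(token) + r"(?!\S)")


def word_bounded(token):
    """Match token only when it sits between word boundaries."""
    return re.compile(r"\b" + re.escape(token) + r"\b")


# A token made of word characters: both approaches find it.
print(bool(whitespace_delimited("stackoverflow").search("i love stackoverflow")))  # True
print(bool(word_bounded("stackoverflow").search("i love stackoverflow")))          # True

# A token ending in non-word characters: \b has no boundary after the "+".
print(bool(whitespace_delimited("c++").search("i love c++")))  # True
print(bool(word_bounded("c++").search("i love c++")))          # False

# Punctuation next to the token: \b treats "-" as a boundary.
print(bool(whitespace_delimited("stackoverflow").search("stackoverflow-typo")))  # False
print(bool(word_bounded("stackoverflow").search("stackoverflow-typo")))          # True
```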

Alan Moore
  • good explanation on why to use this. i would have picked this however the string being tested is ALWAYS a single line. – anonymous-one Jul 17 '11 at 18:21
  • @LawrenceDol, did you mean `(?<=\S)...(?=\S)`? Note that the uppercase `\S` matches any character that's NOT whitespace. So the negative lookarounds will match if there IS a whitespace character there, or if there's no character at all. – Alan Moore Dec 20 '20 at 02:38

\b matches at word boundaries (without actually matching any characters), so the following should do what you want:

\bstackoverflow\b
Andrew Clark
  • For Python it helps to specify it as a [raw string](https://docs.python.org/3/reference/lexical_analysis.html#index-19), e.g. `mystr = r'\bstack overflow\b'` – Asclepius Mar 26 '19 at 15:33
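
A short sketch of why the raw string matters in Python: in an ordinary string literal, `\b` is the backspace character, so the regex engine never sees a word-boundary assertion. The variable names below are only for illustration.

```python
import re

# In a normal string literal, "\b" is a single backspace character (U+0008).
print("\b" == "\x08")  # True

# Without the r prefix the pattern looks for literal backspace characters.
plain = re.compile("\bstackoverflow\b")
# With the r prefix the backslashes reach the regex engine intact.
raw = re.compile(r"\bstackoverflow\b")

print(plain.search("i love stackoverflow"))        # None
print(raw.search("i love stackoverflow").group())  # stackoverflow
```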