534

I found these things in my regex body but I haven't got a clue what I can use them for. Does somebody have examples so I can try to understand how they work?

(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind

(?>) - atomic group
grenierm5
Spidfire
  • 50
    Why doesn't the regex website have some simple table like this? Instead they have blocks of text explaining only. http://www.regular-expressions.info/lookaround.html – Whitecat Aug 22 '16 at 17:30
  • 11
    @Whitecat Try: https://regex101.com http://www.regexr.com – Andrew Mar 28 '17 at 14:18

5 Answers

1587

Examples

Given the string foobarbarfoo:

bar(?=bar)     finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar)     finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar    finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar    finds the 2nd bar ("bar" which does not have "foo" before it)

You can also combine them:

(?<=foo)bar(?=bar)    finds the 1st bar ("bar" with "foo" before it and "bar" after it)
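The patterns above can be checked quickly in Python. This is only a minimal sketch: the helper name starts is made up for illustration, and it reports where each match begins (in foobarbarfoo the 1st bar starts at index 3 and the 2nd at index 6).

```python
import re

TEXT = "foobarbarfoo"  # 1st "bar" starts at index 3, 2nd "bar" at index 6

def starts(pattern, text=TEXT):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

print(starts(r"bar(?=bar)"))          # [3]  positive lookahead
print(starts(r"bar(?!bar)"))          # [6]  negative lookahead
print(starts(r"(?<=foo)bar"))         # [3]  positive lookbehind
print(starts(r"(?<!foo)bar"))         # [6]  negative lookbehind
print(starts(r"(?<=foo)bar(?=bar)"))  # [3]  combined
```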

Definitions

Look ahead positive (?=)

Find expression A where expression B follows:

A(?=B)

Look ahead negative (?!)

Find expression A where expression B does not follow:

A(?!B)

Look behind positive (?<=)

Find expression A where expression B precedes:

(?<=B)A

Look behind negative (?<!)

Find expression A where expression B does not precede:

(?<!B)A

Atomic groups (?>)

An atomic group exits a group and throws away alternative patterns after the first matched pattern inside the group (backtracking is disabled).

  • (?>foo|foot)s applied to foots will match its 1st alternative foo, then fail as s does not immediately follow, and stop as backtracking is disabled

A non-atomic group will allow backtracking; if subsequent matching ahead fails, it will backtrack and use alternative patterns until a match for the entire expression is found or all possibilities are exhausted.

  • (foo|foot)s applied to foots will:

    1. match its 1st alternative foo, then fail as s does not immediately follow in foots, and backtrack to its 2nd alternative;
    2. match its 2nd alternative foot, then succeed as s immediately follows in foots, and stop.
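The difference can be tried out in Python. One caveat: Python's re module only accepts the (?>...) syntax from version 3.11 on, so this sketch emulates the atomic group with a lookahead plus a backreference, (?=(foo|foot))\1, which is a common workaround and not the syntax shown above. The variable names are made up for illustration.

```python
import re

# Emulated atomic group: the lookahead picks an alternative once and
# cannot be re-entered, and \1 then consumes exactly what it captured.
atomic_like = re.compile(r"(?=(foo|foot))\1s")

# Ordinary capturing group: the engine may backtrack into the alternation.
plain = re.compile(r"(foo|foot)s")

print(atomic_like.search("foots"))         # None: "foo" is kept, "s" never follows
print(plain.search("foots").group())       # foots: backtracks to "foot"
print(atomic_like.search("foos").group())  # foos: the 1st alternative is enough
```

On Python 3.11 or newer, re.search(r"(?>foo|foot)s", "foots") should behave like the emulated version.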


Donald Duck
skyfoot
  • 1
    What do you mean by "finds the second bar" part? There is only one bar in the expression/string. Thanks – ziggy Feb 08 '14 at 11:22
  • 6
    @ziggy the string being tested is "foobarbarfoo". As you can see there are two foo and two bar in the string. – skyfoot Feb 12 '14 at 10:56
  • @ziggy try to go to http://pythex.org/ and play a little bit about it. you will understand it totally – stanleyli Mar 30 '15 at 19:09
  • Place two bars side by side, like, `barbar` in the text on which these regexs will be tried. – Pallav Jha May 31 '17 at 13:08
  • 4
    Can someone explain when one may need an atomic group? If I only need to match with the first alternative, why would I want to give multiple alternatives? – arviman Aug 09 '17 at 12:27
  • @skyfoot or anyone on here. I can see that the "(?<=B)A" lookbehind is always before the actual lookup. Does it mean it must always comes before? Can this also be done "A(?<=B)"? As the name suggest it looks "behind" and it looks "ahead". Thank you if anyone can explain. – Chopnut Apr 21 '18 at 00:53
  • 5
    **Better explanation about atomic group** at [this answer](https://stackoverflow.com/a/14412277/287948). Can someone edit here to complete this didatic answer? – Peter Krauss Apr 27 '18 at 10:18
  • 16
    Just a note that this answer was essential when I ended up on a project that required serious regex chops. This is an excellent, concise explanation of look-arounds. – Tom Coughlin May 23 '19 at 20:49
244

Lookarounds are zero-width assertions. They check for a regex (to the right or left of the current position, depending on whether they look ahead or behind), succeed or fail when a match is found (depending on whether they are positive or negative), and discard the matched portion. They don't consume any characters: matching for the regex that follows them (if any) starts at the same cursor position.
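A small Python sketch of what "zero width" means in practice (the sample strings are arbitrary): the lookahead reports a position, not text, and whatever follows it in the pattern is matched from that same position.

```python
import re

# A bare lookahead matches an empty span just before "bar".
m = re.search(r"(?=bar)", "foobar")
print(m.span())  # (3, 3)

# "bar" is checked but not consumed, so the match is only "foo".
m2 = re.search(r"foo(?=bar)", "foobar")
print(m2.group(), m2.end())  # foo 3

# Lookahead and the following regex start at the SAME position, so this
# asks for text that begins with both "source" and "hello": impossible.
print(re.search(r"(?=source)hello", "source...hummhellosource"))  # None
```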

Read regular-expressions.info for more details.

  • Positive lookahead:

Syntax:

(?=REGEX_1)REGEX_2

Match only if REGEX_1 matches; after matching REGEX_1, the match is discarded and searching for REGEX_2 starts at the same position.

example:

(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}

REGEX_1 is [a-z0-9]{4}$ which matches four alphanumeric chars followed by end of line.
REGEX_2 is [a-z]{1,2}[0-9]{2,3} which matches one or two letters followed by two or three digits.

REGEX_1 makes sure that exactly four characters remain before the end of the line, but it doesn't consume any characters, so the search for REGEX_2 starts at the same location. (To constrain the length of the whole string, anchor the pattern with ^.) REGEX_2 then makes sure that the string matches some other rules. Without the look-ahead it would also match strings of length three or five.
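To see the length check at work, here is a rough Python comparison of the pattern with and without the look-ahead. The variable names and test strings are invented for illustration; re.match is used so that matching starts at the beginning of each string.

```python
import re

with_lookahead = re.compile(r"(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}")
without_lookahead = re.compile(r"[a-z]{1,2}[0-9]{2,3}")

for s in ["ab12", "a123", "a12", "ab123"]:
    # match() anchors at the start; fullmatch() requires the whole string.
    print(s, bool(with_lookahead.match(s)), bool(without_lookahead.fullmatch(s)))
# ab12  True  True
# a123  True  True
# a12   False True   (three characters slip through without the look-ahead)
# ab123 False True   (so do five)
```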

  • Negative lookahead

Syntax:

(?!REGEX_1)REGEX_2

Match only if REGEX_1 does not match; after checking REGEX_1, the search for REGEX_2 starts at the same position.

example:

(?!.*\bFWORD\b)\w{10,30}$

The look-ahead part checks for FWORD in the string and fails if it finds it. If it doesn't find FWORD, the look-ahead succeeds and the following part verifies that the string's length is between 10 and 30 and that it contains only word characters (a-z, A-Z, 0-9 and _).
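A hedged Python sketch of the same idea. Note that the pattern below is a variation, not the one above: it adds a ^ anchor and allows spaces ([\w ] instead of \w), because \bFWORD\b can only occur as a standalone word when the text may contain non-word characters. FWORD is a placeholder for whatever word should be rejected.

```python
import re

# Variation on the pattern above: anchored, and spaces are allowed.
clean = re.compile(r"^(?!.*\bFWORD\b)[\w ]{10,30}$")

print(bool(clean.match("hello there world")))  # True: no FWORD, 17 characters
print(bool(clean.match("hello FWORD world")))  # False: look-ahead finds FWORD
print(bool(clean.match("helloFWORDworld")))    # True: \b needs a word boundary
print(bool(clean.match("short")))              # False: fewer than 10 characters
```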

Look-behind is similar to look-ahead: it just looks behind the current cursor position. Not every regex flavor supports look-behind assertions (JavaScript, for example, only gained them in ES2018), and many flavors that do support it (PHP, Python, etc.) require the look-behind portion to have a fixed length.
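The fixed-length restriction is easy to observe in Python's built-in re module. A minimal sketch (the sample strings and the variable_width_ok flag are made up for illustration):

```python
import re

# Fixed-width look-behind: digits that are preceded by a dollar sign.
print(re.search(r"(?<=\$)\d+", "cost: $42").group())  # 42

# Variable-width look-behind: the re module refuses to compile it.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)
```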

  • Atomic groups basically discard/forget the subsequent tokens in the group once a token matches. Check this page for examples of atomic groups.
mike
Amarghosh
  • following your explanation, does not seem to work in javascript, /(?=source)hello/.exec("source...hummhellosource") = null. Is your explanation correct? – Helin Wang Jun 01 '13 at 17:47
  • @HelinWang That explanation is correct. Your regex expects a string that is both source and hello at the same time! – Amarghosh Jun 04 '13 at 11:54
  • @jddxf Care to elaborate? – Amarghosh Oct 04 '16 at 05:19
  • @Amarghosh I agree with "They check for a regex (towards right or left of the current position - based on ahead or behind), succeeds or fails when a match is found (based on if it is positive or negative) and discards the matched portion.". So lookahead should check for a regex towards right of the current position and the syntax of positive lookahead should be x(?=y) – jddxf Oct 05 '16 at 11:28
  • @Amarghosh would `(?=REGEX_1)REGEX_2` only match if `REGEX_2` comes *after* `REGEX_1`? – aandis May 22 '18 at 11:50
1

Why? Suppose you are playing Wordle and you've entered "ant". (Yes, a three-letter word; it's only an example, chill.)

The answer comes back as blank, yellow, green, and you have a list of three-letter words that you want to search with a regex. How would you do it?

To begin, you could require the t in the third position:

[a-z]{2}t

We could improve on that by noting that the word doesn't contain an a:

[b-z]{2}t

We could further improve by saying that the search had to have an n in it.

(?=.*n)[b-z]{2}t

or, to break it down:

(?=.*n) - look ahead and check that the match has an n in it, possibly with zero or more characters before that n;

[b-z]{2} - two letters other than 'a' in the first two positions;

t - literally a 't' in the third position.
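Applied to a word list in Python, the final pattern might be used like this. The word list is hypothetical, and re.fullmatch is used so that each candidate has to be exactly three letters.

```python
import re

pattern = re.compile(r"(?=.*n)[b-z]{2}t")

# Hypothetical candidate list; substitute your own dictionary.
words = ["ant", "nut", "net", "not", "bit", "out", "hut", "pen"]

candidates = [w for w in words if pattern.fullmatch(w)]
print(candidates)  # ['nut', 'net', 'not']
```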

Grimley
0

Grokking lookaround rapidly.
How do you distinguish lookahead from lookbehind? Take a two-minute tour with me:

(?=) - positive lookahead
(?<=) - positive lookbehind

Suppose

    A  B  C #in a line

Now, we ask B: where are you?
B has two ways to declare its location:

One, B has A ahead and has C behind.
Two, B is ahead (lookahead) of C and behind (lookbehind) A.

As we can see, behind and ahead are opposite in the two solutions.
Regex uses solution Two.
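Whichever wording is easier to remember, the behaviour can be checked directly. In this small Python sketch, standing at B, a lookahead inspects the text to the right (C) and a lookbehind inspects the text to the left (A):

```python
import re

# B with A immediately to its left and C immediately to its right.
m = re.search(r"(?<=A)B(?=C)", "ABC")
print(m.group(), m.span())  # B (1, 2)

# Swapping the neighbours finds nothing: C is not to the left of B,
# and A is not to the right of B.
print(re.search(r"(?<=C)B", "ABC"))  # None
print(re.search(r"B(?=A)", "ABC"))   # None
```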

AbstProcDo
  • I think you got it backwards: `B` is ahead of `A` and `B` is behind `C` Alternatively, `C` is ahead of `B` and `A` is behind `B`. Or did I miss something? – Jon Grah Aug 08 '22 at 07:42
-1

I used a look-behind to find the schema and a negative look-ahead to find tables that are missing with(nolock):

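import re

# Hypothetical sample query; substitute your own SQL text.
sql = "select * from DB.dbo.Orders o join DB.dbo.Items i with(nolock) on o.id = i.id"

# Look-behind anchors on the schema prefix; the negative look-ahead rejects
# "table alias" pairs that are followed by with(nolock).
expression = r"(?<=DB\.dbo\.)\w+\s+\w+\s+(?!with\(nolock\))"

matches = re.findall(expression, sql)
for match in matches:
    print(match)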
Golden Lion