Regex lookahead, lookbehind and atomic groups

Question

I found these things in my regex body but I haven't got a clue what I can use them for. Does somebody have examples so I can try to understand how they work?

(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind

(?>) - atomic group

Why doesn't the regex website have some simple table like this? Instead they have blocks of text explaining only. http://www.regular-expressions.info/lookaround.html — Whitecat, Aug 22 '16 at 17:30

score 1587 · Accepted Answer · edited Mar 25 '22 at 15:21

Examples

Given the string foobarbarfoo:

bar(?=bar)     finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar)     finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar    finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar    finds the 2nd bar ("bar" which does not have "foo" before it)

You can also combine them:

(?<=foo)bar(?=bar)    finds the 1st bar ("bar" with "foo" before it and "bar" after it)
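The examples above can be checked with a short script. This is a minimal sketch using Python's `re` module; the variable names are only for illustration. In `foobarbarfoo` the 1st `bar` starts at index 3 and the 2nd at index 6.

```python
import re

text = "foobarbarfoo"

# The five patterns from the examples above, in the same order.
patterns = [
    r"bar(?=bar)",
    r"bar(?!bar)",
    r"(?<=foo)bar",
    r"(?<!foo)bar",
    r"(?<=foo)bar(?=bar)",
]

# Record where each pattern first matches in the text.
starts = [re.search(p, text).start() for p in patterns]

for pattern, start in zip(patterns, starts):
    which = "1st" if start == 3 else "2nd"
    print(f"{pattern:<20} matches the {which} bar (index {start})")
```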

Definitions

Look ahead positive `(?=)`

Find expression A where expression B follows:

A(?=B)

Look ahead negative `(?!)`

Find expression A where expression B does not follow:

A(?!B)

Look behind positive `(?<=)`

Find expression A where expression B precedes:

(?<=B)A

Look behind negative `(?<!)`

Find expression A where expression B does not precede:

(?<!B)A

Atomic groups `(?>)`

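Once an atomic group has matched, the regex engine leaves the group and throws away every alternative it could still have tried inside it. If the rest of the pattern then fails, the engine cannot backtrack into the group to try another alternative (backtracking into the group is disabled).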

(?>foo|foot)s applied to foots will match its 1st alternative foo, then fail as s does not immediately follow, and stop as backtracking is disabled

A non-atomic group will allow backtracking; if subsequent matching ahead fails, it will backtrack and use alternative patterns until a match for the entire expression is found or all possibilities are exhausted.

(foo|foot)s applied to foots will:
1. match its 1st alternative foo, then fail as s does not immediately follow in foots, and backtrack to its 2nd alternative;
2. match its 2nd alternative foot, then succeed as s immediately follows in foots, and stop.
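The difference can be tried out in Python. Python's `re` module only accepts the `(?>...)` syntax from version 3.11 onward, so this sketch emulates the atomic group with the well-known lookahead-plus-backreference idiom `(?=(...))\1`, which relies on lookaheads themselves being atomic. The emulation is an assumption made for portability, not the syntax used in the answer.

```python
import re

# Ordinary group: the engine may backtrack into the alternation.
backtracking = re.compile(r"(foo|foot)s")

# Emulated atomic group: once (foo|foot) has matched inside the lookahead,
# the engine never goes back to try the other alternative; \1 then consumes
# exactly the text the lookahead captured.
atomic_like = re.compile(r"(?=(foo|foot))\1s")

normal_match = backtracking.search("foots")
atomic_match = atomic_like.search("foots")
atomic_foos = atomic_like.search("foos")

print(normal_match.group())  # foots
print(atomic_match)          # None
print(atomic_foos.group())   # foos
```

On Python 3.11 or later, `(?>foo|foot)s` should behave the same way as the emulated pattern.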

Some resources

Online testers

https://regex101.com

answered Jun 04 '10 at 11:06 by skyfoot · edited Mar 25 '22 at 15:21 by Donald Duck

What do you mean by "finds the second bar" part? There is only one bar in the expression/string. Thanks – ziggy Feb 08 '14 at 11:22
@ziggy the string being tested is "foobarbarfoo". As you can see there are two foo and two bar in the string. – skyfoot Feb 12 '14 at 10:56
@ziggy try to go to http://pythex.org/ and play a little bit about it. you will understand it totally – stanleyli Mar 30 '15 at 19:09
Place two bars side by side, like, `barbar` in the text on which these regexs will be tried. – Pallav Jha May 31 '17 at 13:08
Can someone explain when one may need an atomic group? If I only need to match with the first alternative, why would I want to give multiple alternatives? – arviman Aug 09 '17 at 12:27
@skyfoot or anyone on here. I can see that the "(?<=B)A" lookbehind is always before the actual lookup. Does it mean it must always comes before? Can this also be done "A(?<=B)"? As the name suggest it looks "behind" and it looks "ahead". Thank you if anyone can explain. – Chopnut Apr 21 '18 at 00:53
**Better explanation about atomic group** at [this answer](https://stackoverflow.com/a/14412277/287948). Can someone edit here to complete this didatic answer? – Peter Krauss Apr 27 '18 at 10:18
Just a note that this answer was essential when I ended up on a project that required serious regex chops. This is an excellent, concise explanation of look-arounds. – Tom Coughlin May 23 '19 at 20:49

score 244 · Answer 2 · edited Aug 24 '16 at 13:04

Lookarounds are zero-width assertions. They check for a regex to the right or left of the current position (depending on whether they look ahead or behind), succeed or fail when a match is found (depending on whether they are positive or negative), and then discard the matched portion. They don't consume any characters: the matching for the regex following them (if any) starts at the same cursor position.
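A small sketch of what "zero width" means in practice, using Python's `re` module (the sample strings are made up for illustration): the text checked by the lookahead is not part of the match, and the next token starts where the lookahead started.

```python
import re

# The lookahead requires "bar" to follow, but "bar" is not part of the match.
m = re.search(r"foo(?=bar)", "foobar")
print(m.group())  # foo
print(m.span())   # (0, 3)

# The lookahead consumed nothing, so \w+ starts at the same position
# and matches the whole word, including the "foo" the lookahead checked.
m2 = re.search(r"(?=foo)\w+", "foobar")
print(m2.group())  # foobar
```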

Read regular-expressions.info for more details.

Positive lookahead:

Syntax:

(?=REGEX_1)REGEX_2

Match only if REGEX_1 matches; after matching REGEX_1, the match is discarded and searching for REGEX_2 starts at the same position.

example:

(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}

REGEX_1 is [a-z0-9]{4}$ which matches four alphanumeric chars followed by end of line.
REGEX_2 is [a-z]{1,2}[0-9]{2,3} which matches one or two letters followed by two or three digits.

When the match starts at the beginning of the string, REGEX_1 makes sure that the length of the string is indeed 4, but it doesn't consume any characters, so the search for REGEX_2 starts at the same location. REGEX_2 then makes sure that the string matches the remaining rules. Without the look-ahead, REGEX_2 would also match strings of length three or five.
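A quick way to see the effect of the look-ahead is to run the pattern with and without it. This is a sketch using Python's `re.fullmatch`, which is an assumption about how the pattern is applied (the whole string must match); the candidate strings are made up.

```python
import re

with_lookahead = re.compile(r"(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}")
without_lookahead = re.compile(r"[a-z]{1,2}[0-9]{2,3}")

candidates = ["ab12", "a123", "a12", "ab123", "abc1"]

# With the look-ahead only the four-character strings survive.
accepted_with = [s for s in candidates if with_lookahead.fullmatch(s)]

# Without it, strings of length three and five are accepted as well.
accepted_without = [s for s in candidates if without_lookahead.fullmatch(s)]

print(accepted_with)     # ['ab12', 'a123']
print(accepted_without)  # ['ab12', 'a123', 'a12', 'ab123']
```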

Negative lookahead

Syntax:

(?!REGEX_1)REGEX_2

Match only if REGEX_1 does not match; after checking REGEX_1, the search for REGEX_2 starts at the same position.

example:

(?!.*\bFWORD\b)\w{10,30}$

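The look-ahead part checks for FWORD as a whole word anywhere to the right of the current position and fails if it finds it. If it doesn't find FWORD, the look-ahead succeeds, and the following part verifies that the rest of the string is between 10 and 30 characters long and contains only word characters (a-zA-Z0-9_). For this to validate the whole string, the match has to start at the beginning of the string.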
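The sketch below tries the idea in Python. Note that it uses a variant of the pattern above, which is an assumption made for the demo: it is anchored with `^`, and the character class is widened to `[\w ]` so that input containing several words (and therefore a whole-word FWORD) can be tested at all.

```python
import re

# Variant of the pattern above: anchored, and spaces are allowed.
pattern = re.compile(r"^(?!.*\bFWORD\b)[\w ]{10,30}$")

samples = [
    "hello brave world",  # accepted: no FWORD, length 17
    "hello FWORD world",  # rejected: contains FWORD as a whole word
    "helloFWORDworld",    # accepted: FWORD is not a whole word here
    "too short",          # rejected: only 9 characters
]

accepted = [s for s in samples if pattern.search(s)]
print(accepted)  # ['hello brave world', 'helloFWORDworld']
```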

Look-behind is similar to look-ahead: it just looks behind the current cursor position. Some regex flavors don't support look-behind assertions; JavaScript, for example, only gained them in ES2018. And most flavors that do support it (PHP, Python etc.) restrict the look-behind portion to a fixed length.
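The fixed-length restriction is easy to observe in Python's built-in `re` module, which rejects a variable-width look-behind when the pattern is compiled. A minimal sketch (the sample string is made up):

```python
import re

# Fixed-width look-behind: accepted.
fixed = re.search(r"(?<=\$)\d+", "cost: $42").group()
print(fixed)  # 42

# Variable-width look-behind: rejected at compile time by the re module.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)
```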

An atomic group basically discards/forgets the remaining alternatives inside the group once one of them has matched. Check this page for examples of atomic groups.

following your explanation, does not seem to work in javascript, /(?=source)hello/.exec("source...hummhellosource") = null. Is your explanation correct? — Helin Wang, Jun 01 '13 at 17:47
@HelinWang That explanation is correct. Your regex expects a string that is both source and hello at the same time! — Amarghosh, Jun 04 '13 at 11:54
@Amarghosh I agree with "They check for a regex (towards right or left of the current position - based on ahead or behind), succeeds or fails when a match is found (based on if it is positive or negative) and discards the matched portion.". So lookahead should check for a regex towards right of the current position and the syntax of positive lookahead should be x(?=y) — jddxf, Oct 05 '16 at 11:28
@Amarghosh would `(?=REGEX_1)REGEX_2` only match if `REGEX_2` comes *after* `REGEX_1`? — aandis, May 22 '18 at 11:50

score 1 · Answer 3 · answered May 03 '22 at 12:45

Why - Suppose you are playing wordle, and you've entered "ant". (Yes three-letter word, it's only an example - chill)

The answer comes back as blank, yellow, green, and you want to use a regex to search a list of three-letter words for the remaining candidates. How would you do it?

To start off with you could start with the presence of the t in the third position:

[a-z]{2}t

We could improve on that by noting that the word doesn't contain an a:

[b-z]{2}t

We could further improve by saying that the search had to have an n in it.

(?=.*n)[b-z]{2}t

or, to break it down:

(?=.*n) - Look ahead and check that an n appears somewhere after the current position, with zero or more characters before that n

[b-z]{2} - Two letters other than an 'a' in the first two positions;

t - literally a 't' in the third position
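To try the final pattern, here is a small sketch in Python. The word list is hypothetical and only serves the demo; `re.fullmatch` is used so that each word has to match from its first to its last letter.

```python
import re

# Hypothetical candidate list, made up for this sketch.
words = ["ant", "bit", "cat", "net", "nut"]

pattern = re.compile(r"(?=.*n)[b-z]{2}t")

# Keep only the words that contain an n, have no a in the first two
# positions, and end in t.
candidates = [w for w in words if pattern.fullmatch(w)]
print(candidates)  # ['net', 'nut']
```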

score 0 · Answer 4 · edited Apr 15 '18 at 06:30

Grokking lookaround rapidly.
How do you distinguish lookahead from lookbehind? Take a two-minute tour with me:

(?=) - positive lookahead
(?<=) - positive lookbehind

Suppose

    A  B  C #in a line

Now we ask B: where are you?
B has two ways to describe its location:

One, B has A ahead and has C behind
Two, B is ahead (lookahead) of C and behind (lookbehind) A.

As we can see, behind and ahead are swapped between the two solutions.
Regex is solution Two: from B's position, a lookahead inspects the text that follows it (C), and a lookbehind inspects the text that precedes it (A).
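The directions can be checked directly. In this sketch with Python's `re` module, the string `ABC` stands for the line above: a lookahead placed after B sees C, and a lookbehind placed before B sees A.

```python
import re

line = "ABC"

# Lookahead: checks the text to the right of B.
ahead_c = re.search(r"B(?=C)", line)
ahead_a = re.search(r"B(?=A)", line)

# Lookbehind: checks the text to the left of B.
behind_a = re.search(r"(?<=A)B", line)
behind_c = re.search(r"(?<=C)B", line)

print(ahead_c is not None)   # True: C follows B
print(ahead_a is not None)   # False: A does not follow B
print(behind_a is not None)  # True: A precedes B
print(behind_c is not None)  # False: C does not precede B
```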

answered Apr 04 '18 at 15:08 by AbstProcDo · edited Apr 15 '18 at 06:30

I think you got it backwards: `B` is ahead of `A` and `B` is behind `C` Alternatively, `C` is ahead of `B` and `A` is behind `B`. Or did I miss something? – Jon Grah Aug 08 '22 at 07:42

score -1 · Answer 5 · edited Jun 09 '22 at 15:10

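I used a lookbehind to find tables under the DB.dbo. schema prefix and a negative lookahead to find the ones that are missing with(nolock):

import re

# sql is assumed to hold the text of the query being checked
expression = r"(?<=DB\.dbo\.)\w+\s+\w+\s+(?!with\(nolock\))"

matches = re.findall(expression, sql)
for match in matches:
    print(match)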
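For a self-contained run, here is the same pattern applied to a sample query. The query text, table names and aliases are hypothetical, made up only to show what the pattern reports. The pattern expects each table to be followed by an alias and single spaces between tokens.

```python
import re

# Hypothetical query: Orders has the hint, Customers does not.
sql = (
    "SELECT * FROM DB.dbo.Orders o with(nolock) "
    "JOIN DB.dbo.Customers c ON o.cid = c.id"
)

expression = r"(?<=DB\.dbo\.)\w+\s+\w+\s+(?!with\(nolock\))"

# Each match is "<table> <alias> " for a table that is not followed by
# with(nolock); strip the trailing whitespace for readability.
missing = [m.strip() for m in re.findall(expression, sql)]
print(missing)  # ['Customers c']
```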

answered Jun 09 '22 at 14:51 by Golden Lion · edited Jun 09 '22 at 15:10
