44

I'm trying to understand how negative lookaheads work on simple examples. For instance, consider the following regex:

a(?!b)c

I thought the negative lookahead matches a position. So, in that case the regex matches any string that contains strictly 3 characters and is not abc.

But it's not true, as can be seen in this demo. Why?

Wiktor Stribiżew
St.Antario
  • Maybe this can help: http://www.regular-expressions.info/lookaround.html – gen_Eric Dec 29 '14 at 15:07
  • @RocketHazmat Yes, it's helpful, but it was the first result on Google :) I've read it – St.Antario Dec 29 '14 at 15:08
  • To add to this problem, a common misconception is that you can use negative lookaheads to substitute for multi-word negation or even attempt otherwise, so you will see broken regexes like these: `[A-Za-z]+(?![A-Za-z])`, `[^sword]fish`, `(?!sword)fish` – Unihedron Dec 30 '14 at 11:09
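A minimal Python sketch of why two of the patterns in the last comment misbehave, assuming the intent is "match fish unless it is part of swordfish". The helper `first_match` and the test words are hypothetical, chosen only for illustration.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (?!sword)fish: the lookahead is tested at the position where "fish" starts,
# so it only asserts that "fish" itself does not begin with "sword".
print(first_match(r"(?!sword)fish", "swordfish"))  # fish

# [^sword]fish: a character class, not a word. It requires exactly one
# character that is none of s, w, o, r, d immediately before "fish".
print(first_match(r"[^sword]fish", "swordfish"))   # None ('d' is excluded)
print(first_match(r"[^sword]fish", "codfish"))     # None ('d' again, unintended)
print(first_match(r"[^sword]fish", "catfish"))     # tfish (extra character consumed)
print(first_match(r"[^sword]fish", "fish"))        # None (no preceding character)
```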

4 Answers

61

A lookahead does not consume any characters. It only checks whether the lookahead pattern can be matched at the current position:

a(?!b)c

Here, after matching a, the engine checks that the next character is not b, without consuming that character (which is c). It then has to match c at that same position.

How a(?!b)c matches ac

ac
|
a

ac
 |
(?!b) #checks but does not consume. Pointer remains at c

ac
 |
 c
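The walkthrough above can be checked with a short Python sketch using the standard `re` module. The helper name `find_span` and the test strings other than ac and abc are assumptions added for illustration.

```python
import re

pattern = re.compile(r"a(?!b)c")

def find_span(text):
    """Return (start, end) of the first match of a(?!b)c in text, or None."""
    m = pattern.search(text)
    return m.span() if m else None

print(find_span("ac"))    # (0, 2): the lookahead passes, then c is matched
print(find_span("abc"))   # None: the lookahead sees b and fails
print(find_span("adc"))   # None: the lookahead passes, but d is not c
print(find_span("xacx"))  # (1, 3): only the two characters a and c are consumed
```

The "adc" case is the one that answers the question: the lookahead does not stand in for a character, so the c in the pattern must match the character immediately after a.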

Positive lookahead

The positive lookahead is similar in that it tries to match the pattern in the lookahead. If it can be matched, then the regex engine proceeds with matching the rest of the pattern. If it cannot, the match is discarded.

E.g.

abc(?=123)\d+ matching abc123

abc123
|
a

abc123
 |
 b

abc123
  |
  c

abc123 # Tries to match 123; since that succeeds, the pointer stays where it was after matching c
    |
 (?=123)

abc123 # The lookahead succeeds. Matching of the rest of the pattern (if any) proceeds from this position
  |

abc123
   |
  \d

abc123
    |
   \d

abc123 # Reaches the end of input. The pattern is matched completely, so the regex engine returns a successful match
     |
    \d
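A small Python sketch of the positive lookahead walkthrough above. The capture group around `\d+` and the helper name `digits_after_abc` are additions for illustration, used only to show which characters `\d+` ends up consuming.

```python
import re

def digits_after_abc(text):
    """Return the digits consumed by \\d+ in abc(?=123)(\\d+), or None."""
    m = re.match(r"abc(?=123)(\d+)", text)
    return m.group(1) if m else None

print(digits_after_abc("abc123"))   # 123: the lookahead consumed nothing
print(digits_after_abc("abc1239"))  # 1239: \d+ keeps going past 123
print(digits_after_abc("abc124"))   # None: the lookahead fails, so no match
```

If the lookahead had consumed 123, the first call would have had nothing left for `\d+` to match.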
Jongware
nu11p01n73R
  • Important to note, maybe, that in, for example, `.+(?!b).+`, if `(?!b)` fails, the first `.+` will try to match a shorter string, and then the check is done again. As such, `.+(?!b).+` would match `bbbabb` with the first `.+` matching `bbb` and the second one matching `abb`. ([demo](https://regex101.com/r/eE8rM0/1)) – Sumurai8 Dec 29 '14 at 20:48
  • Rather than say that backtracking occurs, it's probably more correct to say that the pointer never moves. Backtracking happens with regular matching when the pattern fails. – pguardiario Dec 30 '14 at 05:19
  • Couldn't you make a note about positive lookahead? Is it the same as the negative lookahead, except that it checks that the pattern in `(?=pattern)` _is_ matched? – St.Antario Dec 30 '14 at 08:22
  • @St.Antario Yup. It just checks whether the pattern can be matched. If it can, the engine proceeds with the rest of the pattern; otherwise the match attempt is discarded. Take a look at [this answer I just wrote on lookahead](http://stackoverflow.com/questions/27701747/regular-expression-which-allows-numbers-spaces-plus-sign-hyphen-and-brackets) – nu11p01n73R Dec 30 '14 at 08:25
  • @St.Antario I have added an edit on positive lookahead as well. Hope it helps you :) – nu11p01n73R Dec 30 '14 at 08:46
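The `.+(?!b).+` example from the comments can be reproduced with a short Python sketch. The capture groups and the helper name `split_groups` are additions for illustration, used only to expose what each `.+` matched.

```python
import re

def split_groups(text):
    """Return what each .+ matched in (.+)(?!b)(.+), or None if no match."""
    m = re.match(r"(.+)(?!b)(.+)", text)
    return m.groups() if m else None

# The first .+ starts greedy and gives back characters until the position
# after it is not followed by b and the second .+ still has something to match.
print(split_groups("bbbabb"))  # ('bbb', 'abb')
print(split_groups("ba"))      # ('b', 'a')
print(split_groups("bbb"))     # None: every candidate split is followed by b
```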
8

@Antario, I was confused about the negative look ahead/behind case in regex for a while and this site has a great explanation.

So with your example what you are saying is that you have a literal "a" and it is NOT followed by a literal "b" and it IS followed by a literal "c".

Here is a different regex debugger from the one you used; it gives a more visual answer, which I personally find helpful :)

a(?!b)c

Regular expression visualization

Debuggex Demo

laurenOlga
3

a(?!b)c will match only ac because the only way you'll have an a followed by "not b" (which will not be consumed) and then c, is ac.

Maroun
2

So, in that case the regex matches any string that contains strictly 3 characters and is not the abc

This is not quite right. This regex says that we are searching for a sequence whose first symbol is a, immediately followed by c, with the condition that the symbol right after a is not b.

For example, a(?!b). will match either ac or af, because . places no restriction on the last symbol beyond the lookahead's requirement that it is not b.
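A quick Python check of that claim, as a sketch. The helper name `all_matches` and the sample text are assumptions for illustration.

```python
import re

def all_matches(text):
    """Return every non-overlapping match of a(?!b). in text."""
    return re.findall(r"a(?!b).", text)

print(all_matches("ac af ab"))  # ['ac', 'af']
print(all_matches("a"))         # []: the lookahead passes, but . has nothing to match
```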

VMAtm