Difference between ?:, ?! and ?=

Question

I searched for the meaning of these expressions but couldn't understand the exact difference between them.

This is what they say:

?: Match expression but do not capture it.
?= Match a suffix but exclude it from capture.
?! Match if the suffix is absent.

I tried using these in simple RegEx and got similar results for all.

For example: the following 3 expressions give very similar results.

[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+(?!\.[a-zA-Z0-9]+)*
[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+(?=\.[a-zA-Z0-9]+)*
[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9]+)*

Please show us your test case. They should not give the same results. — Bergi, May 29 '12 at 18:41
@sepp2k, it gives similar results in a few cases, one of which is mentioned in the question. — RK Poddar, May 29 '12 at 18:41
@Bergi, I tested it with random data containing English words, phone numbers, URLs, e-mail addresses, numbers, etc. — RK Poddar, May 29 '12 at 18:43
@RKAgarwal Ah, I see what you did there. You added a `*` after the groups, so they're simply ignored. — sepp2k, May 29 '12 at 18:44
*noobie note*: you'd only use these at the start of parentheses, and parentheses form a capturing group by default (different sets of parentheses extract different sections of text). — Ryan Taylor, Jun 07 '17 at 19:06

score 269 · Accepted Answer · answered May 29 '12 at 18:43

The difference between ?= and ?! is that the former requires the given expression to match and the latter requires it to not match. For example a(?=b) will match the "a" in "ab", but not the "a" in "ac". Whereas a(?!b) will match the "a" in "ac", but not the "a" in "ab".

The difference between ?: and ?= is that ?= excludes the expression from the entire match while ?: just doesn't create a capturing group. So for example a(?:b) will match the "ab" in "abc", while a(?=b) will only match the "a" in "abc". a(b) would match the "ab" in "abc" and create a capture containing the "b".
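The cases above can be checked with a minimal sketch. It assumes Python's standard `re` module, which the question itself does not name, and the variable names are only illustrative.

```python
import re

# Positive lookahead: "a" must be followed by "b"; the "b" is not part of the match.
pos_ab = re.search(r"a(?=b)", "ab")
pos_ac = re.search(r"a(?=b)", "ac")

# Negative lookahead: "a" must NOT be followed by "b".
neg_ac = re.search(r"a(?!b)", "ac")
neg_ab = re.search(r"a(?!b)", "ab")

# Non-capturing group: "b" is part of the match, but no group is stored.
noncap = re.search(r"a(?:b)", "abc")

# Capturing group: "b" is part of the match and is stored as group 1.
cap = re.search(r"a(b)", "abc")

print(pos_ab.group(), pos_ac)           # a None
print(neg_ac.group(), neg_ab)           # a None
print(noncap.group(), noncap.groups())  # ab ()
print(cap.group(), cap.groups())        # ab ('b',)
```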

score 151 · Answer 2 · edited Aug 25 '22 at 13:28

?: is for non capturing group
?= is for positive look ahead
?! is for negative look ahead
?<= is for positive look behind
?<! is for negative look behind

See Lookahead and Lookbehind Zero-Length Assertions for a very good tutorial with examples of lookahead in regular expressions.
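To illustrate the two lookbehind forms from the list, here is a small sketch that assumes Python's `re` module, where lookbehind patterns must have a fixed width. The sample text and variable names are made up for the example.

```python
import re

text = "price: $42, code: 17"

# Positive lookbehind: digits that are immediately preceded by "$".
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: digits that are preceded by neither "$" nor another digit.
not_after_dollar = re.findall(r"(?<![$\d])\d+", text)

print(after_dollar)      # ['42']
print(not_after_dollar)  # ['17']
```

In both cases the "$" is only tested for and never becomes part of the result.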

edited Aug 25 '22 at 13:28 by Laurel · answered May 29 '12 at 18:38 by anubhava

Yet JavaScript does not know lookbehind. – Bergi May 29 '12 at 18:45
This one is more complete for general regex. – Yan Yang Jan 25 '18 at 00:37
/(?<=^a)b/ worked for me in javascript! There seems to be no tutorial for looking behind in Javascript on the internet. – martian17 May 16 '18 at 06:58
Only recent versions of browsers have started supporting look behind in JS – anubhava May 16 '18 at 07:10
@anubhava I don't know of any alternative to /(?<=^a)b/ using a pure regular expression. Perhaps I could find one, but I would have to rely on callback functions. – martian17 May 16 '18 at 07:17
What's a `non capturing group`? – Shayan Dec 04 '19 at 10:43
A group that starts with `(?:` and doesn't capture values. Please read more details at the website linked above. – anubhava Dec 04 '19 at 11:48

freedev · Answer 3 · 2021-11-08T15:38:08.500

To better understand, let's apply the three expressions plus a capturing group and analyse each behaviour.

() capturing group - the regex inside the parentheses must be matched, and the match creates a capturing group
(?:) non-capturing group - the regex inside the parentheses must be matched, but no capturing group is created
(?=) positive lookahead - asserts that the regex matches at the current position, without consuming any text
(?!) negative lookahead - asserts that the regex cannot match at the current position, without consuming any text

Let's apply q(u)i to quit.
q matches q and the capturing group u matches u.
The match inside the capturing group is taken and a capturing group is created.
So the engine continues with i.
And i will match i.
This last match attempt is successful.
qui is matched and a capturing group with u is created.

Let's apply q(?:u)i to quit.
Again, q matches q and the non-capturing group u matches u.
The match from the non-capturing group is taken, but the capturing group is not created.
So the engine continues with i.
And i will match i.
This last match attempt is successful.
qui is matched.
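The two walk-throughs so far can be reproduced with a short sketch, assuming Python's `re` module (the answer does not tie itself to a particular engine):

```python
import re

capturing = re.search(r"q(u)i", "quit")
non_capturing = re.search(r"q(?:u)i", "quit")

# Both patterns match the same text; only the stored groups differ.
print(capturing.group(0), capturing.groups())          # qui ('u',)
print(non_capturing.group(0), non_capturing.groups())  # qui ()
```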

Let's apply q(?=u)i to quit.
The lookahead is positive and is followed by another token.
Again, q matches q and u matches u.
But the match from the lookahead must be discarded, so the engine steps back from i in the string to u.
Given that the lookahead was successful the engine continues with i.
But i cannot match u.
So this match attempt fails.

Let's apply q(?=u)u to quit.
The lookahead is positive and is followed by another token.
Again, q matches q and u matches u.
But the match from the lookahead must be discarded, so the engine steps back from i in the string to u.
Given that the lookahead was successful the engine continues with u.
And u will match u. So this match attempt is successful.
qu is matched.

Let's apply q(?!i)u to quit.
This time the lookahead is negative and is followed by another token.
Again, q matches q, and inside the lookahead i fails to match u.
Because the lookahead is negative, this failure means the lookahead succeeds; nothing was consumed, so the engine is still at u in the string.
Given that the lookahead was successful the engine continues with u.
And u will match u.
So this match attempt is successful.
qu is matched.
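The three lookahead walk-throughs can be verified the same way. This is a sketch assuming Python's `re` module; the variable names are illustrative only.

```python
import re

# q(?=u)i: the lookahead sees "u" but consumes nothing, so "i" is then
# tried against "u" and the attempt fails.
lookahead_then_i = re.search(r"q(?=u)i", "quit")

# q(?=u)u: after the successful lookahead, "u" is matched for real.
lookahead_then_u = re.search(r"q(?=u)u", "quit")

# q(?!i)u: the negative lookahead succeeds because "i" does not follow "q".
negative_then_u = re.search(r"q(?!i)u", "quit")

print(lookahead_then_i)                                  # None
print(lookahead_then_u.group(0), lookahead_then_u.span())  # qu (0, 2)
print(negative_then_u.group(0), negative_then_u.span())    # qu (0, 2)
```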

So, in conclusion, the real difference between a lookahead and a non-capturing group is whether you only want to test that the text is there (lookahead) or test it and make it part of the match (non-capturing group).

Capturing groups add some overhead, so use them judiciously.

> _so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with i. But i cannot match u_ THIS is totally confusing. Why **step back** if this is **lookahead**? — Green, Mar 01 '20 at 12:02
@Green An important thing to understand about lookahead and other lookaround constructs is that although they go through the motions to see if their subexpression is able to match, they don’t actually “consume” any text. That may be a bit confusing — freedev, Mar 01 '20 at 23:42
This is so helpful. I think this should be the accepted answer. — , Jan 18 '22 at 04:47

score 10 · Answer 4 · edited Dec 21 '17 at 21:00

Try matching foobar against these:

/foo(?=b)(.*)/
/foo(?!b)(.*)/

The first regex will match and will return "bar" as first submatch — (?=b) matches the 'b', but does not consume it, leaving it for the following parentheses.

The second regex will NOT match, because it expects "foo" to be followed by something different from 'b'.

(?:...) has exactly the same effect as simple (...), but it does not return that portion as a submatch.
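The same experiment as a sketch in Python's `re` module (an assumption, since the patterns above are written as JavaScript-style literals), with a non-capturing variant added for comparison:

```python
import re

positive = re.search(r"foo(?=b)(.*)", "foobar")
negative = re.search(r"foo(?!b)(.*)", "foobar")
non_capturing = re.search(r"foo(?:b)(.*)", "foobar")

# The lookahead leaves "b" for the following group, so group 1 is "bar".
print(positive.group(0), positive.group(1))            # foobar bar

# "foo" is followed by "b", so the negative lookahead rejects the match.
print(negative)                                        # None

# The non-capturing group consumes "b", so group 1 starts after it.
print(non_capturing.group(0), non_capturing.group(1))  # foobar ar
```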

score 1 · Answer 5 · answered May 09 '20 at 12:44

The simplest way to understand assertions is to treat them as commands inserted into a regular expression. When the engine reaches an assertion, it immediately checks the condition the assertion describes. If the condition holds, the engine continues with the rest of the regular expression; otherwise the match attempt fails at that position.
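One way to see assertions as checks is to put several of them at the same position. The sketch below assumes Python's `re` module; the pattern is a made-up illustration, not something taken from the question.

```python
import re

# Two lookaheads at the start act as two independent checks: the text must
# contain a digit AND a lowercase letter. Neither check consumes anything,
# so ".+" still matches from the beginning of the string.
pattern = re.compile(r"^(?=.*\d)(?=.*[a-z]).+$")

has_both = bool(pattern.search("abc123"))
no_digit = bool(pattern.search("abcdef"))
no_lower = bool(pattern.search("123456"))

# An assertion on its own matches an empty string at the position it accepts.
zero_width = re.search(r"(?=b)", "abc")

print(has_both, no_digit, no_lower)  # True False False
print(zero_width.span())             # (1, 1)
```

The empty span in the last line shows that the check succeeded just before "b" without consuming it.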

score 1 · Answer 6 · answered May 24 '20 at 23:18

This is the real difference:

>>> import re
>>> re.match(r'a(?=b)bc', 'abc')
<Match...>
>>> re.match(r'a(?:b)c', 'abc')
<Match...>

# note: the lookahead does not consume "b", so "c" is tried against "b" and fails
>>> print(re.match(r'a(?=b)c', 'abc'))
None

If nothing later in the pattern needs to match the content after "?:" or "?=", both succeed on the same inputs and either is OK to use, although "?:" includes that content in the overall match and "?=" does not.

But if the rest of the pattern still needs that content for further processing, you have to use "?=" instead, because "?:" consumes it and throws it away. (If you just want to match the whole thing, you can simply use "a(b)".)
