PCRE regular expression overlapping matches

Question

I have the following string:

001110000100001100001

and this expression:

/[1]....[1]/g

This produces two matches.

But I also want it to match the pattern between those two, i.e. the window that reuses the overlapping 1 (a lookbehind, so to speak).

I have absolutely no clue how this can work. Instead of 0, there can be any characters.
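To make the problem concrete, here is a small PHP sketch (PHP is what the answers use) that prints where each match of the original expression starts. `$plain` is just an illustrative name, and the `[$text, $offset]` destructuring assumes PHP 7.1 or later. The only reported offsets are 4 and 15: the window starting at offset 9 is skipped because its leading 1 was already consumed as the closing 1 of the first match.

```php
<?php
// Baseline from the question: a plain global match consumes both 1s,
// so a 1 shared by two candidate windows can only be used once.
$str = "001110000100001100001";

// PREG_OFFSET_CAPTURE stores each match as a [text, offset] pair.
preg_match_all('/[1]....[1]/', $str, $plain, PREG_OFFSET_CAPTURE);

foreach ($plain[0] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```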

`/(pattern)(.*?)\1/` Use this and get the second captured group value. No need for a lookbehind. – Tushar, Feb 17 '16 at 13:37
If you need full matches, try [(?:.?(?<=1)....1)+](https://regex101.com/r/rN2uV2/1) – bobble bubble, Feb 17 '16 at 13:55

score 12 · Accepted Answer · edited Jun 20 '20 at 09:12


A common trick is to use a capturing group inside an unanchored positive lookahead. Use this regex with preg_match_all:

(?=(1....1))


The values are in $matches[1]:

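<?php
$re = "/(?=(1....1))/";  // empty match at each start position; group 1 captures the window
$str = "001110000100001100001";
preg_match_all($re, $str, $matches);
print_r($matches[1]);    // every "1....1" window, overlapping ones included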
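If you also need to know where each window starts, a minimal variation of the snippet above passes `PREG_OFFSET_CAPTURE`. This is a sketch assuming PHP 7.1 or later (for the `[$text, $offset]` destructuring); `$m` is an illustrative name. On the sample string it reports three windows, at offsets 4, 9 and 15, and the one at 9 is the overlapping window a plain global match misses.

```php
<?php
$str = "001110000100001100001";

// With PREG_OFFSET_CAPTURE, each group-1 entry is a [text, offset] pair.
preg_match_all('/(?=(1....1))/', $str, $m, PREG_OFFSET_CAPTURE);

foreach ($m[1] as [$text, $offset]) {
    echo "$text at offset $offset\n";
}
```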

See lookahead reference:

Lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not.

If you want to store the match of the regex inside a lookahead, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)).
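The `(?=(regex))` rule generalises to any pattern body. The helper below is hypothetical (not from the answer) and assumes the body contains no `~` delimiter and is itself a valid, balanced pattern. It shows that the lookahead's own matches in `$m[0]` are empty, while the captured text ends up in group 1.

```php
<?php
// Hypothetical helper: wrap a pattern body in (?=( ... )) so that every
// overlapping occurrence is reported. $body must not contain "~".
function overlapping_matches(string $body, string $subject): array
{
    preg_match_all('~(?=(' . $body . '))~', $subject, $m);
    // $m[0] holds the assertion's own matches, which are always empty
    // strings because a lookahead consumes nothing; the text is in group 1.
    return $m[1];
}

print_r(overlapping_matches('1....1', '001110000100001100001'));
print_r(overlapping_matches('aa', 'aaaa'));
```

For `'aa'` in `'aaaa'` this yields three matches (starting at offsets 0, 1 and 2), where a plain `/aa/g` would give only two.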

answered Feb 17 '16 at 13:38 by Wiktor Stribiżew (edited Jun 20 '20 at 09:12 by Community)


Thank you so much. I've already been playing around with positive lookahead but didn't get it right. This works perfectly and is well explained. – john Smith Feb 17 '16 at 14:10
This is genius. One of the most clever things I've ever seen with a regex. Thanks! Saved me big-time. – HartleySan May 31 '19 at 21:49

score 1 · Answer 2 · answered Apr 09 '21 at 17:50

You can also do it using the \K feature (which sets where the returned match begins) inside a lookbehind:

(?<=\K1)....1


This way, you don't need to create a capture group. Since all characters are consumed (except the first, which sits in the lookbehind), the regex engine resumes at the end of each match after a success instead of retrying the pattern at each position inside it.

$str = '001110000100001100001';

preg_match_all('~ (?<= \K 1 ) .... 1 ~x', $str, $matches);

print_r($matches[0]);
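Here is a variant of the code above that also prints offsets. It is a sketch assuming PHP 7.1 or later. One caveat: PCRE2 10.38 and later reject \K inside lookaround assertions unless the `PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK` option is set, so whether the pattern compiles depends on the PCRE2 version and options your PHP build uses. The sketch checks for that failure instead of assuming success. When it compiles, `$matches[0]` holds the full six-character windows at offsets 4, 9 and 15, because \K moves the reported start back onto the leading 1.

```php
<?php
$str = '001110000100001100001';

// \K inside the lookbehind resets the match start to the leading 1,
// so group 0 contains the whole "1....1" window without a capture group.
$n = preg_match_all('~(?<=\K1)....1~', $str, $matches, PREG_OFFSET_CAPTURE);

if ($n === false) {
    // Some PCRE2 builds refuse \K inside lookarounds at compile time.
    echo "Pattern rejected by this PCRE build\n";
} else {
    foreach ($matches[0] as [$text, $offset]) {
        echo "$text at offset $offset\n";
    }
}
```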


Note that if you are sure the second character is always a zero, 0(?<=\K10)...1 is faster, because the pattern starts with a literal character and PCRE can optimize it with a quick search for possible starting positions in the subject string.
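A quick way to convince yourself that the literal-first pattern is a drop-in replacement (when the character after the leading 1 really is always 0) is to compare the two on the sample string. This sketch only checks that both report the same windows at the same offsets. It does not measure speed, and it relies on the same PCRE support for \K inside a lookbehind as the generic pattern (PCRE2 10.38+ rejects it unless `PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK` is enabled).

```php
<?php
$str = '001110000100001100001';

// Generic form: any leading 1, four arbitrary characters, then a 1.
preg_match_all('~(?<=\K1)....1~', $str, $generic, PREG_OFFSET_CAPTURE);

// Literal-first form: the match is anchored on a literal 0, the
// lookbehind checks that "10" precedes the current position, and \K
// moves the reported start back onto the 1.
preg_match_all('~0(?<=\K10)...1~', $str, $literal, PREG_OFFSET_CAPTURE);

// Both should list the same [text, offset] pairs for this subject.
var_dump($generic[0] === $literal[0]);
```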
