
I'm trying to find all possible matches for numbers separated by spaces. I have already tried constructions with lookahead and lookbehind, but they don't help.

We have a string of space-separated numbers (0..99).

We need to find all sequences of numbers whose length is greater than 3 and which may contain a special symbol, for example '1'.

The symbol '1' is a universal (wildcard) symbol and can replace any number. For example, '2 2 2 2', '1 2 1 2', '2 1 2 2', and '2 2 2 1' are all valid matches.
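To make the rule concrete, here is a minimal Python sketch of one possible reading of it: a run is valid when it has more than 3 tokens and all non-wildcard tokens are the same number. The names `WILDCARD` and `is_valid` are hypothetical, and requiring at least one non-wildcard number is an assumption, since the question does not say how a run of only '1's should be treated. The regexes in this thread are stricter than this reading, because they require at least three non-wildcard numbers.

```python
WILDCARD = "1"  # the universal symbol from the question


def is_valid(tokens):
    """Return True if tokens form a run of one repeated number, wildcards allowed."""
    # "length greater than 3" from the question
    if len(tokens) <= 3:
        return False
    # Collect the distinct non-wildcard numbers in the run.
    real = {t for t in tokens if t != WILDCARD}
    # Assumption: exactly one distinct non-wildcard number must be present.
    return len(real) == 1


for s in ["2 2 2 2", "1 2 1 2", "2 1 2 2", "2 2 2 1", "2 2 3 2", "2 2 2"]:
    print(s, "->", is_valid(s.split()))
```

The first four strings are the valid examples from the question; the last two are added only as counterexamples (two different numbers, and a run that is too short).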

Everything works, except in situations where a '1' belongs to two matches:

... 2 2 2 2 1 3 3 3 3 ...

Raw string:

3 1 1 4 4 4 4 1 4 4 5 5 1 5 5 6 7 8 1 9 9 9 9 9 9 1 3 4 5 5 1 1 5 5 6 4 4 4 1 7 1 2 2 2 2 2 2 2 1 11 11 11 11 

My regex does almost everything right, except for the last match:

/(1 ){0,}(\d |\d\d )\2{1,}(1 ){0,}\2{1,}(1 ){0,}/g

Current result (see the last match; it should be '1 11 11 11 11'):

1 1 4 4 4 4 1 4 4 
5 5 1 5 5 
1 9 9 9 9 9 9 1 
5 5 1 1 5 5 
4 4 4 1 
1 2 2 2 2 2 2 2 1 
11 11 11 11 
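The behaviour above can be reproduced outside regex101. The sketch below assumes Python's built-in `re` module instead of PCRE; for this particular pattern the two engines should behave the same way. Because `finditer` returns non-overlapping matches, the `1 ` in front of the `11` run is already consumed as the tail of the previous match.

```python
import re

# The raw string from the question, including its trailing space.
raw = ("3 1 1 4 4 4 4 1 4 4 5 5 1 5 5 6 7 8 1 9 9 9 9 9 9 1 3 4 5 5 "
       "1 1 5 5 6 4 4 4 1 7 1 2 2 2 2 2 2 2 1 11 11 11 11 ")

# The original pattern, unchanged.
pattern = re.compile(r"(1 ){0,}(\d |\d\d )\2{1,}(1 ){0,}\2{1,}(1 ){0,}")

# finditer yields non-overlapping matches, scanning left to right.
matches = [m.group(0) for m in pattern.finditer(raw)]
for text in matches:
    print(repr(text))
```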

Goal:

1 1 4 4 4 4 1 4 4 
5 5 1 5 5 
1 9 9 9 9 9 9 1 
5 5 1 1 5 5 
4 4 4 1 
1 2 2 2 2 2 2 2 1 
1 11 11 11 11 

The regex /(?=((1 ){0,}(\d |\d\d )\3{1,}(1 ){0,}\3{1,}(1 ){0,}))/g gives too many overlapping variations.
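A rough way to see the problem, again assuming Python's `re` rather than PCRE: a pattern wrapped entirely in a lookahead is tried at every position, so every suffix of a run that still satisfies the pattern is reported as its own capture.

```python
import re

# The raw string from the question, including its trailing space.
raw = ("3 1 1 4 4 4 4 1 4 4 5 5 1 5 5 6 7 8 1 9 9 9 9 9 9 1 3 4 5 5 "
       "1 1 5 5 6 4 4 4 1 7 1 2 2 2 2 2 2 2 1 11 11 11 11 ")

# The lookahead variant: the match itself is empty, the text is in group 1.
pattern = re.compile(r"(?=((1 ){0,}(\d |\d\d )\3{1,}(1 ){0,}\3{1,}(1 ){0,}))")

captures = [m.group(1) for m in pattern.finditer(raw)]
print(len(captures), "captures instead of the 7 wanted")
for text in captures:
    print(repr(text))
```

The wanted `'1 11 11 11 11 '` is among the captures, but so is the shorter `'11 11 11 11 '` and many other sub-runs, which is why the plain lookahead is not enough on its own.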

Here is a sandbox: https://regex101.com/r/VDa6LZ/2

How do I find all overlapping matches properly?

SlowSuperman
  • Well, the issue is that the first and last optional subpatterns are identical. Probably you could use a regex like `/(?:1 )*(\d |\d\d )\1+(1 )*\1+(?=(1 )*)/` and when you get the results with `preg_match_all`, check if Group 1 matched, and if yes, combine the whole match and the Group 1 value. – Wiktor Stribiżew Oct 01 '19 at 16:12
  • Partial answer with lookaheads: `(?=((?:(?<!\d)1 )*(\d+ )(?:\2|(?:1 )){2,}))` Be aware, though - this will properly allow overlapping matches, but will match many more valid sequences than your example. – Nick Reed Oct 01 '19 at 17:30
  • I came up with [this pattern](https://regex101.com/r/VDa6LZ/5) which is hard to read and I don't have the time for explanation at the moment. The idea is to use a branch reset group for alternating between the overlapping `1 ` but consume only part until the last `1 ` sequence and capture non overlapping without lookahead. The results will be in **group 1**. – bobble bubble Oct 01 '19 at 18:27
  • Thank you, guys! @bobblebubble your solution `\b(?|(?=(((?:1 )+(\d?\d )\3+(?:1 )*\3+)(?:1 )*))\2|((\d?\d )\2+(?:1 )*\2+(?:1 )*))` is super! This is what I was looking for! Thank you!! – SlowSuperman Oct 02 '19 at 07:22
  • Does this answer your question? [How can I match overlapping strings with regex?](https://stackoverflow.com/questions/20833295/how-can-i-match-overlapping-strings-with-regex) – Basti May 04 '21 at 20:10
  • Related canonical [How can I match overlapping strings with regex?](https://stackoverflow.com/q/20833295/2943403) – mickmackusa May 15 '23 at 07:20

1 Answer


It seems a solution has been found: the one proposed by @bobblebubble. It is not too simple, but it solves the problem.

\b(?|(?=(((?:1 )+(\d?\d )\3+(?:1 )*\3+)(?:1 )*))\2|((\d?\d )\2+(?:1 )*\2+(?:1 )*))

The results will be in group 1.

  • `(?|` By use of the branch reset, the outer groups on either side of the pipe will be captured as group 1.
  • `(?=(((?:1 )+(\d?\d )\3+(?:1 )*\3+)(?:1 )*))\2` The left side of the alternation captures the overlapping `1 ` parts inside a lookahead. Group 2 only consumes the part up to where the last run of `1 ` starts, because we don't want all the subsequences.
  • `((\d?\d )\2+(?:1 )*\2+(?:1 )*)` The right side captures the remaining sequences.
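For readers without a PCRE engine, the same idea can be sketched in Python. This is an adaptation, not the pattern as posted: Python's built-in `re` module does not support the branch reset group `(?|...)`, so the two alternatives keep separate group numbers (1 to 3 on the left, 4 and 5 on the right) and the result is read from whichever of group 1 or group 4 took part in the match.

```python
import re

# The raw string from the question, including its trailing space.
raw = ("3 1 1 4 4 4 4 1 4 4 5 5 1 5 5 6 7 8 1 9 9 9 9 9 9 1 3 4 5 5 "
       "1 1 5 5 6 4 4 4 1 7 1 2 2 2 2 2 2 2 1 11 11 11 11 ")

pattern = re.compile(
    r"\b(?:"
    # Left side: capture the full run (group 1) inside a lookahead, then
    # consume only group 2, which stops before the trailing "1 " tokens.
    r"(?=(((?:1 )+(\d?\d )\3+(?:1 )*\3+)(?:1 )*))\2"
    r"|"
    # Right side: runs that do not start with "1 " (group 4).
    r"((\d?\d )\5+(?:1 )*\5+(?:1 )*)"
    r")"
)

# Exactly one of group 1 / group 4 is set for each match.
results = [m.group(1) or m.group(4) for m in pattern.finditer(raw)]
for text in results:
    print(repr(text))
```

On the raw string from the question this yields the seven runs listed under "Goal", with `'1 11 11 11 11 '` as the last one, because the trailing `1 ` of the `2` run is captured but not consumed and is therefore still available to start the next match.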
bobble bubble
SlowSuperman