I'd like to do something similar to the question "preg_match_all how to get *all* combinations? Even overlapping ones": find all matches for a given pattern even when they overlap (e.g. matching the string ABABA against the pattern ABA should return 2 matches, not just the first one).

But I have an additional constraint: my pattern can end with a repetition specifier. Let's use + as an example: this means pattern /A+/ and subject "AA" should return 3 matches:

  • Match "AA" starting at index 0
  • Match "A" starting at index 1
  • Match "A" starting at index 0

The following patterns, based on the solution suggested in the question above, fail to return all 3 results:

  • Pattern /(?=(A+))/ finds only the first 2 matches but not the last one
  • Pattern /(?=(A+?))/ finds only the last 2 matches but not the first one
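
To make the two bullets above concrete, here is a minimal sketch that lists what each lookahead pattern captures together with its offset. The helper name `lookahead_matches` is hypothetical and only exists for this illustration; it assumes the pattern contains no unescaped `/`.

```php
<?php
// Hypothetical helper (not part of the original post): returns every
// lookahead capture as [offset, text] so the missing match is easy to spot.
function lookahead_matches(string $pattern, string $subject): array
{
    preg_match_all(
        "/(?=($pattern))/",
        $subject,
        $matches,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );

    $found = [];
    foreach ($matches as $match) {
        // With PREG_OFFSET_CAPTURE, $match[1] is [captured text, byte offset].
        $found[] = [$match[1][1], $match[1][0]];
    }
    return $found;
}

print_r(lookahead_matches('A+', 'AA'));   // greedy: "AA" at 0, "A" at 1
print_r(lookahead_matches('A+?', 'AA'));  // lazy:   "A" at 0,  "A" at 1
```

Each start offset yields exactly one capture, which is why neither variant can report both "AA" and "A" at index 0.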

My only workaround for now is to keep the greedy version and re-apply the pattern to each match minus its last character, repeating until it no longer matches, e.g.:

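$all_matches = array();
$pattern = 'A+';

// Greedy lookahead: one (longest) match per start offset.
preg_match_all("/(?=($pattern))/", "AA", $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    do {
        $all_matches[] = $match[1];
        // Drop the last character and retry to find shorter matches
        // starting at the same offset.
        $subject = substr($match[1], 0, -1);
    }
    while (preg_match("/^($pattern)/", $subject, $match));
}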

Is there any better solution to achieve this using preg_match_all or similar?

r3c
  • You want to get several matches at one index, which is impossible with a single regex matching operation. You actually do not want a pure regex solution. You need to 1) find all substrings of your string and 2) only keep those that fully match your pattern. – Wiktor Stribiżew Nov 20 '16 at 18:39

1 Answer

You want to get several matches at one index, which is impossible with a single regex matching operation. You actually need to

  • Find all substrings of your string and
  • Only keep those that fully match your pattern.

See the PHP demo:

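function find_substrings($r, $s) {
  $res = array();
  // Anchor the pattern at both ends. The (?:...) group keeps an alternation
  // such as A|B anchored as a whole; D stops $ matching before a final newline.
  $r = '~^(?:' . $r . ')$~D';
  $len = strlen($s);
  for ($q = 0; $q < $len; ++$q) {         // start offset
    for ($w = $q; $w <= $len; ++$w) {     // end offset (exclusive)
        $cur = substr($s, $q, $w - $q);   // empty when $w == $q
        if (preg_match($r, $cur)) {
            array_push($res, $cur);
        }
    }
  }
  return $res;
}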
print_r(find_substrings("ABA", "ABABA"));
// => Array ( [0] => ABA [1] => ABA )
print_r(find_substrings("A+", "AA"));
// => Array ( [0] => A [1] => AA [2] => A )
Wiktor Stribiżew
  • Thanks for your answer and your clear rephrasing of the problem! It seems using `preg_match_all` with the zero-length lookahead trick gives the function a little boost, but the result is equivalent. I'll do a bit more searching and probably end up using it :) – r3c Nov 20 '16 at 23:09
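
The comment above does not show how the lookahead trick was combined with the substring check, so the following is only one possible sketch under stated assumptions: a lookahead pass collects the offsets where at least one match starts, and only substrings beginning at those offsets are tested. The function name `find_all_matches` is hypothetical, and the pattern is assumed not to contain the `~` delimiter.

```php
<?php
// Hypothetical sketch: returns [offset, text] for every non-empty substring
// of $subject that fully matches $pattern.
function find_all_matches(string $pattern, string $subject): array
{
    $results = [];
    $anchored = '~^(?:' . $pattern . ')$~D';

    // The zero-length lookahead succeeds once per offset where a match starts.
    preg_match_all(
        '~(?=(?:' . $pattern . '))~',
        $subject,
        $starts,
        PREG_OFFSET_CAPTURE
    );

    $length = strlen($subject);
    foreach ($starts[0] as $start) {
        $offset = $start[1]; // $start is ['', byte offset]
        // Test every non-empty substring that begins at this offset.
        for ($len = 1; $offset + $len <= $length; ++$len) {
            $candidate = substr($subject, $offset, $len);
            if (preg_match($anchored, $candidate)) {
                $results[] = [$offset, $candidate];
            }
        }
    }
    return $results;
}

print_r(find_all_matches('A+', 'AA'));
```

Because each candidate is matched in isolation, patterns that depend on surrounding context (lookbehind, `\b`) may behave differently than they would inside the full subject.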