
I have the following regexp:

/xxx ([a-z]+)(?:, ([a-z]+))* xxx/

I want to capture all colors in the following test string:

xxx red, blue, pink, purple xxx

(Currently only red and purple get captured.)
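Why only red and purple: in PCRE (and most other regex engines), a capturing group that is repeated by a quantifier keeps only the text from its last iteration. Group 1 matches once ("red"). Group 2 matches "blue", then "pink", then "purple", and each match overwrites the previous one. A minimal sketch reproducing this, assuming PHP's preg_match() (the language the answers use):

```php
<?php
// The pattern from the question: group 2 sits inside a repeated (...)* block.
$pattern = '/xxx ([a-z]+)(?:, ([a-z]+))* xxx/';
$subject = 'xxx red, blue, pink, purple xxx';

preg_match($pattern, $subject, $m);

// $m[1] is "red"; $m[2] only holds the last iteration of the repeat,
// so "blue" and "pink" are lost and "purple" remains.
print_r($m);
```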

Open this URL to see the matched groups: http://www.regex101.com/r/oZ2cH4

I have read http://www.regular-expressions.info/captureall.html, but the trick described there didn't work (or maybe I applied it wrong).

How can I solve this?

Thank you in advance.

  • Match against `/xxx ([a-z]+(?:, [a-z]+)*) xxx/`, take the first capturing group, then split on `, ` – nhahtdh May 12 '13 at 05:01
  • This answer demonstrates the idea: http://stackoverflow.com/a/15922245/1400768. The other way, with regex alone, is monstrous and I wouldn't recommend it; it is only for educational purposes: http://stackoverflow.com/a/15418942/1400768 – nhahtdh May 12 '13 at 05:05
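The match-then-split approach from nhahtdh's first comment can be sketched in PHP as follows. This is a minimal illustration assuming preg_match() and explode(); the variable names are chosen here for the example:

```php
<?php
$subject = 'xxx red, blue, pink, purple xxx';

// Capture the whole comma-separated list as a single group...
if (preg_match('/xxx ([a-z]+(?:, [a-z]+)*) xxx/', $subject, $m)) {
    // ...then split that one capture on the literal separator ", ".
    $colors = explode(', ', $m[1]);
} else {
    $colors = [];
}

print_r($colors);
```

Because the pattern only allows exactly ", " between words, splitting on that same literal string is safe here; a looser separator would call for preg_split() instead.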

2 Answers


You probably want to match an inner pattern against the groups captured by a previous (outer) pattern match:

$word = '[a-z]+';
$sep  = '[, ]+';

$words = $captures("~($word)(?:{$sep})?~");
$of    = $captures("~xxx ({$word}(?:{$sep}{$word})*) xxx~");

print_r($words($of($subject)));

Output:

Array
(
    [0] => red
    [1] => blue
    [2] => pink
    [3] => purple
)

Here $captures is a function that returns a pre-configured preg_match_all() call. The returned function accepts as its subject not only a string but anything foreach can iterate over:

$captures = function ($pattern, $group = 1) {
    return function ($subject) use ($pattern, $group) {
        if (is_string($subject)) {
            $subject = (array)$subject;
        }
        $captures = [];
        foreach ($subject as $step) {
            preg_match_all($pattern, $step, $matches);
            $captures = array_merge($captures, $matches[$group]);
        }
        return $captures;
    };
};

By default (and as used in the example above) it returns the first capturing group (1); pass a different $group to collect another group instead.
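To illustrate both the $group parameter and the non-string subjects, here is a small sketch reusing the same $captures closure. The key=value pattern and the ArrayIterator input are hypothetical examples chosen for this illustration, not part of the original answer:

```php
<?php
// Same closure as above: wraps preg_match_all() and collects one group
// from every match, over a string or any iterable of strings.
$captures = function ($pattern, $group = 1) {
    return function ($subject) use ($pattern, $group) {
        if (is_string($subject)) {
            $subject = (array)$subject;
        }
        $captures = [];
        foreach ($subject as $step) {
            preg_match_all($pattern, $step, $matches);
            $captures = array_merge($captures, $matches[$group]);
        }
        return $captures;
    };
};

// Hypothetical example: collect group 2 (the digits) of "key=value" pairs.
$values = $captures('~([a-z]+)=(\d+)~', 2);

// A plain string as subject...
$fromString = $values('a=1 b=22');

// ...or any iterable, here an ArrayIterator over two strings.
$fromIterable = $values(new ArrayIterator(['a=1', 'b=22 c=333']));

print_r($fromString);
print_r($fromIterable);
```

Since array_merge() renumbers integer keys, the results from all subjects end up in one flat, zero-indexed list.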

This allows you to first match the outer pattern ($of) and then match the inner pattern ($words) against each of those matches. The example in full:

$subject = 'xxx red, blue, pink, purple xxx';

$captures = function ($pattern, $group = 1) {
    return function ($subject) use ($pattern, $group) {
        if (is_string($subject)) {
            $subject = (array)$subject;
        }
        $captures = [];
        foreach ($subject as $step) {
            preg_match_all($pattern, $step, $matches);
            $captures = array_merge($captures, $matches[$group]);
        }
        return $captures;
    };
};

$word = '[a-z]+';
$sep  = '[, ]+';

$words = $captures("~($word)(?:{$sep})?~");
$of    = $captures("~xxx ({$word}(?:{$sep}{$word})*) xxx~");

print_r($words($of($subject)));

See the live demo.

hakre

The tutorial "Repeating a Capturing Group vs. Capturing a Repeated Group" (by regular-expressions.info) describes how you would capture all of the content "red, blue, pink, purple" in a single capture. The pattern it would suggest is

/xxx ((?:[a-z]+(?:, )?)+) xxx/

but if this were really what you were trying to accomplish, you may as well use the simpler expression

/xxx ([a-z, ]*) xxx/

I suspect what you actually want is to capture each color individually. This might be best accomplished by capturing the entire list once, then parsing that captured content.
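A sketch of this capture-once-then-parse idea in PHP, assuming preg_match() and preg_split(). It also shows that, for this input, the tutorial-style pattern and the simpler character-class pattern capture the same text in group 1:

```php
<?php
$subject = 'xxx red, blue, pink, purple xxx';

// Tutorial-style pattern: the repetition is inside the capturing group,
// so group 1 holds the whole list rather than only the last item.
preg_match('/xxx ((?:[a-z]+(?:, )?)+) xxx/', $subject, $a);

// The simpler character-class pattern captures the same text here.
preg_match('/xxx ([a-z, ]*) xxx/', $subject, $b);

// Parse the single capture into individual colors. PREG_SPLIT_NO_EMPTY
// drops empty pieces (e.g. from a trailing ", "), which the looser
// [a-z, ]* pattern would let through.
$colors = preg_split('/,\s*/', $b[1], -1, PREG_SPLIT_NO_EMPTY);

print_r($colors);
```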

hakre
Clark
  • It is a bad idea to repeat the separator along with the tokens `/xxx ((?:[a-z]+(?:, )?)+) xxx/`. If you are not careful (like in this case), you will induce backtracking hell on invalid input: http://www.regex101.com/r/bS5mG3 – nhahtdh May 12 '13 at 05:11