I have the following regexp:

/(?:[\[\{]*)(?:([A-G\-][^A-G\]\}]*)+)(?:[\]\}]*)/

with the following expression:

{A''BsCb}

I expect 3 matched results

A''
Bs
Cb

but testing at https://regex101.com/ only gives me the last match, Cb, and tells me that a repeated capturing group will only capture the last iteration, and that I should put a capturing group around the repeated group.

I thought that was what I had done! I thought I'd understood the problem as described here: http://www.regular-expressions.info/captureall.html. Hence the brackets outside my + with the capturing group inside.

But either it's getting too late, or I need someone whose head doesn't implode at the mention of regexp to show me where I've gone wrong.

AntG
  • Except for the .NET engine, all regex engines overwrite capture groups on each quantified pass. So, in `( [A-G\-] [^A-G\]\}]* )+`, the group only contains what was captured on the last pass of the quantifier. You need another way to do this; it is easy to do in two steps via a callback. –  Aug 25 '16 at 22:12

3 Answers

You can get these results with preg_match_all and the pattern below; the matches are in item 0 of the result array:

~
(?:
    \G (?!\A) # contiguous to previous match, but not at the start of the string
  |
    { (?=[^}]* }) # start with { and check if a closing bracket follows 
  |
    \[ (?=[^]]* ]) # the same for square bracket
)
\K # start the match result here
[A-G] [^]A-G}]* 
~xS
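A minimal sketch of how the pattern above might be called from PHP, assuming the input string from the question. The variable names are illustrative, and the S modifier is left out here since it only concerns optimization, not the result.

```php
<?php
// The pattern from this answer, kept in a nowdoc so no escaping is needed.
// The S (study) modifier is omitted; it does not change what is matched.
$pattern = <<<'RE'
~
(?:
    \G (?!\A)      # contiguous to previous match, but not at the start of the string
  |
    { (?=[^}]* })  # start with { and check that a closing bracket follows
  |
    \[ (?=[^]]* ]) # the same for square brackets
)
\K                 # start the match result here
[A-G] [^]A-G}]*
~x
RE;

// Sample input from the question; the whole matches end up at index 0.
preg_match_all($pattern, "{A''BsCb}", $matches);
print_r($matches[0]);

// An unclosed brace fails the lookahead, so nothing is matched at all.
preg_match_all($pattern, "{A''BsCb", $unclosed);
print_r($unclosed[0]);
```

Because of \K, the opening bracket is consumed but never appears in the results, so no post-processing of the matches is needed.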

Casimir et Hippolyte
  • I like how you check once, downstream, when an open brace is first matched. I've seen some users of \G checking it each time at the end. –  Aug 25 '16 at 22:36
  • @sln: yes, sometimes it's appropriate to do that, and sometimes it isn't (When you choose that an opening bracket stays opened until there's a closing bracket, and if the missing closing bracket doesn't matter.) – Casimir et Hippolyte Aug 25 '16 at 22:39
You are trying to match repeated capturing groups and get the captures. It is not possible with PHP PCRE regex.
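To see the limitation concretely, here is a small sketch that runs the original pattern from the question through preg_match. The variable names are only illustrative.

```php
<?php
// The original pattern from the question, unchanged.
$pattern = '/(?:[\[\{]*)(?:([A-G\-][^A-G\]\}]*)+)(?:[\]\}]*)/';

preg_match($pattern, "{A''BsCb}", $m);

// $m[0] is the whole match; $m[1] is whatever group 1 held
// after its final repetition, so the earlier iterations are lost.
print_r($m);
```

The whole string is matched once, and group 1 ends up holding only Cb, which is the behaviour regex101 warns about.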

What you can do is either add a \G operator, which keeps everything in one regex like the original at the cost of maintainability, or extract all {...} / [...] substrings, strip the brackets, and run a simple [A-G-][^A-G]* regex on the contents.

Solution 1 is

/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*/

Note: this regex does not check for the closing ] or }, but that check can be added with a positive lookahead.

  • (?:[[{]*|(?!\A)\G) - matches a [ or {, zero or more occurrences, or the end location of the previous successful match
  • \K - omits the text matched so far
  • [A-G-] - letters from A to G and a -
  • [^A-G\]}]* - zero or more chars other than A to G and other than ] and }.
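The breakdown above can be tried out with a short sketch like the following, assuming the input from the question plus one hypothetical unclosed input to illustrate the note about the missing closing-bracket check.

```php
<?php
// Solution 1 pattern, exactly as given above.
$re = '/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*/';

// Input from the question: each note is its own match at index 0.
preg_match_all($re, "{A''BsCb}", $m);
print_r($m[0]);

// Hypothetical input with no closing brace: the pattern still matches,
// because nothing in it looks ahead for ] or }.
preg_match_all($re, "{AB", $open);
print_r($open[0]);
```

If unclosed input must be rejected, the lookahead mentioned in the note would have to be added to the bracket alternative.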

Solution 2 is

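$re = '/(?|{([^}]*)}|\[([^]]*)])/'; // contents of {...} or [...] go into group 1
$str = "{A''BsCb}";
$res = array();
preg_match_all($re, $str, $m);
foreach ($m[1] as $match) {
    // split the bracket contents into the individual items
    preg_match_all('~[A-G-][^A-G]*~', $match, $tmp);
    // $tmp[0] holds the full matches; append them in input order
    $res = array_merge($res, $tmp[0]);
}
print_r($res);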

The (?|{([^}]*)}|\[([^]]*)]) regex just matches strings like {...} or [...] (but not {...] or [...}) and captures the contents between brackets into Group 1 (since the branch reset group (?|...) resets the group IDs in each branch). Then, all we need is to grab what we need with a more coherent '~[A-G-][^A-G]*~' regex.
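The effect of the branch reset group can be seen by comparing it with a plain alternation. This is only a sketch; the input string and variable names are hypothetical.

```php
<?php
// Hypothetical input with one group of each bracket type.
$str = "{AB}[CD]";

// Plain alternation: each branch gets its own group number,
// so the [...] contents land in group 2, not group 1.
preg_match_all('/{([^}]*)}|\[([^]]*)]/', $str, $plain);

// Branch reset (?|...): both branches write to group 1.
preg_match_all('/(?|{([^}]*)}|\[([^]]*)])/', $str, $reset);

print_r($plain[1]); // group 1 is empty for the [...] match
print_r($reset[1]); // group 1 holds the contents of both matches
```

With the branch reset version, the loop in Solution 2 only ever has to look at one group.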

Wiktor Stribiżew
You already figured it out. As @sln's comment says, there is no way to collect each individual match in one or several capturing groups while repeating a group in PCRE, which is PHP's regex flavor. In this case only the last match is captured.

However, if it is not important to assert that the braces are present at the start and end of the string, and you only need the values, there is less work to do:

$array = array_filter(preg_split("~(?=[A-G])~", trim("{A''BsCb}", '[{}]')));

Regex:

(?=[A-G]) # Positive lookahead: the next character must be one from the character class

This regex matches every position that is followed by one of those letters, so splitting at those positions yields the expected data:

array(3) {
  [1]=>
  string(3) "A''"
  [2]=>
  string(2) "Bs"
  [3]=>
  string(2) "Cb"
}
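The keys above start at 1 because the split leaves an empty first piece that array_filter then removes. As an alternative sketch (not part of the original answer), the PREG_SPLIT_NO_EMPTY flag of preg_split drops empty pieces directly and gives a zero-based list; the variable names are illustrative.

```php
<?php
// Strip the surrounding brackets first, as in the one-liner above.
$inner = trim("{A''BsCb}", '[{}]');

// Split before every A-G letter and discard empty pieces in one step.
$parts = preg_split('~(?=[A-G])~', $inner, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

This also avoids a side effect of array_filter without a callback, which would drop a piece such as "0" as well as empty strings.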

revo