I have this PHP regular expression:

https?://(?:[a-z0-9]+\.)?livestream\.com/(?:(accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)|[^\s/]+/video\?clipId=([^\s&]+)|([^\s/]+))

I would like to match the following URLs and capture the part shown after the equals sign.

http://original.livestream.com/bethanychurchnh = bethanychurchnh

http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23 = flv_b54a694b-043c-4886-9f35-03c8008c23

http://livestream.com/accounts/142499/events/3959775 = accounts/142499/events/3959775

http://livestream.com/accounts/142499/events/3959775/videos/83958146 = accounts/142499/events/3959775/videos/83958146

It works fine, but for some of the matches the captured string ends up in the 2nd or 3rd capture group. I would like the captured string to always be in the first capture group. Is this possible?
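
To make the problem concrete, here is a minimal sketch that records which group index holds the capture for each URL. The `~` delimiters and the `$urls` array are assumptions added for illustration; the pattern itself is the one above.

```php
<?php
// Pattern from the question, wrapped in ~ delimiters (delimiter choice is an assumption).
$pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/(?:(accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)|[^\s/]+/video\?clipId=([^\s&]+)|([^\s/]+))~';

$urls = [
    'http://original.livestream.com/bethanychurchnh',
    'http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23',
    'http://livestream.com/accounts/142499/events/3959775',
];

$groups = [];
foreach ($urls as $url) {
    preg_match($pattern, $url, $m);
    // preg_match() trims trailing unmatched groups, so the last index
    // in $m is the group that actually captured something.
    $groups[] = count($m) - 1;
}

print_r($groups);
```

For these three URLs the capture lands in groups 3, 2 and 1 respectively, so the calling code has to check several indexes.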

redanimalwar
  • Duplicate of [Regex before or after](https://stackoverflow.com/questions/23162462/regex-before-or-after/23162886#23162886), [PHP Regex, ignore first grouping in a Alternating statement](https://stackoverflow.com/questions/5332881/php-regex-ignore-first-grouping-in-a-alternating-statement) – bobble bubble Jul 05 '22 at 09:44

2 Answers

You can use a branch reset in your regex:

https?:\/\/(?:[a-z0-9]+\.)?livestream\.com\/(?|(accounts\/[0-9]+\/events\/[0-9]+(?:\/videos\/[0-9]+)?)|[^\s\/]+\/video\?clipId=([^\s&]+)|([^\s\/]+))
                                             ^^

See the description of branch reset groups at regular-expressions.info:

Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don't use any alternation or capturing groups inside the branch reset group, then its special function doesn't come into play. It then acts as a non-capturing group.
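
As a rough sketch of how this could look in PHP, the snippet below runs the branch reset pattern over the four sample URLs from the question. The `~` delimiters (which avoid escaping the slashes) and the `$urls` and `$ids` names are assumptions made for this example.

```php
<?php
// Same pattern as above, with (?| ... ) so every alternative captures into group 1.
$pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/(?|(accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)|[^\s/]+/video\?clipId=([^\s&]+)|([^\s/]+))~';

$urls = [
    'http://original.livestream.com/bethanychurchnh',
    'http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23',
    'http://livestream.com/accounts/142499/events/3959775',
    'http://livestream.com/accounts/142499/events/3959775/videos/83958146',
];

$ids = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        // Whichever alternative matched, the result is in $m[1].
        $ids[] = $m[1];
    }
}

print_r($ids);
```

Each URL yields its ID in `$m[1]`, so no per-alternative index check is needed.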

Wiktor Stribiżew

Another possibility: you can allow duplicate named captures with the (?J) inline option.

$pattern = '~(?J)https?://(?:[a-z0-9]+\.)?livestream\.com/
(?:
    (?<id>accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?)
  |
    [^\s/]+/video\?clipId=(?<id>[^\s&]+)
  |
    (?<id>[^\s/]+)
)~x';

if (preg_match($pattern, $text, $m))
    echo $m['id'];

Or, since what you are looking for is always at the end of the pattern, you don't need a capture group at all: the \K feature discards everything matched to its left from the whole match result, so the ID ends up in $m[0]:

$pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/ \K
(?:
    accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?
  |
    [^\s/]+(?:/video\?clipId=\K[^\s&]+)?
)~x';

if (preg_match($pattern, $text, $m))
    echo $m[0];
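
One way to package the \K variant is a small helper that returns the ID or null. This is only a sketch: the function name `extractLivestreamId` and its signature are hypothetical and not part of the original pattern.

```php
<?php
// Hypothetical helper; name and signature are illustrative only.
function extractLivestreamId(string $url): ?string
{
    $pattern = '~https?://(?:[a-z0-9]+\.)?livestream\.com/ \K
    (?:
        accounts/[0-9]+/events/[0-9]+(?:/videos/[0-9]+)?
      |
        [^\s/]+(?:/video\?clipId=\K[^\s&]+)?
    )~x';

    // $m[0] is the whole match, which \K has trimmed down to the ID.
    return preg_match($pattern, $url, $m) ? $m[0] : null;
}

echo extractLivestreamId('http://livestream.com/accounts/142499/events/3959775'), "\n";
echo extractLivestreamId('http://original.livestream.com/bethanychurchnh/video?clipId=flv_b54a694b-043c-4886-9f35-03c8008c23'), "\n";
var_dump(extractLivestreamId('not a livestream url')); // no match, so null
```

Returning null on a failed match keeps the "no ID found" case distinct from an empty string.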

Casimir et Hippolyte