regular expression match ending III or II or I (php)

Question

I tried to use a regular expression (PHP) to match a trailing Roman numeral. For simplicity, consider the example below:

$str="Olympic III";
preg_match("#^(.*)(III|II|I)$#",$str,$rep);
print_r($rep);

That only matches a single "I". The accepted fix is to use the ungreedy "U" modifier. But why? Doesn't the regular expression engine use the order I provided (try "III" first, before trying "II" or "I")?

score 1 · Answer 1 · answered Dec 15 '19 at 07:17

Let us first understand what the U modifier is doing. It inverts the greediness of the quantifiers, so they are lazy by default (in your case, the * in the first capturing group).

With the U modifier, your regex is equivalent to (.*?)(III|II|I) without it, which matches as you would expect.

With (.*)(III|II|I), the quantifier is greedy, i.e. it matches as much as it can. The alternation order is still respected, but only at the position where the alternation is tried. (.*) first consumes the whole string and then gives back one character at a time until the rest of the pattern can match. After it gives back a single "I", the alternation tries III and II, which fail because only one character is left, and then I, which succeeds. So the first capturing group keeps as much as possible and leaves the smallest part for the second group.
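To make the difference concrete, here is a minimal sketch that runs the three variants against the string from the question. The variable names $greedy, $lazy and $ungreedy are only illustrative and do not come from the original post.

```php
<?php
$str = 'Olympic III';

// Greedy: (.*) grabs the whole string, then backtracks one character
// at a time until the alternation can match right before the end anchor.
preg_match('#^(.*)(III|II|I)$#', $str, $greedy);

// Lazy quantifier written out explicitly with ?.
preg_match('#^(.*?)(III|II|I)$#', $str, $lazy);

// Same pattern as the greedy one, but with the U (ungreedy) modifier.
preg_match('#^(.*)(III|II|I)$#U', $str, $ungreedy);

print_r($greedy);   // group 1 is "Olympic II", group 2 is "I"
print_r($lazy);     // group 1 is "Olympic " (with trailing space), group 2 is "III"
print_r($ungreedy); // same captures as $lazy
```

Note that in the lazy variants the first group still contains the trailing space, since nothing in the pattern consumes it.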

Mehdi Daalvand · Answer 2 · 2019-12-15T07:10:30.723

0

.* matches as many characters as possible before (III|II|I), which leaves (III|II|I) only one character to match. You can use this regex instead: ^(.*)\s(I+)$

edited Dec 15 '19 at 07:10

answered Dec 15 '19 at 07:05

Scalway · Answer 3 · 2019-12-15T07:17:59.057

0

Try this:

$str = 'Olympic III';
// The required whitespace (\s) keeps the greedy (.*) out of the numeral.
preg_match('#^(.*)\s(I+)$#', $str, $rep);
print_r($rep);

\s before (I+) or (III|II|I) matches a single whitespace character. It solves your problem because it forces (.*) to stop before that whitespace, so the whole numeral is left for the second group.
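As a rough sketch of the two variants mentioned above, the following compares (I+) with the original alternation once \s is in place. The variable names and the extra input 'Olympic IIII' are assumptions added for illustration only.

```php
<?php
$str = 'Olympic III';

// Variant with I+ as given in the answer.
preg_match('#^(.*)\s(I+)$#', $str, $plus);

// Variant keeping the original alternation. The required whitespace
// stops the greedy (.*) from taking any part of the numeral.
preg_match('#^(.*)\s(III|II|I)$#', $str, $alt);

print_r($plus); // group 1 is "Olympic", group 2 is "III"
print_r($alt);  // same captures as $plus

// The two differ on longer runs: I+ accepts any number of I's,
// while the alternation accepts at most three.
$plusLong = preg_match('#^(.*)\s(I+)$#', 'Olympic IIII');
$altLong  = preg_match('#^(.*)\s(III|II|I)$#', 'Olympic IIII');
var_dump($plusLong, $altLong); // int(1), int(0)
```

Which variant fits better depends on whether inputs such as "IIII" should be accepted.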

edited Dec 15 '19 at 07:17

answered Dec 15 '19 at 07:07
