1

I tried to use a regular expression (PHP) to match trailing Roman numerals. For simplicity, consider the example below:

$str="Olympic III";
preg_match("#^(.*)(III|II|I)$#",$str,$rep);
print_r($rep);

That only matches a single "I" in the second group. The fix, it turns out, is to use the ungreedy "U" modifier. But why? Doesn't the regular expression engine follow the order I provided (try "III" first, before trying "II" or "I")?

CinCout
Curry T

3 Answers

1

Let us first understand what the U modifier (PCRE_UNGREEDY) is doing. It inverts the greediness of the quantifiers, so they are lazy by default (in your case, the * in the first capturing group).

With the U modifier, your regex is equivalent to ^(.*?)(III|II|I)$ without it, which matches as you would expect.
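
To check that equivalence, here is a minimal sketch that runs both variants on the string from the question. The variable names $withFlag and $lazy are only for illustration.

```php
<?php
$str = "Olympic III";

// Original pattern plus the U modifier: quantifiers are lazy by default.
preg_match('#^(.*)(III|II|I)$#U', $str, $withFlag);

// No modifier, but the quantifier is made lazy explicitly with "?".
preg_match('#^(.*?)(III|II|I)$#', $str, $lazy);

print_r($withFlag); // group 1 is "Olympic " (with the trailing space), group 2 is "III"
print_r($lazy);     // same result
```

In both cases the first group grows one character at a time, so the alternation gets its first chance right where "III" starts.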

With (.*)(III|II|I) you are asking the regex engine to apply the quantifier greedily, i.e., to match as much as it can. The order of the alternation only matters at a given position: the engine tries III, then II, then I at whatever position the first group leaves it. Since .* is greedy, it first consumes the whole string and then backtracks one character at a time until the rest of the pattern can match. After it gives back a single character, only "I" remains, so III and II fail there, I succeeds, and the first group keeps "Olympic II".
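
The backtracking can be imitated by hand. The loop below is only a rough illustration of the idea, not how PCRE is actually implemented: it gives the first group as much as possible and hands back one character at a time until the remainder matches the alternation. The names $head, $tail and $result are made up for this sketch.

```php
<?php
$str = "Olympic III";

// What preg_match really returns for the greedy pattern.
preg_match('#^(.*)(III|II|I)$#', $str, $rep);

// Rough imitation of greedy backtracking: start with the longest possible
// first group and shrink it until the rest matches (III|II|I) up to the end.
$result = null;
for ($len = strlen($str); $len >= 0; $len--) {
    $head = substr($str, 0, $len);
    $tail = substr($str, $len);
    if (preg_match('#^(III|II|I)$#', $tail, $m)) {
        $result = [$head, $m[1]];
        break; // the first split that works wins
    }
}

print_r($result); // "Olympic II" and "I", the same split as $rep[1] and $rep[2]
```

The loop stops at the first split that works, which is the one leaving a single "I", so the longer alternatives never get a position where they could succeed.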

CinCout
0

.* matches as many characters as possible before (III|II|I), so (III|II|I) ends up matching only one character. You can use a regex like ^(.*)\s(I+)$ instead.

Mehdi Daalvand
0

Try this:

$str="Olympic III";
preg_match("#^(.*)\s(I+)$#",$str,$rep);
print_r($rep);

PHP Sandbox

\s before (I+) or (III|II|I) matches a single whitespace character. It solves your problem because it forces (.*) to stop right before the whitespace that precedes the part you are interested in.
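
The same trick also works with the original alternation. A small sketch follows; the second input, "Rocky Balboa II", is a made-up example to show that a title containing several spaces still splits at the last one, because the greedy (.*) backtracks from the end.

```php
<?php
// Keep the alternation from the question, but require whitespace before it.
$pattern = '#^(.*)\s(III|II|I)$#';

preg_match($pattern, "Olympic III", $a);
preg_match($pattern, "Rocky Balboa II", $b); // hypothetical extra input

print_r($a); // group 1 is "Olympic", group 2 is "III"
print_r($b); // group 1 is "Rocky Balboa", group 2 is "II"
```

Note that the whitespace is no longer part of the first group, so there is no trailing space to trim.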

Scalway