I want to find Roman numerals inside a string (numbers below 80 are good enough). I found a good base for this in How do you match only valid roman numerals with a regular expression?. The problem is that it deals with whole strings. I have not yet found a way to detect Roman numerals inside a string, because nothing in the pattern is mandatory: every group is optional. So far I have tried something like this:
```perl
use strict;
use warnings;
use feature 'say';

my $x = ' some text I-LXIII iv more ';
if ( $x =~ s/\b(
        (
            (XC|XL|L?X{0,3})    # first group 10-90
        |
            (IX|IV|V?I{0,3})    # second group 1-9
        )+
    )
    \b/>$1</xgi ) {             # mark every occurrence
    say $x;
}
__END__
><some>< ><text>< ><>I<><-><>LXIII<>< ><>iv<>< ><more><
```
Desired output:

```
some text >I<->LXIII< >iv< more
```
So this one also matches at bare word boundaries, because all groups are optional and the pattern can therefore match the empty string. How can I get this done? How can I make at least one of the two groups mandatory when there is no way to tell in advance which one will be present? Other approaches to catching Roman numerals are welcome too.
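A minimal sketch of one common way to make at least one group mandatory, assuming Perl 5.10 or later. It keeps the tens and ones groups from the linked answer in sequence, because the `+` over an alternation in the attempt above would also accept strings such as `IXIX` or `VX`. It then adds a lookahead `(?=[XLVI])` so that a Roman digit must follow the opening boundary, and it ends with `(?!\w)` instead of `\b`. The names `$roman` and `mark_roman` are hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical names; tens and ones follow each other in sequence,
# as in the whole-string pattern from the linked answer (values up to 99).
my $roman = qr{
    \b
    (?=[XLVI])              # a Roman digit must follow this word boundary
    (
        (?:XC|XL|L?X{0,3})  # tens: 10-90, or nothing
        (?:IX|IV|V?I{0,3})  # ones: 1-9, or nothing
    )
    (?!\w)                  # the numeral must end the word
}xi;

# Wrap every Roman numeral found in the text in >...<
sub mark_roman {
    my ($text) = @_;
    $text =~ s/$roman/>$1</g;
    return $text;
}

say mark_roman(' some text I-LXIII iv more ');
# prints " some text >I<->LXIII< >iv< more "
```

Ending with `\b` would not be enough: in a word such as `vivid`, both groups can backtrack to the empty string, and `\b` is still true at the start of the word, so an empty `><` would be inserted there. `(?!\w)` cannot succeed at that position because the lookahead has already required a word character. A standalone pronoun `I` is still marked, since no pattern can tell it apart from the numeral.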