I have this string:

$string = 'Hello IV WorldX';

And I want to replace all Roman numerals with integers.

I have the following function to convert a Roman numeral to an integer:

function roman2number($roman){
    $conv = array(
        array("letter" => 'I', "number" => 1),
        array("letter" => 'V', "number" => 5),
        array("letter" => 'X', "number" => 10),
        array("letter" => 'L', "number" => 50),
        array("letter" => 'C', "number" => 100),
        array("letter" => 'D', "number" => 500),
        array("letter" => 'M', "number" => 1000),
        array("letter" => 0, "number" => 0)
    );
    $arabic = 0;
    $state = 0; // largest value seen so far while scanning right to left
    $sidx = 0;
    $len = strlen($roman) - 1; // index of the last character

    while ($len >= 0) {
        $i = 0;
        $sidx = $len;
        while ($conv[$i]['number'] > 0) {
            if (strtoupper($roman[$sidx]) == $conv[$i]['letter']) {
                if ($state > $conv[$i]['number']) {
                    $arabic -= $conv[$i]['number'];
                } else {
                    $arabic += $conv[$i]['number'];
                    $state = $conv[$i]['number'];
                }
            }
            $i++;
        }
        $len--;
    }
    return($arabic);
}

echo roman2number('IV');

Works great (try it on ideone). How do I search and replace through the string to convert all instances of Roman numerals? Something like:

$string = romans_to_numbers_in_string($string);

Sounds like regex needs to come to the rescue... or?

Henrik Petterson
  • Your intention's unclear. Are you looking for a regex version of your current approach? – revo Jul 19 '18 at 10:25
  • @revo No. The php function above converts a single instance of a roman numeral into a number like `roman2number('IV')`... I want to convert **all** roman numerals in a text string, like `romans_to_numbers_in_string('hello IV what X');` meaning I don't have a way to detect which specific characters in a string are roman numerals and how to convert only those parts of the string. Does this make sense? – Henrik Petterson Jul 19 '18 at 10:27
  • Try `$str = preg_replace_callback('~\b[IVXLCDM0]+\b~', function($m) { return roman2number($m[0]); }, $str);`. See live demo here https://3v4l.org/Q1IHC – revo Jul 19 '18 at 10:29
  • I wonder if you intend to convert `I` in `I know`, it will be converted to `1`, right? You might need to first tag the input with some morpho-syntactic analyzer and only replace what should be replaced. Also, there is already a good post on matching [Roman numerals in text](https://stackoverflow.com/questions/267399/how-do-you-match-only-valid-roman-numerals-with-a-regular-expression), just use word boundaries, `~\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b~` – Wiktor Stribiżew Jul 19 '18 at 10:32
  • @revo Seems to be an issue with that regex. Please see https://ideone.com/oQoh40 Also, note that there should be a space between the roman numerals and other terms. So the `X` in `HelloX` should not match, but `Hello X` should. – Henrik Petterson Jul 19 '18 at 10:33
  • Did you see the live demo I provided? Also notice the comment above yours. – revo Jul 19 '18 at 10:34
  • @WiktorStribiżew *Super* true. Converting roman numerals to numbers is essentially a last resort. I can't see a logical way of determining whether `I know` and `Volume I` should be converted, if you get my point... without directly inputting connecting terms (like if a roman numeral comes after the term `Volume`, then convert it to int)... – Henrik Petterson Jul 19 '18 at 10:37
  • @revo Yes, can you please post that as a suggested answer. Thanks. – Henrik Petterson Jul 19 '18 at 10:37

1 Answer

Here's a simple regex to match roman numerals:

\b[0IVXLCDM]+\b

So, you can implement romans_to_numbers_in_string like this:

function romans_to_numbers_in_string($string) {
    return preg_replace_callback('/\b[0IVXLCDM]+\b/', function($m) {
        return roman2number($m[0]);
    }, $string);
}

This regex has a weakness: it also matches ordinary words spelled only with these letters. For example, if you have a string like this:

I like roman numerals

It will become:

1 like roman numerals

Depending on your requirements, you can leave it as is, or you can modify the regex so that it doesn't convert a lone I to a number.
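One possible way to skip a lone I is sketched below. It is only an illustration under a few assumptions: `roman_to_int` and `romans_to_numbers_strict` are hypothetical names (the first is a compact stand-in for `roman2number` so the snippet runs on its own), and the validation pattern is the one from the linked "Roman numerals in text" post in the comments, anchored with `^` and `$`. The idea is to keep matching whole words loosely and decide inside the callback whether to convert them.

```php
<?php
// Hypothetical stand-in for roman2number() from the question, included
// only so that this sketch is self-contained.
function roman_to_int(string $roman): int
{
    $values = ['I' => 1, 'V' => 5, 'X' => 10, 'L' => 50,
               'C' => 100, 'D' => 500, 'M' => 1000];
    $total = 0;
    $largest = 0; // largest symbol value seen so far, scanning right to left
    for ($i = strlen($roman) - 1; $i >= 0; $i--) {
        $value = $values[$roman[$i]];
        if ($value < $largest) {
            $total -= $value; // smaller symbol before a larger one: subtract
        } else {
            $total += $value;
            $largest = $value;
        }
    }
    return $total;
}

// Hypothetical variant of romans_to_numbers_in_string(): it leaves a lone
// "I" and malformed numerals such as "IIII" untouched.
function romans_to_numbers_strict(string $text): string
{
    // Validation pattern taken from the comment thread, anchored to the whole word.
    $strict = '/^M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})$/';

    return preg_replace_callback(
        '/\b[IVXLCDM]+\b/',
        function (array $m) use ($strict): string {
            $word = $m[0];
            // Assumption: a standalone "I" is the pronoun, not the number 1.
            if ($word === 'I' || !preg_match($strict, $word)) {
                return $word;
            }
            return (string) roman_to_int($word);
        },
        $text
    );
}

echo romans_to_numbers_strict('Hello IV WorldX'), "\n";
echo romans_to_numbers_strict('I like roman numerals'), "\n";
```

This still converts real words that happen to be valid numerals (for example `MIX` or `DIV`), so it narrows the problem rather than solving it. Telling `Volume I` apart from `I know` would need context, as discussed in the comments.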

Wololo