
I have the following string:

s = "MMX Lions Television Inc"   # named s so the built-in str is not shadowed

And I need to convert it into:

conv_str = "2010 Lions Television Inc"

I have the following function to convert a roman numeral into its integer equivalent:

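# Wrapped in list() so the pairs can be iterated on every call
# (on Python 3 a bare zip() is exhausted after the first pass).
numeral_map = list(zip(
    (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1),
    ('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I')
))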

def roman_to_int(n):
    n = unicode(n).upper()

    i = result = 0
    for integer, numeral in numeral_map:
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

How would I use re.sub to get the correct string here?

(Note: I tried the regex described in "How do you match only valid roman numerals with a regular expression?", but it did not work.)

David542
  • Is there any reason you're not using a straight-up dictionary for holding your roman numerals, and then using the keys to get the value? – Makoto Apr 10 '12 at 17:38
  • @Makoto: Yes, because the order in which the numerals are extracted is relevant. `1000` must be `M` - it can't be `DD` or `CCCCCCCCCC` which you'd get if you used a dictionary. At least for conversions from decimal to roman numerals, you need the fixed order of numerals. – Tim Pietzcker Apr 10 '12 at 17:45

2 Answers


Always try the Python Package Index when looking for a common function/library.

This is the list of modules related to the keyword 'roman'.

For example, 'romanclass' has a class that implements the conversion. Quoting the documentation:

So a programmer can say:

>>> import romanclass as roman

>>> two = roman.Roman(2)

>>> five = roman.Roman('V')

>>> print (two+five)

and the computer will print:

VII
KurzedMetal
  • Thanks, and how would this be applied to the problem above? – David542 Apr 10 '12 at 17:55
  • At a guess: extract the Roman numeral from the string using regex (as per the other answer you linked in the OP), then use this module to convert your Roman numeral to a number. Use regex for what it's good at (finding strings) and this Roman module for what it's good at (converting strings to numbers) and you will have a robust solution. – Li-aung Yip Apr 10 '12 at 18:15

re.sub() can accept a function as the replacement. The function receives a single argument, the Match object, and should return the replacement string. You already have a function that converts a Roman numeral string to an int, so this won't be difficult.

In your case you would want a function like this:

def roman_to_int_repl(match):
    return str(roman_to_int(match.group(0)))

Now you can modify the regex from the question you linked so that it will find matches within a larger string:

import re

s = "MMX Lions Television Inc"
regex = re.compile(r'\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b')
print(regex.sub(roman_to_int_repl, s))
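
For reference, here is a self-contained sketch of the pieces above that should also run on Python 3. Two details are my own assumptions rather than part of the question's code: `numeral_map` is materialized as a tuple (a bare `zip` is a one-shot iterator on Python 3), and `str(n)` stands in for `unicode(n)`.

```python
import re

# Value/numeral pairs ordered from largest to smallest.
# Assumption: stored as a tuple so it can be iterated on every call.
numeral_map = tuple(zip(
    (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1),
    ('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I'),
))

def roman_to_int(n):
    n = str(n).upper()  # assumption: str() replaces the Python 2 unicode()
    i = result = 0
    for integer, numeral in numeral_map:
        # Consume each numeral greedily, largest value first.
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

def roman_to_int_repl(match):
    # re.sub passes the Match object; return the replacement text.
    return str(roman_to_int(match.group(0)))

regex = re.compile(
    r'\b(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b'
)

converted = regex.sub(roman_to_int_repl, "MMX Lions Television Inc")
print(converted)  # 2010 Lions Television Inc
```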

Here is a version of the regex that would not replace "LLC" in a string:

regex = re.compile(r'\b(?!LLC)(?=[MDCLXVI]+\b)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b')

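You could also use the original regex with a modified replacement function. Note that for a word like "LLC" the original regex matches the empty string at the start of the word rather than the word itself, so the function has to pass empty matches through unchanged:

def roman_to_int_repl(match):
    numeral = match.group(0)
    exclude = {"I"}   # valid numerals you still don't want to replace
    # An empty match (e.g. in front of "LLC") would otherwise be
    # converted to 0 and inserted into the string.
    if not numeral or numeral in exclude:
        return numeral
    return str(roman_to_int(numeral))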
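
If the list of words to skip is open-ended, another option is to grab every whole word made of numeral letters and convert it only when it is a well-formed numeral. This is not from the answer above, just a rough sketch under my own assumptions; the names `STRICT`, `CANDIDATE` and `replace_numerals` are made up for the example, and `fullmatch` needs Python 3.4 or later.

```python
import re

# Well-formed numeral; on its own this pattern can match the empty string.
STRICT = re.compile(r'M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})')
# Any whole word consisting only of numeral letters; never empty.
CANDIDATE = re.compile(r'\b[MDCLXVI]+\b')

NUMERAL_MAP = tuple(zip(
    (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1),
    ('M', 'CM', 'D', 'CD', 'C', 'XC', 'L', 'XL', 'X', 'IX', 'V', 'IV', 'I'),
))

def roman_to_int(n):
    i = result = 0
    for integer, numeral in NUMERAL_MAP:
        while n[i:i + len(numeral)] == numeral:
            result += integer
            i += len(numeral)
    return result

def replace_numerals(text, exclude=()):
    """Replace well-formed Roman numerals in text, skipping words in exclude."""
    def repl(match):
        word = match.group(0)
        # Leave excluded words and malformed numerals (such as "LLC") as they are.
        if word in exclude or not STRICT.fullmatch(word):
            return word
        return str(roman_to_int(word))
    return CANDIDATE.sub(repl, text)

print(replace_numerals("MMX Lions Television LLC"))      # 2010 Lions Television LLC
print(replace_numerals("I saw Rocky IV", exclude={"I"}))  # I saw Rocky 4
```

Because the candidate pattern never matches an empty string, nothing gets inserted in front of words like "LLC", and the `exclude` argument only has to list valid numerals you want to keep.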
Andrew Clark
  • Thanks, this works great. How would you also get the `re` to ignore "LLC"? – David542 Apr 10 '12 at 18:05
  • At the beginning of the regex, add `(?!LLC\b)`. If there is a larger list that you want to disallow, you can use something like `(?!(LLC|XXX|I)\b)` – Andrew Clark Apr 10 '12 at 18:08