One easy way to do this is to use re.sub with a callback function. The callback handles more complicated logic beyond simple substitution. In your case, you need to match all lowercase i's that follow a capital I, figure out how many i's there are, and replace accordingly.
>>> import re
>>> re.sub('(?<=I)(i+)', lambda x: 'I' * len(x.group()), 'Part Iii, Work Principles')
'Part III, Work Principles'
The callback is not invoked (i.e., no replacement occurs) if there was no match.
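To make the no-match behaviour concrete, here is a small sketch. The helper name fix_numeral and the calls list are made up for illustration; they only record what the callback receives.

```python
import re

calls = []  # hypothetical log of every match the callback receives

def fix_numeral(m):
    calls.append(m.group())
    return 'I' * len(m.group())

# No lowercase i follows a capital I here, so the callback never runs
# and the string comes back unchanged.
unchanged = re.sub('(?<=I)(i+)', fix_numeral, 'Part II, Work Principles')
calls_after_no_match = list(calls)

# Here 'ii' follows 'I', so the callback runs exactly once.
changed = re.sub('(?<=I)(i+)', fix_numeral, 'Part Iii, Work Principles')

print(unchanged)             # Part II, Work Principles
print(calls_after_no_match)  # []
print(changed)               # Part III, Work Principles
print(calls)                 # ['ii']
```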
If you're interested in a deeper understanding of what happens, here's the same callback as a function, with a couple of print statements.
>>> def replace(m):
...     print(m, m.group(), len(m.group()), sep='\n')
...     return 'I' * len(m.group())
...
>>> re.sub('(?<=I)(i+)', replace, 'Part Iii, Work Principles')
<_sre.SRE_Match object; span=(6, 8), match='ii'>
ii
2
'Part III, Work Principles'
You'll notice that, in addition to performing the replacement, this prints the match object, the matched text, and its length. The important thing to note is that re.sub passes a match object to the callback function. You can then figure out what was matched and decide what to replace it with accordingly.
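As a rough illustration of deciding the replacement per match, the sketch below runs the same pattern over a string with two numerals. The sample string and the names seen and widen are assumptions chosen for the example; group() and span() are standard match object methods.

```python
import re

seen = []  # hypothetical record of (matched text, span) per callback call

def widen(m):
    # The callback is called once per match, each time with a fresh match object.
    seen.append((m.group(), m.span()))
    return 'I' * len(m.group())

result = re.sub('(?<=I)(i+)', widen, 'Part Iii, Chapter Ii')

print(result)  # Part III, Chapter II
print(seen)    # [('ii', (6, 8)), ('i', (19, 20))]
```

Each match gets a replacement sized to what was actually matched, which a fixed replacement string could not do.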
Generalising to Arbitrary Roman Numerals
If your function has to match arbitrary Roman numerals, you can pass re.sub a pattern that finds them. The pattern is longer, but your callback simplifies greatly:
>>> p = r'\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b'
>>> string = 'Part viiI, Work Principles'
>>> re.sub(p, lambda x: x.group().upper(), string, flags=re.IGNORECASE)
'Part VIII, Work Principles'
Now, all you need to do is uppercase the matched string.
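If you want to reuse this, one possible packaging is a compiled pattern plus a small helper. The names ROMAN and upper_roman are illustrative, not anything from the re module. Note one caveat of the case-insensitive pattern: any whole word that happens to be a valid numeral, such as "mix", is uppercased too.

```python
import re

# Same pattern as above, compiled once with the flag baked in.
ROMAN = re.compile(
    r'\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b',
    flags=re.IGNORECASE,
)

def upper_roman(text):
    """Uppercase every whole word in text that parses as a Roman numeral."""
    # The pattern can also match the empty string at word boundaries;
    # uppercasing '' yields '', so those matches leave the text unchanged.
    return ROMAN.sub(lambda m: m.group().upper(), text)

print(upper_roman('Part viiI, Work Principles'))  # Part VIII, Work Principles
print(upper_roman('Chapter xiv of the mix'))      # Chapter XIV of the MIX
```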