Use

    import re

    def clean(text):
        pattern = r"\b(?=[MDCLXVIΙ])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})([IΙ]X|[IΙ]V|V?[IΙ]{0,3})\b\.?"
        return re.sub(pattern, '&', text)

See the regex proof. Add more non-standard look-alike letters, such as Ι (Greek capital iota, U+0399),
if necessary.
EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[MDCLXVIΙ] any character of: 'M', 'D', 'C', 'L',
'X', 'V', 'I', 'Ι' (Greek capital iota)
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
M{0,4} 'M' (between 0 and 4 times (matching the
most amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
CM 'CM'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
CD 'CD'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
D? 'D' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
C{0,3} 'C' (between 0 and 3 times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
XC 'XC'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
XL 'XL'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
L? 'L' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
X{0,3} 'X' (between 0 and 3 times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[IΙ] any character of: 'I', 'Ι' (Greek
capital iota)
--------------------------------------------------------------------------------
X 'X'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[IΙ] any character of: 'I', 'Ι' (Greek
capital iota)
--------------------------------------------------------------------------------
V 'V'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
V? 'V' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
[IΙ]{0,3} any character of: 'I', 'Ι' (Greek
capital iota) (between 0 and 3 times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
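
To see what the pattern picks up, here is a minimal sketch that lists the matches in a sample string. The helper name `find_numerals` and the sample text are made up for illustration, and the Greek capital iota is written as `\u0399` only so that the look-alike letter is visible in the source. Because every part of the pattern is optional, it can also match the empty string at the start of a word that begins with one of these letters (for example "Dog"), so the sketch drops empty matches.

```python
import re

# Same pattern as above; \u0399 is the Greek capital iota look-alike.
ROMAN = re.compile(
    r"\b(?=[MDCLXVI\u0399])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})"
    r"([I\u0399]X|[I\u0399]V|V?[I\u0399]{0,3})\b\.?"
)

def find_numerals(text):
    """Return the non-empty matches of ROMAN in text (hypothetical helper)."""
    return [m.group(0) for m in ROMAN.finditer(text) if m.group(0)]

print(find_numerals("Chapter XIV. and MCMXCIV, not Dog"))
# ['XIV.', 'MCMXCIV']
```

Note that the optional trailing dot is part of the match ("XIV."), and that a lone English "I" is also a valid numeral as far as the pattern is concerned.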