I have a large docx file that has the below interspersed throughout:
PART I
PART II
PART III
PART IIIA # part 3, section A
PART IV
PART V
PART VI
PART VII
I'm trying to write a regex in Python that will pull these out, with re.match, re.findall, or re.search. I can't figure out the correct regex syntax to pull out only the above - and nothing on SO or anywhere else gives an example of correctly pulling out Roman numerals.
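For context, the three functions behave quite differently on multi-line text, which matters for headings scattered through a document. Below is a rough sketch, assuming the docx text has already been extracted into a plain string. The `text` sample and the pattern are illustrative assumptions only: the pattern is deliberately loose and does not validate that the letters form a well-formed Roman numeral.

```python
import re

# Hypothetical sample standing in for text extracted from the docx.
text = """Intro paragraph
PART I
Some body text
PART IIIA
More body text
PART IV
"""

# Loose illustrative pattern: "PART", whitespace, one or more Roman-numeral
# letters, then an optional single section letter. With re.MULTILINE,
# ^ and $ anchor at the start and end of each line.
pattern = re.compile(r"^PART\s+([IVXLCDM]+[A-Z]?)$", re.MULTILINE)

# re.match only tries position 0 of the whole string, even with MULTILINE.
first_at_start = pattern.match(text)

# re.search scans forward and returns the first match anywhere.
first_anywhere = pattern.search(text)

# re.findall returns every match; with one capture group it returns
# just the captured text for each match.
all_numerals = pattern.findall(text)

print(first_at_start)           # None, because the text starts with "Intro"
print(first_anywhere.group(0))  # PART I
print(all_numerals)             # ['I', 'IIIA', 'IV']
```

On input like this, re.match finds nothing unless the string itself starts with a heading, so re.findall (or re.finditer) is usually the closer fit for collecting every heading.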
There are many examples of how to convert / validate them, but nothing on simple regex matching. I was going off of this:
[PART].*\s[I]|[II]|[III]|[IIIA]|[IV]|[V]|[VI]|[VII]
or
[PART].*\s(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$
But that doesn't work - I'm messing up the "or" part. Best other SO article I could find