^(?:.(?! (?=[MDCLXVI])(M*)(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$))+\S?
This regular expression should work, but it may be overkill for your use case, because it checks for every possible Roman numeral in modern strict notation, including very large numbers in the thousands. It correctly handles names or surnames written in capital letters that happen to satisfy Roman numeral syntax, unless they appear at the very end (e.g. "Jet LI"), in which case they will be treated as a Roman numeral.
This was my logic:
-
Let's match the start of the string, followed by one or more instances of
<any character not followed by space + roman numeral + end>
plus possibly one more non-space character (the last letter of the surname, which may be followed by space + Roman numeral + end).
^(?:<any non-linebreak character not followed by space + Roman numeral + end>)+\S?
-
<any non-linebreak character not followed by space + Roman numeral + end>
is matched using this regex:
.(?! <Roman numeral>$)
-
And a
<Roman numeral>
in modern strict notation can be matched like this:
(?=[MDCLXVI])(M*)(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})
-
Now substitute everything together to get the final regex.
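As a rough sketch of that substitution, here is the same composition written in Python. The choice of Python and the names `ROMAN`, `CHAR`, `NAME` and `strip_numeral` are my own assumptions for illustration; only the regex pieces come from the steps above.

```python
import re

# <Roman numeral> in modern strict notation, copied from the step above.
ROMAN = r"(?=[MDCLXVI])(M*)(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})"

# <any non-linebreak character not followed by space + Roman numeral + end>
CHAR = r".(?! " + ROMAN + r"$)"

# One or more such characters, plus possibly one more non-space character.
NAME = re.compile(r"^(?:" + CHAR + r")+\S?")


def strip_numeral(text):
    """Return the part of text before a trailing ' <Roman numeral>' (hypothetical helper)."""
    m = NAME.match(text)
    return m.group(0) if m else text
```

With this, `strip_numeral("John Smith III")` gives `"John Smith"`, while `strip_numeral("Jet LI")` gives `"Jet"`, which is the caveat mentioned at the top.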
Note:
If you only want to consider Roman numerals in a certain range, update the <Roman numeral>
part accordingly. E.g. for numbers smaller than twenty it would become (?=[XVI])X?(I[XV]|V?I{0,3})
. The entire regex would then be:
^(?:.(?! (?=[XVI])X?(I[XV]|V?I{0,3})$))+\S?
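To illustrate the effect of the narrower range, here is a small Python sketch using the regex above. Python and the names `SMALL_NAME` and `strip_small_numeral` are assumptions of mine, not part of the original answer.

```python
import re

# The restricted regex from above: only numerals smaller than twenty are stripped.
SMALL_NAME = re.compile(r"^(?:.(?! (?=[XVI])X?(I[XV]|V?I{0,3})$))+\S?")


def strip_small_numeral(text):
    """Strip a trailing ' <Roman numeral below twenty>' (hypothetical helper)."""
    m = SMALL_NAME.match(text)
    return m.group(0) if m else text
```

Here `strip_small_numeral("Louis XIV")` returns `"Louis"`, whereas `"Jet LI"` and `"John XXIII"` come back unchanged because LI and XXIII fall outside the range.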
Reference:
Roman Numerals
Update:
Here is another possible regex, which should be faster than the one above because it consumes runs of non-space characters greedily and only evaluates the negative lookahead at spaces.
^(?:\S+| (?!(?=[IVXLCDM])(M*)(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$))+
The general logic here is:
^(?:\S+| (?!<Roman numeral>$))+
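A quick way to convince yourself that the two variants agree on typical input is to run both over a few names. This is only a sketch in Python (my choice of language); `FIRST`, `SECOND`, `SAMPLES` and `matched` are hypothetical names, and the sample list is made up for illustration. It does not measure speed, and it does not prove the two regexes are equivalent for every input (for instance, strings that start with a space are not covered).

```python
import re

# First regex: lookahead checked after every character.
FIRST = re.compile(
    r"^(?:.(?! (?=[MDCLXVI])(M*)(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$))+\S?"
)

# Second regex: lookahead checked only after a space.
SECOND = re.compile(
    r"^(?:\S+| (?!(?=[IVXLCDM])(M*)(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$))+"
)

# Illustrative sample names (assumed input, not from the question).
SAMPLES = ["John Smith III", "Henry VIII", "Louis XIV", "Jet LI", "Mary Ann Smith"]


def matched(pattern, text):
    """Return the matched prefix, or None if the pattern does not match."""
    m = pattern.match(text)
    return m.group(0) if m else None


# Pair each sample with the result of both variants.
results = [(s, matched(FIRST, s), matched(SECOND, s)) for s in SAMPLES]
```

For these samples both columns of `results` hold the same value, e.g. `"John Smith"` for `"John Smith III"`.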