If you want to support any Roman numeral (the pattern below covers values up to MMMMCMXCIX, i.e. 4999), you can use
^(\S+(?:.*\b(?=[MDCLXVI])M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})\b(?= +\S))?) +(.*)
If you only need to support Roman numerals below XX (that is, I to XIX):
^(\S+(?:.*\b(?=[XVI])X?(?:IX|IV|V?I{0,3})\b(?= +\S))?) +(.*)
See the regex demo #1 and demo #2. In the Java code, replace the literal spaces with \h (horizontal whitespace) or \s (any whitespace), and double the backslashes in the Java string literal.
Details:
^ - start of string
( - Group 1 start:
\S+ - one or more non-whitespace chars
(?: - start of a non-capturing group:
.* - any zero or more chars other than line break chars, as many as possible
\b - a word boundary
(?=[MDCLXVI]) - a positive lookahead that requires a Roman digit immediately to the right
M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}) - a Roman number pattern
\b - a word boundary
(?= +\S) - a positive lookahead that requires one or more spaces and then one non-whitespace char right after the current position
)? - end of the non-capturing group, repeat one or zero times (it is optional)
) - end of the first group
 + - one or more spaces (the pattern is a literal space followed by +)
(.*) - Group 2: the rest of the line.
In Java:
String regex = "^(\\S+(?:.*\\b(?=[MDCLXVI])M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})\\b(?=\\h+\\S))?)\\h+(.*)";
// Or
String regex = "^(\\S+(?:.*\\b(?=[XVI])X?(?:IX|IV|V?I{0,3})\\b(?=\\s+\\S))?)\\s+(.*)";
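For reference, here is a minimal sketch of how the first pattern could be applied with java.util.regex. The class name RomanSplit, the method name split, and the sample sentences are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RomanSplit {
    // Full Roman numeral variant from above, with \h+ in place of the literal spaces.
    private static final Pattern FULL = Pattern.compile(
        "^(\\S+(?:.*\\b(?=[MDCLXVI])M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})\\b(?=\\h+\\S))?)\\h+(.*)");

    // Returns {group 1, group 2}, or null when the pattern does not match
    // (for example, a single token with no whitespace after it).
    public static String[] split(String input) {
        Matcher m = FULL.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs: one with a Roman numeral, one without.
        String[] samples = { "Henry VIII ruled England", "Alice went home" };
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(parts == null
                ? "no match: " + s
                : "[" + parts[0] + "] [" + parts[1] + "]");
        }
        // prints:
        // [Henry VIII] [ruled England]
        // [Alice] [went home]
    }
}
```

Because .* is greedy, Group 1 extends to the last token that looks like a Roman numeral and is followed by more text. The pattern is also case-sensitive, so a standalone capital "I" later in the line (as in "Henry VIII and I went home") is treated as a numeral and ends up in Group 1.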