How do you match only valid roman numerals with a regular expression?

Question

Thinking about my other problem, I decided I can't even create a regular expression that will match Roman numerals (let alone a context-free grammar that will generate them).

The problem is matching only valid Roman numerals. E.g., 990 is NOT "XM", it's "CMXC".

My problem in making the regex for this is that in order to allow or not allow certain characters, I need to look back. Let's take thousands and hundreds, for example.

I can allow M{0,2}C?M (to allow for 900, 1000, 1900, 2000, 2900 and 3000). However, if the match is on CM, I can't allow the following characters to be C or D (because I'm already at 900).

How can I express this in a regex?
If it's simply not expressible in a regex, is it expressible in a context-free grammar?

paxdiablo · Accepted Answer · 2020-02-07T04:37:21.667

374

You can use the following regex for this:

^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$

Breaking it down, M{0,4} specifies the thousands section and basically restricts it to between 0 and 4000. It's relatively simple:

   0: <empty>  matched by M{0}
1000: M        matched by M{1}
2000: MM       matched by M{2}
3000: MMM      matched by M{3}
4000: MMMM     matched by M{4}

You could, of course, use something like M* to allow any number (including zero) of thousands, if you want to allow bigger numbers.

Next is (CM|CD|D?C{0,3}), which is slightly more complex. This is the hundreds section, and it covers all the possibilities:

  0: <empty>  matched by D?C{0} (with D not there)
100: C        matched by D?C{1} (with D not there)
200: CC       matched by D?C{2} (with D not there)
300: CCC      matched by D?C{3} (with D not there)
400: CD       matched by CD
500: D        matched by D?C{0} (with D there)
600: DC       matched by D?C{1} (with D there)
700: DCC      matched by D?C{2} (with D there)
800: DCCC     matched by D?C{3} (with D there)
900: CM       matched by CM

Thirdly, (XC|XL|L?X{0,3}) follows the same rules as the previous section, but for the tens place:

 0: <empty>  matched by L?X{0} (with L not there)
10: X        matched by L?X{1} (with L not there)
20: XX       matched by L?X{2} (with L not there)
30: XXX      matched by L?X{3} (with L not there)
40: XL       matched by XL
50: L        matched by L?X{0} (with L there)
60: LX       matched by L?X{1} (with L there)
70: LXX      matched by L?X{2} (with L there)
80: LXXX     matched by L?X{3} (with L there)
90: XC       matched by XC

And, finally, (IX|IV|V?I{0,3}) is the units section, handling 0 through 9 and also similar to the previous two sections (Roman numerals, despite their seeming weirdness, follow some logical rules once you figure out what they are):

0: <empty>  matched by V?I{0} (with V not there)
1: I        matched by V?I{1} (with V not there)
2: II       matched by V?I{2} (with V not there)
3: III      matched by V?I{3} (with V not there)
4: IV       matched by IV
5: V        matched by V?I{0} (with V there)
6: VI       matched by V?I{1} (with V there)
7: VII      matched by V?I{2} (with V there)
8: VIII     matched by V?I{3} (with V there)
9: IX       matched by IX
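One way to see the four sections at work is to run the pattern through Python's `re` module. This is a sketch, and the sample numerals are only illustrations. With `re.fullmatch`, the three capturing groups report the hundreds, tens and units pieces. The 990 example from the question is accepted as `CMXC` and rejected as `XM`.

```python
import re

# The pattern from this answer: thousands, then the hundreds, tens and units groups.
ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

def sections(numeral):
    """Return the (hundreds, tens, units) groups, or None if the numeral is invalid."""
    m = ROMAN.fullmatch(numeral)
    return m.groups() if m else None

print(sections("MCMXCIV"))  # hundreds CM, tens XC, units IV
print(sections("CMXC"))     # 990 in standard form: the units group is empty
print(sections("XM"))       # the non-standard 990 is rejected
```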

Just keep in mind that that regex will also match an empty string. If you don't want this (and your regex engine is modern enough), you can use positive look-behind and look-ahead:

(?<=^)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=$)

(the other alternative being to just check that the length is not zero beforehand).
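The empty-string case can be checked directly with a small Python sketch. In Python's `re`, `(?<=^)` and `(?=$)` are zero-width assertions that succeed in exactly the same places as `^` and `$`. As a result, the look-around variant still accepts an empty string. Putting a lookahead such as `(?=[MDCLXVI])` up front requires at least one numeral letter, which does reject it. Several later answers and comments use that lookahead.

```python
import re

BODY = r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"

plain = re.compile(r"^" + BODY + r"$")
# Zero-width look-arounds around the body: these behave like ^ and $ here.
lookaround = re.compile(r"(?<=^)" + BODY + r"(?=$)")
# A lookahead that demands at least one numeral letter before the body is tried.
nonempty = re.compile(r"^(?=[MDCLXVI])" + BODY + r"$")

for name, rx in [("plain", plain), ("lookaround", lookaround), ("nonempty", nonempty)]:
    print(name, "empty:", bool(rx.match("")), "MMXX:", bool(rx.match("MMXX")))
```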

edited Feb 07 '20 at 04:37

answered Nov 06 '08 at 01:35

paxdiablo

Possibly. I can see only inferences in the question that it needs to go up to 3999 (not a specific definite requirement) but I allow it up to 4999 anyway. If you truly want to restrict it to 3999 then by all means remove one of the Ms. – paxdiablo Mar 25 '10 at 02:41
any solution to avoid matching the empty string? – Facundo Casco Nov 01 '11 at 22:33
Yes, you can use one of those lookahead things if your regex engine supports it, or you can just check that the length is greater than zero. – paxdiablo Nov 01 '11 at 23:58
BTW 4000 is not MMMM, it's IV(bar). See here if you are really interested: http://scienceray.com/mathematics/roman-numerals-3501-to-4000/ – Green goblin Jun 29 '12 at 11:19
@Aashish: When the Romans were a force to be reckoned with, `MMMM` was the correct way. The overbar representation came long after the core empire fell to pieces. – paxdiablo Jul 15 '13 at 02:18
@paxdiablo there is a problem even if we make it M{0,3}: the regex fails for MMMCM, which is the correct notation for 3900, and as it is now it falsely validates 4000 :(. Check this website, which gives accurate conversions; I wonder what regex they are using. http://bmanolov.free.fr/arabic2roman.php – amIT Aug 09 '14 at 19:40
MMMCM won't fail: it uses the MMM from the first subsection and the CM from the second. And it doesn't falsely validate 4000; it only goes up to x999, where x is dictated by the initial M count. The question itself imposed no restrictions on the range other than an implied minimum. – paxdiablo Aug 09 '14 at 22:12
@paxdiablo this is how I found MMMCM fails: String regx = "^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"; if(input.matches(regx)) -> this evaluates to false for MMMCM / MMMM in Java. – amIT Aug 10 '14 at 14:07
MMMM won't work with that regex, since you only have {0,3} in your M section. MMMCM should work fine; I vaguely remember checking all possibilities when I originally wrote the code. I'll check it when I get access to a box running Java. I'm currently in Vegas after attending DefCon and won't be back in Oz for another week. – paxdiablo Aug 10 '14 at 15:50
This would also match IIII! What if I only want the traditionally accepted numerals? – Bernardo Santana Apr 11 '16 at 14:37
Bernardo, it won't match IIII since the {0,3} clause prevents that. – paxdiablo Apr 11 '16 at 23:57
`/^M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})$/i` – Crissov Mar 29 '19 at 13:11
Given the modern world of Unicode, how can you rewrite this regex to match the code range `[\x{2160}-\x{2188}]` or `[ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾⅿↀↁↂↃↄↅↆↇↈ]`? – Dec 12 '19 at 19:18
@x15, that sounds worthy of a *different* question! – paxdiablo Feb 12 '20 at 01:38
This answer doesn't work for input "MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMCDLXXVI" which is a valid representation for 53476. – Raghu Kumar Jun 04 '20 at 23:07
@RaghuKumar, you need to learn to *read.* It won't work because, as stated in the answer quite clearly, the `M` section is limited to 4000. If you want to handle ridiculously large numbers like that, simply change the `M{0,4}` into `M*`. – paxdiablo Jun 05 '20 at 10:40
Can we go from 2 lookarounds to 1 by just having a positive lookahead like this: `^(?=\w)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$` – Garrett Oct 28 '20 at 21:41
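The disputed cases in the comments above can be checked mechanically. This Python sketch uses `re.fullmatch`, which, like Java's `String.matches`, requires the whole input to match. It uses the `M{0,3}` variant from the Java comment, and the sample inputs come from the comments. It was written for Python's `re` only. It is not a reproduction of the Java run.

```python
import re

# Variant with the thousands capped at three; no anchors needed with fullmatch.
M3 = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

cases = ["MMMCM", "MMMM", "IIII", "MMMCMXCIX"]
# True means the whole string matched.
results = {s: M3.fullmatch(s) is not None for s in cases}
print(results)
```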

score 27 · Answer 2 · edited Nov 06 '08 at 04:09

Actually, your premise is flawed. 990 IS "XM", as well as "CMXC".

The Romans were far less concerned about the "rules" than your third grade teacher. As long as it added up, it was OK. Hence "IIII" was just as good as "IV" for 4. And "IIM" was completely cool for 998.

(If you have trouble dealing with that... Remember English spellings were not formalized until the 1700s. Until then, as long as the reader could figure it out, it was good enough).

edited Nov 06 '08 at 04:09

Jonathan Leffler

answered Nov 06 '08 at 01:51

James Curran

Sure, that's cool. But my "strict third grade teacher" syntax need makes a much more interesting regex problem, in my opinion... – Daniel Magliola Nov 06 '08 at 03:03
Good point James, one ought to be a strict author but a forgiving reader. – Corin May 04 '12 at 00:19
@Corin: aka [Postel's robustness principle](http://ironick.typepad.com/ironick/2005/05/my_history_of_t.html) – jfs May 09 '17 at 21:27

score 21 · Answer 3 · answered Apr 12 '16 at 14:33

Just to save it here:

(^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$)

Matches all the Roman numerals. Doesn't match empty strings (requires at least one Roman numeral letter). Should work in PCRE, Perl, Python and Ruby.
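Here is a rough Python check that this pattern accepts every standard numeral from 1 to 3999 and rejects a few malformed inputs, including the empty string. The greedy `to_roman` helper is an assumption of this sketch, used only to generate test inputs. It is not part of the answer.

```python
import re

# The pattern from this answer, unchanged; the lookahead rules out the empty string.
STRICT = re.compile(r"(^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$)")

# Hypothetical helper: greedy conversion table used only to build inputs.
VALUES = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
          (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    out = []
    for value, symbol in VALUES:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

accepted = all(STRICT.match(to_roman(n)) for n in range(1, 4000))
rejected = [s for s in ["", "IIII", "VX", "XM", "MCMC"] if STRICT.match(s)]
print(accepted, rejected)
```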

Online Ruby demo: http://rubular.com/r/KLPR1zq3Hj

Online Conversion: http://www.onlineconversion.com/roman_numerals_advanced.htm

answered Apr 12 '16 at 14:33

smileart

I don't know why, but the main answer didn't work for me in autotranslate lists in MemoQ. However, this solution does - excluding string start/end symbols though. – orlando2bjr Apr 06 '17 at 12:31
@orlando2bjr glad to help. Yeah, in this case I was matching a number on its own, without surroundings. If you look for it in a text, sure you'd need to remove ^$. Cheers! – smileart May 06 '17 at 16:37
How would I make this match anywhere in a block of text? This will only match if the line contains nothing but the numeral's characters. – Verty00 Oct 29 '20 at 13:44
@Verty00 See previous comment – smileart Nov 10 '20 at 00:53
Here's the same one with non-capture groups to clean up the result a little better. You can also use word boundaries `\b` and even another non-capture group on the outside if you want (`?:`) `(\b(?=[MDCLXVI])M*(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})\b)` – brandonscript Apr 01 '21 at 18:39

Corin · Answer 4 · 2012-05-04T02:00:34.810

13

To avoid matching the empty string you'll need to repeat the pattern four times and replace each 0 with a 1 in turn, and account for V, L and D:

(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))
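Here is a sketch of how one might exercise this lookahead-free pattern in Python. This style matters most for tools such as `sed` that lack lookahead. Python is used here only to drive the test. The check builds every combination of standard digit spellings, which covers all numerals from 1 to 4999. It confirms that each one matches in full and that the empty string does not. The digit tables are written for this example.

```python
import itertools
import re

# Four copies of the pattern, each forcing a different section to be non-empty.
CORIN = re.compile(
    r"(M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})"
    r"|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))"
)

# Standard spelling of each digit per decimal place (list index = digit value).
THOUSANDS = ["", "M", "MM", "MMM", "MMMM"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

# Every combination except the all-empty one spells a numeral from 1 to 4999.
numerals = ["".join(p) for p in itertools.product(THOUSANDS, HUNDREDS, TENS, UNITS)]
nonempty = [s for s in numerals if s]

print(all(CORIN.fullmatch(s) for s in nonempty))  # every numeral accepted
print(CORIN.fullmatch("") is None)                 # empty string rejected
```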

In this case (because this pattern uses ^ and $) you would be better off checking for empty lines first and not bothering to match them. If you are using word boundaries, then you don't have a problem, because there's no such thing as an empty word. (At least regex doesn't define one; don't start philosophising, I'm being pragmatic here!)

In my own particular (real-world) case I needed to match numerals at word endings, and I found no other way around it. I needed to scrub the footnote numbers off my plain-text document, where text such as "the Red Sea^cl and the Great Barrier Reef^cli" had been converted to "the Red Seacl and the Great Barrier Reefcli". But I still had problems with valid words like Tahiti and fantastic, which were scrubbed into Tahit and fantasti.

edited May 04 '12 at 02:00

answered May 04 '12 at 01:07

Corin

I have a similar problem (!): I need to do a "left trim" of the remaining/residual Roman numeral of an item list (HTML OL of type I or i). So, when one remains, I need to clean it (like a trim function) with your regex at the beginning (left) of the item text... But it's simpler: items never use `M` or `C` or `L`, so do you have this kind of simplified regex? – Peter Krauss Nov 11 '14 at 20:00
... ok, here it seems ok (!), `(X{1,3}(IX|IV|V?I{0,3})|X{0,3}(IX|I?V|V?I{1,3}))` – Peter Krauss Nov 11 '14 at 20:21
you don't need to repeat the pattern, to reject empty strings. You could [use a lookahead assertion](http://ideone.com/c9xfNS) – jfs May 10 '17 at 20:09
@jfs Some programs, like `sed`, do not support lookahead, so a “raw” solution like that one is very welcome as an alternative. – Alice M. Jun 29 '22 at 07:45

Jonathan Leffler · Answer 5 · 2008-11-07T07:03:54.110

8

Fortunately, the range of numbers is limited to 1..3999 or thereabouts. Therefore, you can build up the regex piecemeal.

<opt-thousands-part><opt-hundreds-part><opt-tens-part><opt-units-part>

Each of those parts will deal with the vagaries of Roman notation. For example, using Perl notation:

<opt-hundreds-part> = m/(CM|DC{0,3}|CD|C{1,3})?/;

Repeat and assemble.

Added: The <opt-hundreds-part> can be compressed further:

<opt-hundreds-part> = m/(C[MD]|D?C{0,3})/;

Since the 'D?C{0,3}' clause can match nothing, there's no need for the question mark. And, most likely, the parentheses should be the non-capturing type - in Perl:

<opt-hundreds-part> = m/(?:C[MD]|D?C{0,3})/;

Of course, it should all be case-insensitive, too.
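Here is one way "repeat and assemble" might look in Python rather than Perl. The helper name `place` is hypothetical. It builds each decimal place in the same compressed shape as the hundreds part above. The sketch uses `M{0,3}` for the 1..3999 range and passes `re.IGNORECASE` for case-insensitivity. Like the other strict patterns in this thread, the result still accepts the empty string.

```python
import re

def place(one, five, ten):
    """Non-capturing group for one decimal place, e.g. place('C', 'D', 'M')."""
    # Doubled braces produce a literal {0,3} quantifier in the f-string.
    return f"(?:{one}[{ten}{five}]|{five}?{one}{{0,3}})"

ROMAN = re.compile(
    "M{0,3}" + place("C", "D", "M") + place("X", "L", "C") + place("I", "V", "X"),
    re.IGNORECASE,
)

print(ROMAN.pattern)
print(bool(ROMAN.fullmatch("mcmxciv")), bool(ROMAN.fullmatch("MCMC")))
```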

You can also extend this to deal with the options mentioned by James Curran (to allow XM or IM for 990 or 999, and CCCC for 400, etc).

<opt-hundreds-part> = m/(?:[IXC][MD]|D?C{0,4})/;

edited Nov 07 '08 at 07:03

answered Nov 06 '08 at 01:36

Jonathan Leffler

Starting with `thousands hundreds tens units`, it is easy to [create a FSM that computes and *validates* given Roman numerals](http://stackoverflow.com/a/43884420/4279) – jfs May 10 '17 at 20:08
What do you mean by **Fortunately, the range of numbers is limited to 1..3999 or thereabouts**? Who limited it? – SexyBeast Sep 29 '17 at 14:32
@SexyBeast: There isn’t any standard Roman notation for 5,000, let alone bigger numbers, so the regularities that work up to then stop working. – Jonathan Leffler Sep 29 '17 at 15:09
Not sure why you believe that, but Roman numerals can represent numbers into the millions. https://en.wikipedia.org/wiki/Roman_numerals#Large_numbers – AmbroseChapel Dec 29 '18 at 02:47
@AmbroseChapel: As I stated, there isn't any (single) standard notation for 5,000, let alone bigger numbers. You have to use one of a number of divergent systems as outlined in the Wikipedia article you link to, and you face problems with the orthography for the system with overbars, underbars, or reversed C etc. And you will have to explain to anyone what system you're using and what it means; people will not, in general, recognize the Roman numerals beyond M. You may choose to think otherwise; that is your prerogative, just as it is my prerogative to stand by my previous comments. – Jonathan Leffler Dec 29 '18 at 04:11

Salvador Dali · Answer 6 · 2014-10-31T07:18:16.220

7

import re
pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
if re.search(pattern, 'XCCMCI'):
    print('Valid Roman')
else:
    print('Not valid Roman')

For people who really want to understand the logic, please take a look at the step-by-step explanation spread over three pages on diveintopython.

The only difference from the original solution (which had M{0,4}) is that I found 'MMMM' is not a valid Roman numeral (also, the old Romans most probably did not think about such a huge number and would disagree with me). If you are one of those disagreeing old Romans, please forgive me and use the {0,4} version.

edited Oct 31 '14 at 07:18

answered Oct 31 '14 at 07:11

Salvador Dali

the regex in the answer permits empty numerals. If you don't want it; you could [use a lookahead assertion](http://ideone.com/c9xfNS), to reject empty strings (it also ignores the case of the letters). – jfs May 10 '17 at 20:03

score 4 · Answer 7 · answered Dec 22 '18 at 12:45

In my case, I was trying to find all occurrences of Roman numerals inside the text and replace each one with a single word, so I couldn't use the start- and end-of-line anchors. As a result, the @paxdiablo solution found many zero-length matches. I ended up with the following expression:

(?=\b[MCDXLVI]{1,6}\b)M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})

My final Python code was like this:

import re
text = "RULES OF LIFE: I. STAY CURIOUS; II. NEVER STOP LEARNING"
text = re.sub(r'(?=\b[MCDXLVI]{1,6}\b)M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})', 'ROMAN', text)
print(text)

Output:

RULES OF LIFE: ROMAN. STAY CURIOUS; ROMAN. NEVER STOP LEARNING

answered Dec 22 '18 at 12:45

user2936263

Try `text = "I'm RULES OF LIFE: I. STAY CURIOUS; II. NEVER STOP LEARNING" ` with this and it'll output `ROMAN'm RULES OF LIFE: ROMAN. STAY CURIOUS; ROMAN. NEVER STOP LEARNING` – Ste Jul 26 '20 at 20:52
This is what is working for me in javascript as well. – user732456 Apr 14 '21 at 11:13
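Here is one possible way around the "I'm" case from the comment above, sketched in Python. It requires that a candidate numeral is neither preceded nor followed by a word character or an apostrophe. Treating the apostrophe as part of a word is an assumption that suits English prose; other inputs may need a different character class. The leading `(?=[MDCLXVI])` keeps zero-length matches out.

```python
import re

# Hypothetical refinement: no letter, digit, underscore or apostrophe on either side.
ROMAN_WORD = re.compile(
    r"(?<![\w'])(?=[MDCLXVI])"
    r"M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
    r"(?![\w'])"
)

text = "I'm RULES OF LIFE: I. STAY CURIOUS; II. NEVER STOP LEARNING"
print(ROMAN_WORD.sub("ROMAN", text))
```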

mekwall · Answer 8 · 2021-07-20T10:21:11.043

Some really amazing answers here but none fit the bill for me since I needed to be able to match only valid Roman numerals within a string without matching empty strings and only match numerals that are on their own (i.e. not within a word).

Let me introduce you to Reilly's Modern Roman numerals strict expression:

^(?=[MDCLXVI])M*(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$

Out of the box it was pretty close to what I needed, but it only matches standalone Roman numerals. When changed to match within a string, it matches empty strings at certain points (where a word begins with an uppercase V, M, etc.), and it also gives partial matches of invalid Roman numerals such as MMLLVVDD, XXLLVVDD, MMMMDLVX, XVXDLMM and MMMCCMLXXV.

So, after a bit of modification, I ended up with this:

(?<![MDCLXVI])(?=[MDCLXVI])M{0,3}(?:C[MD]|D?C{0,3})(?:X[CL]|L?X{0,3})(?:I[XV]|V?I{0,3})[^ ]\b

The added negative lookbehind ensures that it doesn't make partial matches of invalid Roman numerals, and the leading M is locked down to at most 3 repetitions, since that is the highest it goes in standard Roman numeral form.

As of right now, this is the only regular expression that passes my extensive test suite of over 4000 tests, which includes all possible Roman numerals from 1 to 3999, Roman numerals within strings, and invalid Roman numerals like the ones I mentioned above.

I added a word boundary to the very beginning so it doesn't catch words that happen to end with a latin number (if string matching is case insensitive). — dearsina, Dec 16 '21 at 11:11

score 2 · Answer 9 · answered Feb 29 '20 at 21:12

I've seen multiple answers that either don't handle empty strings or use lookaheads to reject them. I want to add a new answer that does reject empty strings without using lookahead. The regex is the following one:

^((I[VX]|VI{0,3}|I{1,3})|((X[LC]|LX{0,3}|X{1,3})(I[VX]|V?I{0,3}))|((C[DM]|DC{0,3}|C{1,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3}))|(M+(C[DM]|D?C{0,3})(X[LC]|L?X{0,3})(I[VX]|V?I{0,3})))$

I'm allowing any number of Ms with M+, but of course you could change it to M{1,4} to allow only 1 to 4 if desired.
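One subtlety worth checking with a pattern of this shape is operator precedence. In most regex flavors `|` has the lowest precedence, so in `^A|B|C|D$` the `^` belongs only to `A` and the `$` only to `D`. With a searching call such as Python's `re.search`, an ungrouped version can therefore accept strings like `IIIII` by matching only part of them. Wrapping the whole alternation in one group restores the anchoring. The sketch below rebuilds this answer's four alternatives and compares both forms.

```python
import re

UNITS = r"(I[VX]|V?I{0,3})"
TENS = r"(X[LC]|L?X{0,3})"
HUNDREDS = r"(C[DM]|D?C{0,3})"
# The four alternatives: in each, the leading section must be non-empty.
ALTS = [
    r"(I[VX]|VI{0,3}|I{1,3})",
    r"((X[LC]|LX{0,3}|X{1,3})" + UNITS + ")",
    r"((C[DM]|DC{0,3}|C{1,3})" + TENS + UNITS + ")",
    r"(M+" + HUNDREDS + TENS + UNITS + ")",
]

ungrouped = re.compile("^" + "|".join(ALTS) + "$")   # ^ and $ bind to one branch each
grouped = re.compile("^(" + "|".join(ALTS) + ")$")   # anchors apply to every branch

for s in ["IIIII", "XIIII", "", "MCMXCIV"]:
    print(repr(s), bool(ungrouped.search(s)), bool(grouped.search(s)))
```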

I like this a lot. It's longer, but seems more performant as well. To reduce the complexity of the result, you can use non-capture groups: `(?:I[VX]|VI{0,3}|I{1,3})|(?:(X[LC]|LX{0,3}|X{1,3})(?:I[VX]|V?I{0,3}))|(?:(?:C[DM]|DC{0,3}|C{1,3})(?:X[LC]|L?X{0,3})(?:I[VX]|V?I{0,3}))|(?:M+(?:C[DM]|D?C{0,3})(?:X[LC]|L?X{0,3})(?:I[VX]|V?I{0,3}))` — brandonscript, Apr 01 '21 at 18:26

score 1 · Answer 10 · answered Dec 12 '19 at 18:50

I'm answering the question "Regular Expression in Python for Roman Numerals" here, because it was marked as an exact duplicate of this question.

It might be similar in name, but that is a specific regex question/problem, as can be seen from this answer to that question.

The items being sought can be combined into a single alternation and then encased inside a capture group, whose matches the findall() function collects into a list. It is done like this:

>>> import re
>>> target = (
... r"this should pass v" + "\n"
... r"this is a test iii" + "\n"
... )
>>>
>>> re.findall( r"(?m)\s(i{1,3}v*|v)$", target )
['v', 'iii']

The regex modifications to factor and capture just the numerals are this :

 (?m)
 \s 
 (                     # (1 start)
      i{1,3} 
      v* 
   |  v
 )                     # (1 end)
 $

score 1 · Answer 11 · answered Sep 07 '20 at 09:28

The following expression worked for me to validate the roman number.

^M{0,4}(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$

Here,

M{0,4} will match thousands
C[MD]|D?C{0,3} will match hundreds
X[CL]|L?X{0,3} will match tens
I[XV]|V?I{0,3} will match units

Python Code:

import re
regex = re.compile("^M{0,4}(C[MD]|D?C{0,3})(X[CL]|L?X{0,3})(I[XV]|V?I{0,3})$")
matchArray = regex.match("MMMCMXCIX")

score 1 · Answer 12 · answered Sep 23 '20 at 19:09

The positive look-behind and look-ahead suggested by @paxdiablo to avoid matching empty strings don't seem to work for me.

I have fixed it by using negative look-ahead instead :

(?!$)M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})

NB: if you append something (e.g. "foobar") to the end of the regex, then obviously you'll have to replace (?!$) with (?!f) (where f is the first character of "foobar").

score 0 · Answer 13 · answered Jun 22 '14 at 19:26

Steven Levithan uses this regex in his post which validates roman numerals prior to "deromanizing" the value:

/^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$/
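Here is a Python sketch of the validate-then-convert idea. The pattern is Levithan's, carried over from the JavaScript literal. The conversion loop is a common subtractive-rule approach written for this example. It is not code from his post. The explicit empty-string check is needed because the pattern alone accepts an empty string.

```python
import re

# Levithan's validation pattern, as a Python raw string.
VALID = re.compile(r"^M*(?:D?C{0,3}|C[MD])(?:L?X{0,3}|X[CL])(?:V?I{0,3}|I[XV])$")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def deromanize(numeral):
    """Validate first, then convert using the subtractive rule (a sketch)."""
    if not numeral or not VALID.fullmatch(numeral):
        raise ValueError(f"not a valid Roman numeral: {numeral!r}")
    total = 0
    for i, symbol in enumerate(numeral):
        value = VALUES[symbol]
        # A smaller symbol written before a larger one (IV, XC, CM, ...) is subtracted.
        if i + 1 < len(numeral) and value < VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(deromanize("MCMXCIV"), deromanize("MMXIV"))
```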

answered Jun 22 '14 at 19:26

Mottie

ketenks · Answer 14 · 2020-02-29T22:41:03.630

This works in Java and PCRE regex engines and should now work in the latest JavaScript but may not work in all contexts.

(?<![A-Z])(M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))(?![A-Z])

The first part is the atrocious negative lookbehind. But, for logical purposes it is the easiest to understand. Basically, the first (?<!) is saying don't match the middle ([MATCH]) if there are letters coming before the middle ([MATCH]) and the last (?!) is saying don't match the middle ([MATCH]) if there are letters coming after it.

The middle ([MATCH]) is just the most commonly used regex for matching the sequence of Roman Numerals. But now, you don't want to match that if there are any letters around it.

See for yourself. https://regexr.com/4vce5

score -1 · Answer 15 · answered Mar 16 '11 at 14:12

The problem with the solution from Jeremy and Pax is that it also matches "nothing".

The following regex expects at least one roman numeral:

^(M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|[IDCXMLV])$

answered Mar 16 '11 at 14:12

Marvin Frommhold

that one won't work (unless you're using a very weird regex implementation) -- the left part of the `|` can match an empty string and all valid roman numerals, so the right side is completely redundant. and yes, it still matches an empty string. – DirtY iCE Aug 08 '11 at 23:40
"The problem of the solution from Jeremy and Pax is" ... exactly the same as the problem this answer has. If you're going to propose a solution to a supposed problem, you probably should test it. :-) – paxdiablo Jul 12 '15 at 08:37
I got empty string with this – Aminah Nuraini Apr 26 '16 at 10:51

Vince Ypma · Answer 16 · 2015-01-01T12:05:16.253

I would write functions to do my work for me. Here are two Roman numeral functions in PowerShell.

function ConvertFrom-RomanNumeral
{
  <#
    .SYNOPSIS
        Converts a Roman numeral to a number.
    .DESCRIPTION
        Converts a Roman numeral - in the range of I..MMMCMXCIX - to a number.
    .EXAMPLE
        ConvertFrom-RomanNumeral -Numeral MMXIV
    .EXAMPLE
        "MMXIV" | ConvertFrom-RomanNumeral
  #>
    [CmdletBinding()]
    [OutputType([int])]
    Param
    (
        [Parameter(Mandatory=$true,
                   HelpMessage="Enter a roman numeral in the range I..MMMCMXCIX",
                   ValueFromPipeline=$true,
                   Position=0)]
        [ValidatePattern("^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")]
        [string]
        $Numeral
    )

    Begin
    {
        $RomanToDecimal = [ordered]@{
            M  = 1000
            CM =  900
            D  =  500
            CD =  400
            C  =  100
            XC =   90
            L  =   50
            X  =   10
            IX =    9
            V  =    5
            IV =    4
            I  =    1
        }
    }
    Process
    {
        $roman = $Numeral + " "
        $value = 0

        do
        {
            foreach ($key in $RomanToDecimal.Keys)
            {
                if ($key.Length -eq 1)
                {
                    if ($key -match $roman.Substring(0,1))
                    {
                        $value += $RomanToDecimal.$key
                        $roman  = $roman.Substring(1)
                        break
                    }
                }
                else
                {
                    if ($key -match $roman.Substring(0,2))
                    {
                        $value += $RomanToDecimal.$key
                        $roman  = $roman.Substring(2)
                        break
                    }
                }
            }
        }
        until ($roman -eq " ")

        $value
    }
    End
    {
    }
}

function ConvertTo-RomanNumeral
{
  <#
    .SYNOPSIS
        Converts a number to a Roman numeral.
    .DESCRIPTION
        Converts a number - in the range of 1 to 3,999 - to a Roman numeral.
    .EXAMPLE
        ConvertTo-RomanNumeral -Number (Get-Date).Year
    .EXAMPLE
        (Get-Date).Year | ConvertTo-RomanNumeral
  #>
    [CmdletBinding()]
    [OutputType([string])]
    Param
    (
        [Parameter(Mandatory=$true,
                   HelpMessage="Enter an integer in the range 1 to 3,999",
                   ValueFromPipeline=$true,
                   Position=0)]
        [ValidateRange(1,3999)]
        [int]
        $Number
    )

    Begin
    {
        $DecimalToRoman = @{
            Ones      = "","I","II","III","IV","V","VI","VII","VIII","IX";
            Tens      = "","X","XX","XXX","XL","L","LX","LXX","LXXX","XC";
            Hundreds  = "","C","CC","CCC","CD","D","DC","DCC","DCCC","CM";
            Thousands = "","M","MM","MMM"
        }

        $column = @{Thousands = 0; Hundreds = 1; Tens = 2; Ones = 3}
    }
    Process
    {
        [int[]]$digits = $Number.ToString().PadLeft(4,"0").ToCharArray() |
                            ForEach-Object { [Char]::GetNumericValue($_) }

        $RomanNumeral  = ""
        $RomanNumeral += $DecimalToRoman.Thousands[$digits[$column.Thousands]]
        $RomanNumeral += $DecimalToRoman.Hundreds[$digits[$column.Hundreds]]
        $RomanNumeral += $DecimalToRoman.Tens[$digits[$column.Tens]]
        $RomanNumeral += $DecimalToRoman.Ones[$digits[$column.Ones]]

        $RomanNumeral
    }
    End
    {
    }
}
