
Building on this question, I have a problem extending my regex, which now looks like this:

String pattern = "(?:I[XV]|X[LC]|C[DM]|[XVLCDM])+";
String test = "XM";
System.out.println(test.matches(pattern)); // prints true, but false is wanted

The allowable characters in my string are IVXLCDMAQT

XM currently returns true, but it should not, because X can only precede L or C. How can I modify my regex so that XM returns false while still accepting all the allowable characters in my string?

Update, as requested. The precedence rules:

I can only be followed by X or V, X can only be followed by L or C, and C can only be followed by D or M. The rest of the letters don't matter.

Thus XM should return false, but currently it doesn't.

stretchr
  • If your aim is to match only valid roman numerals, then your approach is flawed. You need to start with the high numbers and work your way down, not the other way around: http://stackoverflow.com/a/267405/20670 – Tim Pietzcker Jul 14 '14 at 12:54
  • Why did you add `[XVLCDM]` to your pattern? – Avinash Raj Jul 14 '14 at 13:02
  • Because those are the allowable characters in my string; I should actually include one more. But to be more general, can I devise the regex to include all other characters? – stretchr Jul 14 '14 at 13:07

1 Answer


If I understand your requirements, we need to remove X and C from the final alternation and add some lookaheads:

^(?:I(?=[XV])|X(?=[LC])|C(?=[DM])|[VLDM])+$
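A minimal sketch of how this pattern could be checked in Java. The class name `LookaheadCheck`, the helper `isValid` and the sample strings are made up for illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Pattern;

final class LookaheadCheck {
    // Regex from the answer: I, X and C are accepted only when the required
    // letter comes next; V, L, D and M are accepted unconditionally.
    static final Pattern STRICT =
            Pattern.compile("^(?:I(?=[XV])|X(?=[LC])|C(?=[DM])|[VLDM])+$");

    // Hypothetical helper: true when the whole string satisfies the rules.
    static boolean isValid(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Sample inputs chosen for illustration only.
        for (String s : new String[] {"XM", "XL", "IV", "CM", "XI", "X"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Here XM, XI and X are rejected, while XL, IV and CM are accepted. A lone X fails because the lookahead needs an L or C after it, which is what the option below addresses.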

Option: Allowing I, X and C at the end of the string

If you want to allow I, X or C at the end of the string, add this alternative: |[IXC]$ (one of several ways to do it)

The regex becomes:

(?:I(?=[XV])|X(?=[LC])|C(?=[DM])|[VLDM]|[IXC]$)+
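A rough sketch of this variant in Java, using `Matcher.matches()`, which requires the whole input to match, so the missing ^ and $ make no difference here. The class and method names are hypothetical. The second pattern is an assumption that is not part of the answer: it adds A, Q and T from the question's list of allowable characters as unrestricted letters.

```java
import java.util.regex.Pattern;

final class TrailingLetterCheck {
    // Regex from the answer: [IXC]$ additionally accepts I, X or C as the last letter.
    static final Pattern WITH_END =
            Pattern.compile("(?:I(?=[XV])|X(?=[LC])|C(?=[DM])|[VLDM]|[IXC]$)+");

    // Assumption, not from the answer: A, Q and T treated as letters with no restriction.
    static final Pattern WITH_EXTRA =
            Pattern.compile("(?:I(?=[XV])|X(?=[LC])|C(?=[DM])|[VLDMAQT]|[IXC]$)+");

    static boolean endsOk(String s) {
        return WITH_END.matcher(s).matches();
    }

    static boolean extraOk(String s) {
        return WITH_EXTRA.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"XLIX", "X", "XLIXI", "XM"}) {
            System.out.println(s + " -> " + endsOk(s));
        }
        for (String s : new String[] {"QXLA", "XA"}) {
            System.out.println(s + " (with A, Q, T) -> " + extraOk(s));
        }
    }
}
```

With the first pattern, XLIX and X are accepted while XLIXI and XM are still rejected. With the extended class, QXLA is accepted but XA is not, since X must still be followed by L or C unless it is the last letter.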
zx81
  • My first attempt a minute ago had a bug. Fixed, please have a look. :) – zx81 Jul 14 '14 at 13:06
  • Yes, it's working. Can you explain why I needed to remove X and C, please? Meanwhile, I'm testing it. – stretchr Jul 14 '14 at 13:08
  • Also, what's the difference between `I(?=[XV])` and your earlier version, `I[XV]`? – stretchr Jul 14 '14 at 13:10
  • `(?=[XV])` is a lookahead. It means match an I, as long as it is followed by an X or a V. We had to remove the X and C because the character class `[XVLCDM]` matches them without putting any condition on them. `[XVLCDM]` means match one character from this class. Very late here, but hope it helps and let me know if you have questions. :) – zx81 Jul 14 '14 at 13:19
  • Counterexamples: XLIXI returns false, and so does XLIXX. But [IXC] should allow the string to end with I, X or C, right? – stretchr Jul 14 '14 at 13:20
  • `XLIXI` doesn't match because I cannot follow X, right? `XLIXX` doesn't match because X cannot follow X, right? (only L or C). But if you want to allow X to follow X, use `^(?:I(?=[XV])|X(?=[LCX])|C(?=[DM])|[VLDM]|[IXC]$)+$` – zx81 Jul 14 '14 at 13:24
  • Got it! I understand it's late for you, thanks a lot! – stretchr Jul 14 '14 at 13:28
  • Hi zx81, got a follow-up question, if you can have a look at it, it's here: http://stackoverflow.com/questions/24747645/regex-multiple-repeating-characters-with-one-interjection-in-java – stretchr Jul 14 '14 at 23:31
  • Hey stretchr, let me have a look. :) – zx81 Jul 15 '14 at 00:31
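
To make the points from the comments concrete, here is a small sketch comparing a consuming alternative such as `I[XV]` with the lookahead form `I(?=[XV])`, plus the variant that lets X follow X. The `CONSUMING` pattern is an assumption put together for comparison (the earlier version of the answer is not shown above), and the class name and test strings are made up.

```java
import java.util.regex.Pattern;

final class ConsumeVsLookahead {
    // Assumed consuming form: I[XV] swallows both letters, so the second
    // letter is never checked against its own rule.
    static final Pattern CONSUMING =
            Pattern.compile("^(?:I[XV]|X[LC]|C[DM]|[VLDM])+$");

    // Lookahead form from the answer: only one letter is consumed per step,
    // so the next letter is matched (and checked) on the following iteration.
    static final Pattern LOOKAHEAD =
            Pattern.compile("^(?:I(?=[XV])|X(?=[LC])|C(?=[DM])|[VLDM])+$");

    // Variant from the comments that also allows X to follow X.
    static final Pattern REPEAT_X =
            Pattern.compile("^(?:I(?=[XV])|X(?=[LCX])|C(?=[DM])|[VLDM]|[IXC]$)+$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // In IXM the X is followed by M, which breaks the rule for X.
        System.out.println("IXM consuming -> " + matches(CONSUMING, "IXM"));
        System.out.println("IXM lookahead -> " + matches(LOOKAHEAD, "IXM"));
        System.out.println("XLIXX repeat-X -> " + matches(REPEAT_X, "XLIXX"));
        System.out.println("XLIXI repeat-X -> " + matches(REPEAT_X, "XLIXI"));
    }
}
```

The consuming form accepts IXM because IX is taken as a pair and M then matches on its own, whereas the lookahead form rejects it. The repeat-X variant accepts XLIXX but still rejects XLIXI, since I may not follow X.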