
I am trying to construct a regex that checks whether, when a given letter occurs in a string, it precedes only one of two other characters. I can do this for one set of characters, but how do I do it for multiple sets in the same string?

For example: C can only precede A or B, i.e. if C is found, the next char can only be A or B. F can only precede D or E (F cannot precede A or B).

A or B, then C can occur; D or E, then F can occur.

How do I do this?

The following gives me an error:

String pattern = "([F][D|E])|([A][B|C])";
String test = "FEAC";
System.out.println(test.matches(pattern));
almightyGOSU
stretchr

1 Answer


Assuming the string contains only uppercase letters A to Z, you can use this:

^(?:C[AB]|F[DE]|[ABDEG-Z])+$

See the matches in the demo

Explanation

  • The anchors aren't necessary with Java's matches method, which must match the whole string anyway, but they are explained here for the raw version:
  • The ^ anchor asserts that we are at the beginning of the string
  • The $ anchor asserts that we are at the end of the string
  • C[AB] matches a C followed by an A or B, OR (|)
  • F[DE] matches an F followed by a D or E, OR (|)
  • [ABDEG-Z] matches any one of these letters (any uppercase letter except C and F)
  • + repeats the group one or more times
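
To see how the alternation behaves on a few inputs, here is a minimal sketch. The class name FollowerRuleDemo, the helper isValid, and the sample strings are made up for illustration; only the pattern comes from the answer.

```java
import java.util.regex.Pattern;

class FollowerRuleDemo {
    // Pattern from the answer, without anchors: Matcher.matches()
    // already requires the whole input to match.
    static final Pattern RULE = Pattern.compile("(?:C[AB]|F[DE]|[ABDEG-Z])+");

    // Hypothetical helper: true if every C is followed by A or B
    // and every F is followed by D or E.
    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"FEAC", "FECA", "CBYIU", "CD", "FA"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

With this pattern, FEAC from the question is rejected because the final C has no A or B after it, while FECA and CBYIU are accepted.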

Option: Allowing C and F at the end of the string

If you want to allow C or F at the end of the string, add this: |[CF]$ (one of several ways to do it)

The regex becomes:

^(?:C[AB]|F[DE]|[ABDEG-Z]|[CF]$)+$
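
A quick way to check this variant, again as a sketch with made-up names (TrailingLetterDemo, isValid) and sample inputs chosen for illustration:

```java
import java.util.regex.Pattern;

class TrailingLetterDemo {
    // Variant from the answer: the extra branch [CF]$ lets a C or F
    // stand as the very last character of the string.
    static final Pattern RULE =
            Pattern.compile("(?:C[AB]|F[DE]|[ABDEG-Z]|[CF]$)+");

    // Hypothetical helper wrapping the whole-string match.
    static boolean isValid(String s) {
        return RULE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"FEAC", "ABF", "C", "CC", "ACF"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Here FEAC is accepted because its C is the last character, but CC and ACF are still rejected: the first C in each is neither followed by A or B nor at the end of the string.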

In Java:

if (subjectString.matches("(?:C[AB]|F[DE]|[ABDEG-Z])+")) {
    // It matched!
} else {
    // nah, it didn't match...
}
zx81
  • FYI added a second option in case you also want to allow F or C at the end of the string (no following letters) – zx81 Jul 14 '14 at 10:02
  • can we make it generic for all other alphabets as well? – Braj Jul 14 '14 at 10:05
  • yes, this works. following on from Braj, how can I allow other characters to exist as well? For example CBYIU should give true – stretchr Jul 14 '14 at 10:16
  • @Braj Made minor tweak to allow other characters. stretchr, please see revised answer. :) – zx81 Jul 14 '14 at 10:19
  • thanks :) tried this: "(?:I[XV]|X[LC]|[XVLC-Z])+" but it doesn't work with "IVXCIXXLA". It works if I replace A with Z or X or something. How can I make it work for A as well? – stretchr Jul 14 '14 at 10:25
  • `IVXCIXXLA` cannot match with my regex, because C is followed by I. Or is this a new question? If so, and if the first answer worked, please consider accepting and / or upvoting. Thanks! Starting a movie now, will check in later. :) – zx81 Jul 14 '14 at 10:55
  • yup, your answer is good. But I have a follow-up, asked the qn in another thread: https://stackoverflow.com/questions/24736720/regex-precedence-for-multiple-characters – stretchr Jul 14 '14 at 12:39
  • Thanks, glad this one helped. :) – zx81 Jul 14 '14 at 12:55
  • Will have a look at the other one. :) – zx81 Jul 14 '14 at 12:56
  • Thanks, @anubhava! Off to sleep, see you tomorrow. :) – zx81 Jul 14 '14 at 13:44