3

I'm just starting out with regex and I need some help. I want to validate that a string consists of two identical characters followed by an A, or preceded by an A, or separated by an A in the middle (the two characters must still be the same). Some examples:

BBA -> true
ABB -> true
BAB -> true
CCA -> true
ABC -> false
BAC -> false
BBBA -> false (there have to be only two)
ABBB -> false (there have to be only two)

At the moment I have something similar to this, but it does not work correctly:

(([B-Z])\1{2}A) | ([B-Z]{1}A[B-Z]{1}) | (A([B-Z])\1{2})

I know I'm not close to the correct answer; I'm still learning. If someone could give me a hand, I would appreciate it very much.

Gabriel
  • In general you probably need to look into backreferences and how to use them with capture groups, so that you can match a character depending on the one currently captured: https://stackoverflow.com/questions/21428545/java-regex-how-to-back-reference-capturing-groups-in-a-certain-context-when-the The problem in your case is that things like `[A-Z]{2}` will match `AB`, `AZ` and any other 2-letter string, and I don't see a good way to express "same letter from this set repeated" other than something like `A{2}A | B{2}A | C{2}A` etc., or using capture groups with backreferences. – Anton Nov 01 '18 at 17:28
  • 1
    But I would avoid solving this kind of problem with regex alone; it becomes too complex and brittle. Extract the first two letters with a regex, then check whether they're the same in Java. – Anton Nov 01 '18 at 17:29
  • This answer may help you: https://stackoverflow.com/a/16717823/7560986 – Dhiral Kaniya Nov 01 '18 at 17:35
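
For comparison, the "check it in Java instead" idea from the comments can be sketched without any regex at all. This is only an illustration under my own assumptions: the class and method names are hypothetical, and it assumes the input is a single uppercase word with exactly one A and two identical letters from B to Z.

```java
// Hypothetical helper, not from the thread: validates a three-letter word
// made of one 'A' plus two identical letters from B-Z, without using a regex.
class TwinLetterCheck {
    static boolean isValid(String s) {
        if (s == null || s.length() != 3) {
            return false;
        }
        int aIndex = s.indexOf('A');
        // Exactly one 'A' is required.
        if (aIndex < 0 || aIndex != s.lastIndexOf('A')) {
            return false;
        }
        // Drop the single 'A'; two characters remain.
        String rest = s.substring(0, aIndex) + s.substring(aIndex + 1);
        char c = rest.charAt(0);
        // The remaining pair must be the same letter, in the range B-Z.
        return c >= 'B' && c <= 'Z' && c == rest.charAt(1);
    }

    public static void main(String[] args) {
        String[] samples = {"BBA", "ABB", "BAB", "CCA", "ABC", "BAC", "BBBA", "ABBB"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The length check is what rejects BBBA and ABBB, so no "only two" logic is needed beyond it.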

3 Answers

1

Use \b to match whole words only, and give each alternative separated by | its own capture group and backreference.

\b([B-Z])\1A|([B-Z])A\2|A([B-Z])\3\b

Check: https://regexr.com/42bp0

Rocky Li
  • 1
    Thank you very, very, very much! It works perfectly! I've been thinking about how to solve it for hours. Now I need to understand how it works and keep studying. Again, thank you very much! – Gabriel Nov 01 '18 at 18:08
  • The one mistake I spot in your regex is that `([B-Z])\1{2}` actually matches 3 of the same character, because `([B-Z])` already matches one, while `\1{2}` matches 2 more. – Rocky Li Nov 01 '18 at 18:10
  • This incorrectly matches `BABA` and `BBAA` due to the lack of trailing `\b` in the first two alternatives, and `ABAB` and `AABB` due to lack of leading `\b` on the last two alternatives. The fixed version would be `\b([A-Z])\1A\b|\b([A-Z])A\2\b|\bA([A-Z])\3\b` or by grouping the alternatives, `\b(?:([A-Z])\1A|([A-Z])A\2|A([A-Z])\3)\b`. – Deadcode Dec 21 '19 at 23:01
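
To see the effect of the word boundaries, here is a small Java sketch that scans a line of text with the grouped form from the comment above. One assumption on my part: it uses `[B-Z]` (as in the answer) rather than `[A-Z]`, so that `AAA` is not accepted. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Grouped alternatives with \b on both sides.
    // Assumption: [B-Z] instead of [A-Z], so the repeated letter is never 'A'.
    static final Pattern WORD =
            Pattern.compile("\\b(?:([B-Z])\\1A|([B-Z])A\\2|A([B-Z])\\3)\\b");

    // Collects every whole word in the text that fits one of the three shapes.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("BBA BABA ABB BBAA BAB AABB CCA ABAB"));
    }
}
```

With the boundaries on both sides of the group, the four-letter words in the sample line are skipped instead of being matched partially.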
1

This can be done rather elegantly:

\b(?=[A-Z]{3}\b)A?([B-Z])A?\1A?\b

Demo on regex101

The [A-Z]{3} inside the lookahead asserts that the sequence is exactly 3 letters long, thanks to having \b on both sides. The A?([B-Z])A?\1A? part asserts that there are two identical instances of a letter other than A, with an optional A before, between, or after them. Because it is also flanked by \b on both sides, nothing else can be mixed in.
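
A quick way to check this pattern against the examples from the question in Java might look like the sketch below. The class and method names are hypothetical, and it uses `matches()` because each test string is a single word.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // The lookahead fixes the total length at 3; the rest requires a doubled
    // letter from B-Z with an optional A before, between, or after the pair.
    static final Pattern P =
            Pattern.compile("\\b(?=[A-Z]{3}\\b)A?([B-Z])A?\\1A?\\b");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"BBA", "ABB", "BAB", "CCA", "ABC", "BAC", "BBBA", "ABBB"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```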


This can also be generalized to any number of repeats:

\b(?=[A-Z]{3}\b)(?:\1|(?!\2)([B-Z])()|(?!\3)A())+\b\2

Just change {3} to whatever number of total characters you want (i.e., the number of repeats plus 1 for the "A").

Demo on regex101
Try it online! (Java)
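
As a rough sketch of the "just change {3}" advice, the generalized pattern can be built from a length parameter in Java. The helper names and the `totalLength` parameter are my own, not part of the original pattern.

```java
import java.util.regex.Pattern;

class GeneralizedDemo {
    // Hypothetical helper: substitutes the total word length into the {n}
    // quantifier of the generalized pattern from the answer.
    static Pattern forLength(int totalLength) {
        return Pattern.compile(
                "\\b(?=[A-Z]{" + totalLength + "}\\b)"
                + "(?:\\1|(?!\\2)([B-Z])()|(?!\\3)A())+\\b\\2");
    }

    static boolean isValid(String s, int totalLength) {
        return forLength(totalLength).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("BBA  (3) -> " + isValid("BBA", 3));
        System.out.println("BBBA (3) -> " + isValid("BBBA", 3));
        System.out.println("BBBA (4) -> " + isValid("BBBA", 4));
        System.out.println("BABB (4) -> " + isValid("BABB", 4));
    }
}
```

The pattern relies on a backreference to a group that has not participated yet failing to match, which is how `java.util.regex` behaves; other regex flavors may treat that case differently.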

Deadcode
0
Pattern pattern = Pattern.compile("([B-Z])\\1A|A([B-Z])\\2|([B-Z])A\\3");

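Capture groups are numbered across the whole pattern, including across the | (OR) operator, so each alternative needs its own group number in its backreference. As others have mentioned, though, this problem is generally not a good fit for a regex solution.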

Note that you have to add ^ and $ as needed to match only this pattern, like so:

"^([B-Z])\\1A$|^A([B-Z])\\2$|^([B-Z])A\\3$"
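
As an alternative to writing the anchors by hand, `Matcher.matches()` in Java only succeeds when the entire input matches, whereas `find()` also reports hits inside a longer string. The sketch below (with made-up class and method names) shows the difference on one of the question's negative examples.

```java
import java.util.regex.Pattern;

class AnchoredDemo {
    // Unanchored pattern from the answer; group numbers run across the alternatives.
    static final Pattern P =
            Pattern.compile("([B-Z])\\1A|A([B-Z])\\2|([B-Z])A\\3");

    // matches() requires the whole input to match, so no ^ or $ is needed here.
    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    // find() accepts a match anywhere in the input, e.g. "BBA" inside "BBBA".
    static boolean containsMatch(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println("isValid(BBBA)       -> " + isValid("BBBA"));
        System.out.println("containsMatch(BBBA) -> " + containsMatch("BBBA"));
    }
}
```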
Fakrudeen