
I have a regular expression that accepts pairs of uppercase letters separated by a space:

^([A-Z]{2})([ ][A-Z]{2})*$

I also want to make sure that each letter appears only once in the whole string.

For example, a good input:

AB CD XY

Not a good input (B appears twice):

AB BC

revo
Dor Lugasi-Gal

1 Answer


Prepend the following negative lookahead to your regular expression:

(?!.*?([A-Z]).*\1)

It must come immediately after the caret ^. Here is a breakdown:

  • (?! Start of negative lookahead
    • .*? Lazy dot-star to expand matching lazily
    • ([A-Z]) Match and capture a capital letter between A and Z
    • .* Greedy dot-star to expand matching greedily (it could be lazy)
    • \1 Match whatever has been captured in previous capturing group
  • ) End of negative lookahead

The entire regex would then be:

^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$
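
As a quick sanity check, here is a minimal sketch in Python, assuming its built-in `re` module handles this pattern the same way .NET does (the lookahead and backreference syntax used here is common to both). The helper name `split_pairs` is made up for illustration; it validates the string and then splits it on spaces, which is the use case mentioned in the comments.

```python
import re

# The full pattern from above: a negative lookahead that rejects any
# repeated letter, followed by the original pair format.
PATTERN = re.compile(r"^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$")

def split_pairs(text):
    """Hypothetical helper: return the list of pairs if text is valid, else None."""
    if PATTERN.match(text) is None:
        return None
    return text.split(" ")

print(split_pairs("AB CD XY"))  # valid: every letter is unique
print(split_pairs("AB BC"))     # invalid: B appears twice
print(split_pairs("AB CDXY"))   # invalid: wrong format
```
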


Be aware, though, that this shifts the numbering of your capturing groups, since it adds one capturing group before all the others (groups that used to be 1 and 2 are now 2 and 3). If you don't need to retrieve them individually, and so don't need capturing groups at all, turn them into non-capturing groups:

^(?!.*?([A-Z]).*\1)[A-Z]{2}(?:[ ][A-Z]{2})*$
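
The renumbering can be seen in a small sketch, again assuming Python's `re` module as a stand-in for .NET. The variable names are illustrative only.

```python
import re

capturing = re.compile(r"^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$")
non_capturing = re.compile(r"^(?!.*?([A-Z]).*\1)[A-Z]{2}(?:[ ][A-Z]{2})*$")

m = capturing.match("AB CD XY")
first_pair = m.group(2)  # was group 1 before the lookahead was added
last_pair = m.group(3)   # was group 2; a repeated group keeps its last iteration

# Number of capturing groups in each pattern: the lookahead's group is the
# only one left in the non-capturing variant.
print(capturing.groups, non_capturing.groups)
print(repr(first_pair), repr(last_pair))
```
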

Because .NET supports unbounded (variable-length) lookbehinds, a better approach is to use that feature and move the check to the end of the pattern:

^[A-Z]{2}(?:[ ][A-Z]{2})*$(?<!\1.*([A-Z]).*?)
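
Python's built-in `re` module rejects variable-width lookbehinds, so the pattern above cannot be tried there directly. The ordering it relies on (check the format first, look for a repeated letter only afterwards) can still be sketched in two steps. This illustrates the idea under that assumption and is not a translation of the .NET regex; `is_valid` is a hypothetical name.

```python
import re

FORMAT = re.compile(r"^[A-Z]{2}(?:[ ][A-Z]{2})*$")  # the original pair format
REPEATED = re.compile(r"([A-Z]).*\1")               # any letter that occurs twice

def is_valid(text):
    # Step 1: format check; malformed input is rejected here, before any
    # search for duplicates is attempted.
    if FORMAT.match(text) is None:
        return False
    # Step 2: only well-formed input reaches the duplicate check.
    return REPEATED.search(text) is None

print(is_valid("AB CD XY"))
print(is_valid("AB BC"))
print(is_valid("AB CDXY"))
```
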


revo
  • Thank you for the explanation. The way I'm using it is just to split the string afterward on the space separator, so why should I be careful? – Dor Lugasi-Gal Nov 05 '18 at 09:43
  • I can't understand the difference between the last two examples. – Dor Lugasi-Gal Nov 05 '18 at 09:43
  • I didn't know what you were doing with the regex; if it is all about splitting, then it doesn't matter. The last regex uses a negative lookbehind instead of a negative lookahead. The benefit of the lookbehind approach is that it fails much earlier on subject strings that don't match the expected format (with the lookahead, the engine goes through the lookahead first and only then tries to match the letters in that format). – revo Nov 05 '18 at 10:18
  • Ahh I see, thank you for the information, and I'm sorry if it was a duplicate, I couldn't find it. – Dor Lugasi-Gal Nov 05 '18 at 10:33
  • @WiktorStribiżew I ran the lookahead version twice on exactly the same input string in RegexHero, and the second run was ~3% faster than the first. So should I care? It also depends entirely on the input string. Include `AB CDXY` in a line and check it against both regular expressions again. – revo Nov 05 '18 at 10:52
  • @WiktorStribiżew The reason I suggested the lookbehind approach is that it fails faster on wrong input strings. On failure it behaves like the original regex, failing at the same steps. But if most input strings are unlikely to fail, the lookahead version is fine. It is faster in that case, and overall the two are almost equal. – revo Nov 05 '18 at 11:24
  • You are right, it will be much quicker with longer failing strings. – Wiktor Stribiżew Nov 05 '18 at 11:28