Regex for every letter in a string that appears once

Question

I have a regex that accepts pairs of uppercase letters separated by a space:

^([A-Z]{2})([ ][A-Z]{2})*$

I want to make sure that every character appears only once:

for example, good input:

AB CD XY

not a good input:

AB BC

Try `^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$`. See live demo here https://regex101.com/r/JxeEsM/1 — revo, Nov 05 '18 at 09:00
No. Some engines support fewer features than others. It totally depends. — revo, Nov 05 '18 at 09:01
Yeah, it actually works, can you explain the part you've added? — Dor Lugasi-Gal, Nov 05 '18 at 09:03
There is a negative lookahead at the beginning that prevents a letter from occurring again. This should have a duplicate since it is a common problem, but I couldn't find one that directly answers yours. So I'm going to add an answer. Perhaps someone else can find it. — revo, Nov 05 '18 at 09:08
A next to identical question: [Regex to use each letter only once?](https://stackoverflow.com/questions/2631468) — Wiktor Stribiżew, Nov 05 '18 at 10:08
Another idea: [`^(?:([A-Z])(?!.*\1)|[^A-Z])+$`](https://regex101.com/r/WyP5RQ/1/) — bobble bubble, Nov 05 '18 at 12:28
@revo you're right, I've overlooked the exact requirement (: for the `^`... main part after your lookahead I could also think of `(?:[A-Z]{2} ?\b)+$` which doesn't make it faster. — bobble bubble, Nov 05 '18 at 14:25

score 4 · Accepted Answer · answered Nov 05 '18 at 09:16

Prepend the following to your regular expression:

(?!.*?([A-Z]).*\1)

It has to go right after the caret ^. Here is the breakdown:

(?! Start of negative lookahead
- .*? Lazy dot-star: skip as few characters as possible
- ([A-Z]) Match and capture a capital letter between A and Z
- .* Greedy dot-star: skip as many characters as possible (a lazy quantifier would work too)
- \1 Match the letter captured by the group again
) End of negative lookahead

In other words, if any capital letter is followed later in the string by the same letter, the lookahead rejects the input.

The entire regex would then be:

^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$

See live demo here
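
As a quick check outside the demo, here is a minimal Python sketch that runs the full pattern against the inputs from the question. The helper name `is_valid` and the two extra samples (`AB CDXY`, `AA`) are assumptions made for illustration; the pattern itself is unchanged.

```python
import re

# Pattern from the answer: the negative lookahead rejects any string in which
# some capital letter occurs a second time later on.
PATTERN = re.compile(r"^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$")


def is_valid(text: str) -> bool:
    """Return True for space-separated letter pairs with no repeated letter."""
    return PATTERN.match(text) is not None


for sample in ("AB CD XY", "AB BC", "AB CDXY", "AA"):
    print(f"{sample!r}: {is_valid(sample)}")
# 'AB CD XY': True
# 'AB BC': False   (B occurs twice)
# 'AB CDXY': False (format is wrong: missing space)
# 'AA': False      (A occurs twice)
```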

Be aware that this shifts the numbering of your capturing groups, because it adds one capturing group before all the others (what used to be groups 1 and 2 are now groups 2 and 3). If you don't need the pairs returned individually, and therefore don't need capturing groups, turn them into non-capturing groups:

^(?!.*?([A-Z]).*\1)[A-Z]{2}(?:[ ][A-Z]{2})*$
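
The group renumbering can be seen in a short Python sketch. The variable names are illustrative only, and the final `split` reflects the use case mentioned in the comments, where the regex only validates and the pairs are obtained by splitting.

```python
import re

original = re.compile(r"^([A-Z]{2})([ ][A-Z]{2})*$")
with_lookahead = re.compile(r"^(?!.*?([A-Z]).*\1)([A-Z]{2})([ ][A-Z]{2})*$")
non_capturing = re.compile(r"^(?!.*?([A-Z]).*\1)[A-Z]{2}(?:[ ][A-Z]{2})*$")

text = "AB CD XY"

# Number of capturing groups in each pattern.
print(original.groups, with_lookahead.groups, non_capturing.groups)  # 2 3 1

m_old = original.match(text)
m_new = with_lookahead.match(text)

# The first pair moves from group 1 to group 2 once the lookahead is added.
print(m_old.group(1), m_new.group(2))  # AB AB

# A repeated group keeps only its last iteration, so splitting is the
# simpler way to get every pair after validation.
if non_capturing.match(text):
    print(text.split(" "))  # ['AB', 'CD', 'XY']
```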

Since .NET supports variable-length (infinite) lookbehinds, a better approach is to use that feature:

^[A-Z]{2}(?:[ ][A-Z]{2})*$(?<!\1.*([A-Z]).*?)

See live demo here
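
Python's built-in `re` module does not accept variable-length lookbehinds, so the pattern above cannot be used there as written. A rough way to keep the same ordering (format check first, uniqueness check second) is to split the validation into two steps. This is a different technique from the lookbehind, not a translation of it, and the name `has_unique_pairs` is hypothetical.

```python
import re

# Format only: pairs of capital letters separated by single spaces.
FORMAT = re.compile(r"[A-Z]{2}(?:[ ][A-Z]{2})*")


def has_unique_pairs(text: str) -> bool:
    """Check the format first, then check that no letter repeats."""
    # Step 1: fullmatch requires the entire string to match the format,
    # so malformed input is rejected before any duplicate check runs.
    if FORMAT.fullmatch(text) is None:
        return False
    # Step 2: only well-formed input reaches the uniqueness check.
    letters = text.replace(" ", "")
    return len(set(letters)) == len(letters)


print(has_unique_pairs("AB CD XY"))  # True
print(has_unique_pairs("AB BC"))     # False
print(has_unique_pairs("AB CDXY"))   # False
```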

answered Nov 05 '18 at 09:16 by revo

Thank you for the explanation. The way I'm using it is just to split the string afterward with a space separator, so why should I be careful? – Dor Lugasi-Gal Nov 05 '18 at 09:43
I can't understand the difference between the last two examples. – Dor Lugasi-Gal Nov 05 '18 at 09:43
I didn't know what you were doing with the regex; if it is only about splitting, then it doesn't matter. The last regex uses a negative lookbehind instead of a negative lookahead. The benefit of the lookbehind approach is that it fails much earlier on subject strings that don't match the expected format, because with the lookahead the engine first goes through the lookahead and only then tries to match the letters in that format. – revo Nov 05 '18 at 10:18
ahh I see, thank you for the information, and I'm sorry if it was a duplicate, I couldn't find it – Dor Lugasi-Gal Nov 05 '18 at 10:33
@WiktorStribiżew I ran the lookahead version twice on exactly the same input string in RegexHero, and the second run was ~3% faster than the first. So should I care? Also, it totally depends on the input string. Include this `AB CDXY` in a line and check it against both regular expressions again. – revo Nov 05 '18 at 10:52
@WiktorStribiżew The reason I suggested the lookbehind approach was that it fails faster on wrong input strings. It maintains the original regex behavior on failures at the same steps. But if it is not possible for majority of input strings to have a high failure rate then lookahead version would be fine. It's faster from this standpoint and overall is almost equal. – revo Nov 05 '18 at 11:24
You are right, it will be much quicker with longer failing strings. – Wiktor Stribiżew Nov 05 '18 at 11:28
