
How can I match the letters a, b, and c, each at most once, in any combination and with varying length?

The expression should match these cases:

abc
bc
a
b
bca

But it should not match these:

abz
aab
cc
x
Ωmega
John_Sheares

6 Answers

Use the regex pattern

\b(?!\w*(\w)\w*\1)[abc]+\b

You can use this pattern with a character set of any size; just replace [abc] with the desired set.
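A minimal check of this pattern against the question's cases, sketched in Python. The answer does not name a regex flavor, so Python's `re` module is an assumption here, and `find_unique_abc_words` is a hypothetical helper name. The sample text is just the question's examples joined with spaces.

```python
import re

# Sketch assuming Python's re flavor. Ωmega's pattern: the negative lookahead
# rejects a word if any word character occurs twice in it; [abc]+ then
# requires the word to consist only of a, b and c, and the two \b anchors
# keep the match to whole words.
PATTERN = re.compile(r"\b(?!\w*(\w)\w*\1)[abc]+\b")


def find_unique_abc_words(text):
    # finditer + group(0) is used because findall would return the
    # lookahead's capturing group (\w), which is unset after every
    # successful match, so findall would yield only empty strings.
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = "abc bc a b bca abz aab cc x"
print(find_unique_abc_words(sample))    # ['abc', 'bc', 'a', 'b', 'bca']
print(find_unique_abc_words("abcbac"))  # []
```

The second call shows the limitation raised in the comments: a run such as `abcbac` is rejected as a whole rather than split into `abc` and `bac`.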


Example: (screenshot of the output from myregextester)

Wiktor Stribiżew
Ωmega
  • For continuous string "abcbac", this method could not match both "abc" and "bac". Anyway, this is still very cool. The first part (?!(?:.\B)*(.)(?:\B.)*\1) tries to find a starting position such that from the starting position to \newline, there is no duplication. Then from the starting position, it tries to match [abc]+ – William Mar 15 '16 at 21:23
^(?=([^a]*a?[^a]*)$)(?=([^b]*b?[^b]*)$)(?=([^c]*c?[^c]*)$)[abc]{1,3}$

This works with lookaheads.

It uses this lookahead in three variations, one per letter: (?=([^a]*a?[^a]*)$)

It says: There needs to be at most one a from here (the beginning) until the end.
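To see the three lookaheads at work, here is a small Python sketch (Python's `re` is an assumption; the answer does not specify a flavor) that tests each string on its own, which is how this anchored pattern is meant to be used.

```python
import re

# Sketch assuming Python's re flavor. One lookahead per letter; each requires
# that, from the start of the string to the end, the letter appears at most
# once. [abc]{1,3}$ then restricts the string to one to three letters.
PATTERN = re.compile(
    r"^(?=([^a]*a?[^a]*)$)(?=([^b]*b?[^b]*)$)(?=([^c]*c?[^c]*)$)[abc]{1,3}$"
)

should_match = ["abc", "bc", "a", "b", "bca"]
should_not_match = ["abz", "aab", "cc", "x"]

# Each string is tested separately, since the pattern is anchored with ^ and $.
for s in should_match + should_not_match:
    print(f"{s!r:7} matches: {bool(PATTERN.match(s))}")
```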

Alternatively, combining lookaheads and backreferences:

^([abc])((?!\1)([abc])((?!\1)(?!\3)[abc])?)?$
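A Python sketch of the backreference version (again assuming Python's `re`). It prints groups 1 and 3, which hold the first and second letters that the later negative lookaheads compare against.

```python
import re

# Sketch assuming Python's re flavor. Group 1 is the first letter; group 3 is
# the optional second letter, which (?!\1) forces to differ from the first;
# the optional third letter must differ from both, via (?!\1)(?!\3).
PATTERN = re.compile(r"^([abc])((?!\1)([abc])((?!\1)(?!\3)[abc])?)?$")

for s in ["abc", "bc", "a", "b", "bca", "abz", "aab", "cc", "x", "aba"]:
    m = PATTERN.match(s)
    # Groups 1 and 3 for matches, None for rejected strings.
    print(s, "->", m.group(1, 3) if m else None)
```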
phant0m

Just to round out the collection:

^(?:([abc])(?!.*\1))+$

Want to handle a larger set of characters? No problem:

^(?:([abcdefgh])(?!.*\1))+$
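One way to swap in a different character set programmatically, sketched in Python (an assumption; the answer is flavor-neutral). `unique_chars_regex` is a hypothetical helper that builds the anchored pattern above for any set of characters.

```python
import re


def unique_chars_regex(chars):
    # Hypothetical helper, assuming Python's re flavor. re.escape keeps
    # characters such as "-" or "]" literal inside the character class.
    char_class = "".join(re.escape(c) for c in chars)
    # Each character is captured, and (?!.*\1) fails if the same character
    # appears again anywhere later in the string.
    return re.compile(r"^(?:([" + char_class + r"])(?!.*\1))+$")


abc = unique_chars_regex("abc")
cases = ["abc", "bc", "a", "b", "bca", "abz", "aab", "cc", "x"]
print([s for s in cases if abc.match(s)])  # ['abc', 'bc', 'a', 'b', 'bca']

abcdefgh = unique_chars_regex("abcdefgh")
print(bool(abcdefgh.match("hgfedcba")), bool(abcdefgh.match("abca")))  # True False
```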

EDIT: Apparently I misread the question; you're not validating individual strings like "abc" and "ba", you're trying to find whole-word matches in a larger string. Here's how I would do that:

\b(?:([abc])(?![abc]*\1))+\b

The tricky part is making sure the lookahead doesn't look beyond the end of the word that's currently being matched. For example, if I had left the lookahead as (?!.*\1), it would fail to match the abc in abc za because the lookahead would incorrectly flag the a in za as a duplicate of the a in abc. Allowing the lookahead to look only at valid characters ([abc]*) keeps it on a sufficiently short leash. And if there are invalid characters in the current word, it's not the lookahead's job to spot them anyway.
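The "short leash" point can be reproduced directly. This Python sketch (assuming Python's `re`) runs both lookahead variants over the answer's own example, `abc za`; `words` is a hypothetical helper.

```python
import re

# Sketch assuming Python's re flavor. Word-search pattern: the lookahead scans
# only [abc]*, so it stops at the end of the word currently being matched.
SHORT_LEASH = re.compile(r"\b(?:([abc])(?![abc]*\1))+\b")
# The same pattern with (?!.*\1): this lookahead can see past the word.
LONG_LEASH = re.compile(r"\b(?:([abc])(?!.*\1))+\b")


def words(pattern, text):
    # Return whole matches rather than the capturing group.
    return [m.group(0) for m in pattern.finditer(text)]


print(words(SHORT_LEASH, "abc za"))  # ['abc']
print(words(LONG_LEASH, "abc za"))   # [] (the "a" in "za" counts as a duplicate)
```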

(Thanks to Honest Abe for bringing this back to my attention.)

Alan Moore
  • This would be my choice to upvote, but the example doesn't seem to work as is. Does the example work with a certain language? I got it to work by changing it to `\b(?:([abc])(?!\1))+\b` – Honest Abe Feb 08 '13 at 07:22
  • I was assuming the regex would be applied to each string in isolation ("abc", "cb", "abz", etc.), but it looks like the OP wants to pluck whole-word matches from a larger string. So you're right, I should have used `\b` instead of the anchors, but you can't just remove the `.*` from the lookahead. That correctly filters out `aab` but not `aba`. See my edit for the corrected regex. – Alan Moore Feb 08 '13 at 21:13
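A Python sketch (assuming Python's `re`) of the point made in the reply: checking only the next character with `(?!\1)` rejects `aab` but lets `aba` through, while scanning the rest of the word with `(?![abc]*\1)` rejects both.

```python
import re

# Sketch assuming Python's re flavor.
# Variant from the first comment: (?!\1) only looks at the next character.
NEXT_ONLY = re.compile(r"\b(?:([abc])(?!\1))+\b")
# Corrected version from the answer's edit: (?![abc]*\1) scans the rest
# of the current word.
REST_OF_WORD = re.compile(r"\b(?:([abc])(?![abc]*\1))+\b")


def words(pattern, text):
    return [m.group(0) for m in pattern.finditer(text)]


print(words(NEXT_ONLY, "aab aba"))     # ['aba']
print(words(REST_OF_WORD, "aab aba"))  # []
```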
^(?=(.*a.*)?$)(?=(.*b.*)?$)(?=(.*c.*)?$)[abc]{,3}$

The anchored look-aheads limit the number of occurrences of each letter to one.

Bohemian
  • This also matches the "ab" in "abz", and "aab" and "cc". So this is not correct. (I just tested in the Patterns regex app for Mac) – Joe Dyndale Nov 24 '12 at 22:27
  • In the Patterns app this only works when I select both the single-line AND multi-line options - which I'm not sure makes much sense and may be a bug in Patterns. – Joe Dyndale Nov 24 '12 at 22:54
  • In English, this RegEx says: Be empty OR end in `a`, `b` and `c` at the same time, which is of course impossible. – phant0m Nov 24 '12 at 23:00
  • @phant0m Actually it doesn't say that at all. The look aheads each say "there must be either 1 or 0 "a" somewhere in the whole input. – Bohemian Nov 24 '12 at 23:54
  • No, it did not and it doesn't say that after your edit either. Now, it says: The string is either empty OR it contains at least an `a`, a `b` and a `c`. (Neglecting that `{,3}` is invalid) – phant0m Nov 25 '12 at 00:01
  • Maybe it's better if I said *why* I came to the above conclusion: `(?=(.*a.*)?$)` This lookahead starts *right after* the beginning of the string. Since the lookahead has a dollar sign at the end, `(.*a.*)?` must match everything from the beginning of the line until the end. Since it's got a question mark the line *might* be empty, OR there has to be at least one `a`. There is nothing that prevents it from having multiple `a`s. Because the other patterns work in similar, the string must contain an `a`, a `b` and a `c` if it's not empty. I hope that clears things up, cheers. – phant0m Nov 25 '12 at 11:01
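A quick Python run of the pattern exactly as posted is consistent with phant0m's reading: it accepts the empty string and strings containing all of `a`, `b` and `c`, and rejects `bc`, `a` and `b`. This assumes Python's `re`, which accepts `{,3}` as shorthand for `{0,3}`; other flavors may reject that quantifier or read it literally.

```python
import re

# Sketch assuming Python's re flavor, where {,3} means {0,3}.
PATTERN = re.compile(r"^(?=(.*a.*)?$)(?=(.*b.*)?$)(?=(.*c.*)?$)[abc]{,3}$")

samples = ["abc", "bc", "a", "b", "bca", "abz", "aab", "cc", "x", ""]
accepted = [s for s in samples if PATTERN.match(s)]
# Each lookahead requires its letter to be present unless the string is empty.
print(accepted)  # ['abc', 'bca', '']
```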

Try this regex:

^([abc])((?!\1)([abc]))?((?!(\1|\2))([abc]))?$

Check it in regexpal.
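A Python sketch (assuming Python's `re`) that runs this pattern over the question's cases plus `aba`. In Python, a backreference to a group that has not participated in the match (here `\2` when there is no second letter) simply fails, so the `(?!(\1|\2))` guard still behaves as intended for one-letter strings.

```python
import re

# Sketch assuming Python's re flavor.
# Group 1: first letter. Group 3: optional second letter, which must differ
# from group 1. Group 6: optional third letter, which must differ from both
# group 1 and group 2 (the group that wraps the second letter).
PATTERN = re.compile(r"^([abc])((?!\1)([abc]))?((?!(\1|\2))([abc]))?$")

cases = ["abc", "bc", "a", "b", "bca", "abz", "aab", "cc", "x", "aba"]
print([s for s in cases if PATTERN.match(s)])  # ['abc', 'bc', 'a', 'b', 'bca']
```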

Kirill Polishchuk

I linked it in a comment (this is sort of a dupe of How can I find repeated characters with a regex in Java?), but to be more specific, the regex:

(\w)\1+

will match two or more consecutive occurrences of the same character. Negate that and you have your regex.

Aimon Bustardo
  • He only wants a, b, and c - so replacing \w with [abc] seems like the obvious choice - but the negation of that would still match "abz" and "x" - so it's not a final solution. – Joe Dyndale Nov 24 '12 at 22:41
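A Python sketch of this approach and the objection above (assuming Python's `re`; `naive_negation` and `with_set_check` are hypothetical names). Negating `(\w)\1+` alone accepts `abz` and `x`. Adding a separate `[abc]+` check fixes that, but `(\w)\1+` only detects adjacent repeats, so `aba` still gets through.

```python
import re

# Sketch assuming Python's re flavor.
REPEAT_RUN = re.compile(r"(\w)\1+")  # a character immediately repeated
ONLY_ABC = re.compile(r"[abc]+")


def naive_negation(s):
    # Accept s when it contains no run of a repeated character.
    return REPEAT_RUN.search(s) is None


def with_set_check(s):
    # Hypothetical fix: also require that s consists only of a, b and c.
    return ONLY_ABC.fullmatch(s) is not None and naive_negation(s)


for s in ["abc", "aab", "cc", "abz", "x", "aba"]:
    print(s, naive_negation(s), with_set_check(s))
```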