0

For a given set of characters, what is the regex to match all strings that are exactly formed by one or more caracters from the given set ?

Example1: for (a, b, c, d):

  • bdca (match)
  • adb (match)
  • abcg (fail: 'g' not in the set)
  • aab (fail: only one 'a' is in the set)

Example2: for (a, a, c, d):

  • adca (match)
  • aaad (fail: the third a is not in the set)
  • Those shoud work too: a, aa, dc, aac, ada, acd, and daca. But not this: aaca, acada, accd, abcdef

In other terms, each used character will be consumed. So we can use all given characters or only some of them, but without extra characters or duplicate use more than the given number of each character.

I tried several regex but I didn't find any good solution.

Please any help?

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
iouhammi
  • 1,108
  • 15
  • 29

2 Answers2

1

Not just regexp work, but I think this will work:

  1. sort the pattern letters. (from your examples: abcd, aacd)
  2. insert regexp codes: aacd -> ^a?a?c?d?$
  3. sort the string
  4. Check the string against the modified pattern.

(If you need to disallow the empty string, that could be an extra check.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

Whilst the regex will get super long the more characters you want to include here is an example for just 3 characters:

Short Example 1: for (a, b, c):

  • bca (match)
  • acb (match)
  • abg (fail: 'g' not in the set)
  • aab (fail: only one 'a' is in the set)

^(a(b(c)?)?|a(c(b)?)?|b(a(c)?)?|b(c(a)?)?|c(a(b)?)?|c(b(a)?)?)

Short Example 2: for (a, a, c):

  • aca (match)
  • aaa (fail: the third 'a' is not in the set)

^(a(a(c)?)?|a(c(a)?)?|a(a(c)?)?|a(c(a)?)?|c(a(a)?)?|c(a(a)?)?)

or (optionally) shortened to remove duplicate tests:

^(a(a(c)?)?|a(c(a)?)?|c(a(a)?)?)

How?

Basically this is made up of combinations of the following (a(b(c)?)? where all characters after the first are optional. One these exists for each possible permutation of a, b and c combined together with |.

Community
  • 1
  • 1
Dean Taylor
  • 40,514
  • 3
  • 31
  • 50
  • Thanks for your answer, if I need to generate permutations for regex, so I will do it for strings directly!! I don't think this will work especially for a large set of characters (in my case I need between 7 - 10 characters) – iouhammi Dec 08 '15 at 17:15
  • Assuming no duplicates I guess this wouldn't work well for those lengths as the regular expression would be ~1091 characters in length for 7 character string and ~3422 characters for a 10 character string. – Dean Taylor Dec 08 '15 at 18:04
  • You may find that a simple combination of all the permutations with `|` results in a lower character count. – Dean Taylor Dec 08 '15 at 18:22