1

So far I have this regex ^(?!.*?(a|c|e|g|i).*?\1)[acegi]+$ which match any word as combination of the characters "acegi", and these characters can occur only once.

Now I'm trying to match any word which will consist of given characters and these characters can repeat as many times as given.

Example for set of given characters "acegii"

Valid matches: "acegii" "ace" "a" "i" "ai" "gii" "ici" "iic" "aicige" etc.

Invalid matches: "acegiii" "iacegii" "iii" "aa" "cc" etc.

Thanks for any help!

Note: the characters set in the regex should be easily replaceable if possible.

Prefered regexs: posix, ruby

Nikos
  • 515
  • 1
  • 4
  • 16
  • I have doubts that this can be solved with regular expressions, if I think about the automata-theoric background. Are there any other tools allowed? – elias Mar 29 '14 at 13:56
  • Well I solved this in Ruby by a simple loop and .count method, but it wasn't fast as expected, so I'm trying to filter it on the database level (PostgreSQL). Thanks for interest, now I will try the suggestions below and will see. – Nikos Mar 29 '14 at 15:01

2 Answers2

3

You can use something similar to what you have, but with a second negative lookahead for the i:

^(?!.*?([aceg]).*?\1)(?!.*?i.*?i.*?i)[acegi]+$

Basically, one negative lookahead for each number of 'most' appearances.

rubular demo

Jerry
  • 70,495
  • 13
  • 100
  • 144
  • Thanks it works, but it seems there is no systematic solution for this via regex. What if the "i" character could appear 5 times? How would it look like? – Nikos Mar 29 '14 at 15:17
  • 1
    It'll be like `(?!.*?i(?:.*i){5})`. Basically, I'd say 1 negative lookahead for each group of limit. E.g. 1 negative lookahead for everything that must appear at most 1 time. – Jerry Mar 29 '14 at 15:34
  • Thank you. I will just ask for one more hint:. How can I add more characters to `(?!.*?i(?:.*i){5})`? Eg: "i" and "c" can both appear max 5 times, so "iiiiiccccc" should be valid. – Nikos Mar 30 '14 at 18:30
  • @Nikos You're welcome. You can just alter the format to be similar to the first negative lookahead: `(?!.*?([ci])(?:.*\1){5})` When you have only nore than one character in a 'group', that format is the way to go. – Jerry Mar 30 '14 at 18:33
  • Hmm, I have something wrong here http://rubular.com/r/6MHAdXEQQr, "accegiici" should be invalid, but it pass (I modified it for 2 times). – Nikos Mar 30 '14 at 18:44
  • 1
    @Nikos Oh. I forgot to mention something. Since you already have a first negative lookbehind, you already have a first capture group there. Use `\2` instead ([link](http://rubular.com/r/u5wAXremWI)). Sorry for the inconvenience. Also, you don't need to repeat characters in a character class. `[accegii]` is equivalent to `[acegi]`. – Jerry Mar 30 '14 at 18:48
  • Thanks again! I got error when tried in PostgreSQL so I've raised [new question](http://stackoverflow.com/questions/22751376/postgresql-invalid-regular-expression-invalid-backreference-number) if you knew. – Nikos Mar 30 '14 at 23:51
0

Quantify your lookahead:

/^(?!.*?([acegi])(?:.*?\1){N})[acegi]+$/

Replace that N with the number of appearances that are allowed - for instance, {1} will allow a single one of each character. {2} will allow one or two occurrences. {3} allows up to three, and so on.

Keep in mind, though, that you are dangerously close to the path of catastrophic backtracking, which could well crash your script.

You may want to use string operations instead. In summary:

  • Match string against /^[acegi]+$/
  • Count number of occurrences of each character (ie. iterate through the string)
  • Get the maximum number of occurrences (could be a simple max() call if done right)
  • If that max is higher than your allowed limit, trigger failure.
Niet the Dark Absol
  • 320,036
  • 81
  • 464
  • 592
  • Thanks for suggestion, I see it's not much systematic solution with regex. I did that over string operations in ruby, but for large amount of given words from database it wasn't so fast. – Nikos Mar 29 '14 at 15:10