I need to validate that a string of mixed upper- and lowercase characters [a-zA-Z] doesn't contain any lowercase substring of length 3 in which some character is outside an allowed set of characters.

For example, let's say the allowed set of characters that may repeat is [abc]. The list of valid strings looks like this:

abc
abcbcbcb
aaaaaaaa
AabcbaDEF
ABCDEFG
abcDEfgH
AbcDefG

and the list of invalid strings looks like this:

abcd
AbcdE
AabcbcdefgK

Is it possible to do it with regular expressions?

Mr. Pumpkin
  • I'm not sure I understand your question. How are the strings you presented considered valid/invalid? – ctwheels Jan 19 '18 at 19:59
  • The string is valid if every 3-character lowercase substring contains only characters from [abc]. Otherwise it's invalid. – Mr. Pumpkin Jan 19 '18 at 20:08
  • I think I understand, why is `AbcDefG` valid then? – ctwheels Jan 19 '18 at 20:10
  • @ctwheels - it's valid cause it doesn't contain any 3 lowercase characters substrings at all – Mr. Pumpkin Jan 19 '18 at 20:26
  • Now it makes sense thanks for the clarification! You should add the information from the comments above into your question to make it more clear and concise – ctwheels Jan 19 '18 at 20:52

2 Answers

Based on all the information in your question and comments, you're looking for strings that match the following requirements:

  • Any number of alpha characters so long as the following doesn't exist:
    • A substring of 3 lowercase letters containing a lowercase letter that is not in the set [abc]
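
As a cross-check on these requirements, the rule can also be written without a regex. The Python below is only a sketch: the function name `is_valid` and the `allowed` parameter are hypothetical (not from the question), and it assumes the string must be non-empty and consist of ASCII letters only, which is what the patterns in this answer require.

```python
import string


def is_valid(text, allowed="abc"):
    """Hypothetical reference check; the name and signature are assumptions."""
    # Like the regexes, require one or more ASCII letters.
    if not text or any(ch not in string.ascii_letters for ch in text):
        return False
    # Slide a 3-character window over the string.
    for i in range(len(text) - 2):
        window = text[i:i + 3]
        all_lower = all(ch in string.ascii_lowercase for ch in window)
        # An all-lowercase window must use only characters from the allowed set.
        if all_lower and any(ch not in allowed for ch in window):
            return False
    return True


# Sample strings taken from the question.
valid = ["abc", "abcbcbcb", "aaaaaaaa", "AabcbaDEF", "ABCDEFG", "abcDEfgH", "AbcDefG"]
invalid = ["abcd", "AbcdE", "AabcbcdefgK"]

assert all(is_valid(s) for s in valid)
assert not any(is_valid(s) for s in invalid)
```

A loop like this is slower to write than a pattern, but it is handy as an oracle when testing a regex against the same inputs.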

Given those requirements (you definitely had me scratching my head for a bit there), the following regular expressions should work:

^(?:(?!(?=[a-z]{3})[abc]{0,2}[^abc])[a-zA-Z])+$
^(?:(?!(?=[a-z]{3})[abc]{0,2}[d-z])[a-zA-Z])+$

Thanks to @anubhava in the comments below this answer for providing a faster alternative (it doesn't use a tempered greedy token):

^(?!.*(?=[a-z]{3})[abc]{0,2}[d-z])[a-zA-Z]+$

Explanation of the first pattern:

  • ^ Assert position at the start of the line
  • (?:(?!(?=[a-z]{3})[abc]{0,2}[^abc])[a-zA-Z])+ Tempered greedy token: matches ASCII letters one at a time, asserting at each position that the following does not match:
    • (?=[a-z]{3}) Positive lookahead ensuring the following 3 characters are lowercase ASCII alpha characters
    • [abc]{0,2}[^abc] Matches between 0 and 2 characters from the set abc, followed by exactly one character not in the set abc
      • Together, these two parts match only when at least one lowercase letter outside the set abc occurs within the lowercase substring of length 3
  • $ Assert position at the end of the line
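
To try the three patterns above against the sample strings from the question, a short Python check like the following can be used. The variable names and the `accepts` helper are hypothetical; only the patterns and the sample strings come from this thread.

```python
import re

# The three patterns from this answer, unchanged.
TEMPERED_NEGATED = re.compile(r"^(?:(?!(?=[a-z]{3})[abc]{0,2}[^abc])[a-zA-Z])+$")
TEMPERED_RANGE = re.compile(r"^(?:(?!(?=[a-z]{3})[abc]{0,2}[d-z])[a-zA-Z])+$")
SINGLE_LOOKAHEAD = re.compile(r"^(?!.*(?=[a-z]{3})[abc]{0,2}[d-z])[a-zA-Z]+$")

# Sample strings from the question.
VALID = ["abc", "abcbcbcb", "aaaaaaaa", "AabcbaDEF", "ABCDEFG", "abcDEfgH", "AbcDefG"]
INVALID = ["abcd", "AbcdE", "AabcbcdefgK"]


def accepts(pattern, text):
    """Hypothetical helper: True when the compiled pattern matches the text."""
    return pattern.match(text) is not None


for pattern in (TEMPERED_NEGATED, TEMPERED_RANGE, SINGLE_LOOKAHEAD):
    assert all(accepts(pattern, s) for s in VALID)
    assert not any(accepts(pattern, s) for s in INVALID)
```

Note that in Python `$` also matches just before a trailing newline, so strip the input first (or use `\Z`) if that matters for your data.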
ctwheels

You may use this regex:

^(?!.*(?:[d-z][a-z]{2}|[a-z]{2}[d-z]|[a-z][d-z][a-z]))[A-Za-z]+$

RegEx Explanation:

  • ^: Start
  • (?!.*: Start negative lookahead
    • (?:: Start non-capture group
      • [d-z][a-z]{2}: Match a d-z letter followed by any 2 lowercase letters
      • |: OR
      • [a-z]{2}[d-z]: Match 2 lowercase letters followed by a d-z letter
      • |: OR
      • [a-z][d-z][a-z]: Match a lowercase letter followed by a d-z letter followed by a lowercase letter
    • ): End non-capture group
  • ): End negative lookahead
  • [A-Za-z]+: Match 1+ lower/upper case characters
  • $: End
anubhava
  • This accepts `AbdcDefG` and `AdbcDefG`; you need to change `[a-z][d-z]{2}` to `[a-z][d-z][a-z]` and also change `[d-z]{3}` to `[d-z][a-z]{2}` – ctwheels Jan 19 '18 at 21:13