4

I have the following regex:

/([A-Za-z0-9]+)([A-Za-z0-9\-\_]+)([A-Za-z0-9]+)/

It is not working according to my needs, which are:

  • do not allow spaces
  • allow capital English letters
  • allow lowercased English letters
  • allow digits
  • the string may not contain both a hyphen and an underscore
  • hyphens: a hyphen cannot be at the beginning or at the end of the string; there can be any number of hyphens, but never two in a row (a--b is invalid)
  • underscores: an underscore cannot be at the beginning or at the end of the string; there can be any number of underscores, but never two in a row (a__b is invalid)
  • the string must contain at least 1 character (letter)

Valid examples:

  • a1_b_2_hello
  • 2b-ffg-er2
  • abs
  • 123a

Invalid examples:

  • _a1_b_2_hello
  • 2b-ffg_er2-
  • __
  • --
  • a__
  • b--2
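
As a reference point for the rules above, here is a minimal sketch of a plain step-by-step check in Ruby. The method name `valid_name?` is hypothetical, and the sketch assumes the rules mean exactly what the list says (ASCII letters and digits only, and at least one letter). It can serve as a baseline to compare any regex against.

```ruby
# Hypothetical helper: applies each rule from the list, one at a time.
def valid_name?(str)
  return false unless str.match?(/\A[A-Za-z0-9_-]+\z/)      # only letters, digits, - and _ (so no spaces)
  return false unless str.match?(/[A-Za-z]/)                # at least one letter
  return false if str.include?("-") && str.include?("_")    # never both delimiters
  return false if str.start_with?("-", "_")                 # no leading delimiter
  return false if str.end_with?("-", "_")                   # no trailing delimiter
  return false if str.include?("--") || str.include?("__")  # no doubled delimiter
  true
end

valid   = %w[a1_b_2_hello 2b-ffg-er2 abs 123a]
invalid = %w[_a1_b_2_hello 2b-ffg_er2- __ -- a__ b--2]

puts valid.all?    { |s| valid_name?(s) }  # true
puts invalid.none? { |s| valid_name?(s) }  # true
```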
Andrey Deineko
  • you say `the string must contain at least 1 character`, so how is `123` valid? is `a` or `1` supposed to be valid? – depperm Aug 09 '19 at 15:56
  • @depperm it's a mistake in description, good catch, `123` is invalid – Andrey Deineko Aug 09 '19 at 15:57
  • @depperm I posted my comment on the wrong post. I wanted to ask Andrey Deineko what he is trying to match. Andrey Deineko, please give an example of what you are trying to match – noname Aug 09 '19 at 16:00
  • 4
    You have a highly formalized algorithm and are trying to solve it with a regular expression. May I ask why? What would be wrong with checking all conditions step by step? – Aleksei Matiushkin Aug 09 '19 at 16:04
  • @CarySwoveland I edited the question as you suggested, thanks – Andrey Deineko Aug 10 '19 at 22:58
  • @AlekseiMatiushkin I considered implementing a step-by-step check in Ruby (Rails), but it turned out cumbersome, and I thought a regexp would be a better fit for string validation. Isn't that what regexps are for? – Andrey Deineko Aug 10 '19 at 23:02

4 Answers

4

I find it convenient to put all the special conditions at the beginning in positive and negative lookaheads and follow these (which consume no characters) with the general requirement, here [a-z\d_-]+\z.

r = /
    \A           # match start of string  
    (?!.*        # begin negative lookahead and match >= 0 characters
      (?:--|__)  # match -- or __ in a non-capture group
    )            # end negative lookahead
    (?![-_])     # do not match - or _ at the beginning of the string
    (?!.*[-_]\z) # do not match - or _ at the end of the string
    (?!          # begin negative lookahead
      .*-.*_     # match - followed by _ 
      |          # or
      .*_.*-     # match _ followed by - 
    )            # end negative lookahead
    (?=.*[a-z])  # match at least one letter 
    [a-z\d_-]+   # match one or more English letters, digits, _ or -
    \z           # match end of string
    /ix          # case indifference and free-spacing modes

 "a".match? r          #=> true   
 "aB32-41".match? r    #=> true
 "".match? r           #=> false (must match a letter)
 "123-4_5".match? r    #=> false (must match a letter)
 "-aB32-4_1".match? r  #=> false (cannot begin with -)
 "aB32-4_1-".match? r  #=> false (cannot end with -)
 "_aB32-4_1".match? r  #=> false (cannot begin with _)
 "aB32-4_1_".match? r  #=> false (cannot end with _)
 "aB32--4_1".match? r  #=> false (cannot contain --)
 "aB32-4__1".match? r  #=> false (cannot contain __)
 "aB32-4_1".match? r   #=> false (cannot contain both - and _)
 "123-4_5$".match?  r  #=> false ($ is not a permitted character)

This regular expression is conventionally written:

/\A(?!.*(?:--|__))(?![-_])(?!.*[-_]\z)(?!.*-.*_|.*_.*-)(?=.*[a-z])[a-z\d_-]+\z/i
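
As a quick sanity check, the compact form can be run against the valid and invalid examples from the question. This is only a sketch; the variable name `name_re` is an assumption made for illustration.

```ruby
# name_re is a hypothetical name; the pattern is the compact form shown above.
name_re = /\A(?!.*(?:--|__))(?![-_])(?!.*[-_]\z)(?!.*-.*_|.*_.*-)(?=.*[a-z])[a-z\d_-]+\z/i

valid   = %w[a1_b_2_hello 2b-ffg-er2 abs 123a]
invalid = %w[_a1_b_2_hello 2b-ffg_er2- __ -- a__ b--2]

puts valid.all?    { |s| s.match?(name_re) }  # true
puts invalid.none? { |s| s.match?(name_re) }  # true

# \z anchors at the very end of the string, so a trailing newline is rejected too.
puts "abc\n".match?(name_re)                  # false
```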
Cary Swoveland
3

You could require at least one letter by placing [A-Za-z] between two optional runs of [A-Za-z0-9]*. After that, an optional group matches either a hyphen or an underscore [-_] followed by 1+ characters from the character class [A-Za-z0-9], and that delimiter-plus-characters part can repeat 0+ times.

Use a capturing group with a backreference to enforce consistent use of - or _:

\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z

About the pattern

  • \A Start of string
  • [A-Za-z0-9]*[A-Za-z][A-Za-z0-9]* Match at least 1 a-zA-Z
  • (?: Non capturing group
    • ([-_]) Capturing group 1, match either - or _
    • [A-Za-z0-9]+ Match 1+ times what is listed
    • (?:
      • \1[A-Za-z0-9]+ Backreference \1 to what is captured in group 1 to get consistent delimiters (to prevent matching a-b_c) and match 1+ times what is listed
    • )* Close non capturing group and repeat it 0+ times
  • )? Close non capturing group and make it optional
  • \z End of string
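
A minimal Ruby sketch of how the backreference behaves, using a hypothetical variable name `backref_re`. Note one consequence of the pattern as written: the required letter has to appear before the first delimiter, so a string such as `1-a` does not match.

```ruby
# backref_re is a hypothetical name; the pattern is the one explained above.
backref_re = /\A[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\1[A-Za-z0-9]+)*)?\z/

# The examples from the question that should be accepted.
puts %w[a1_b_2_hello 2b-ffg-er2 abs 123a].all? { |s| s.match?(backref_re) }  # true

# \1 must repeat the first delimiter, so mixing - and _ fails;
# doubled, leading, or trailing delimiters and digit-only strings fail as well.
puts %w[a-b_c a--b _a a_ 123].none? { |s| s.match?(backref_re) }             # true

# The letter is only looked for in the first segment.
puts "1-a".match?(backref_re)                                                # false
```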

Regex demo

See this page for a detailed explanation about the anchors.

The fourth bird
  • Thanks for the answer! this one matches `a-b_c` https://regex101.com/r/aY5Doc/2/ which shouldn't – Andrey Deineko Aug 09 '19 at 15:59
  • Editing the description, sorry – Andrey Deineko Aug 09 '19 at 16:00
  • It works, thank you very much. I have 0 understanding and there's 0 sense to even try... – Andrey Deineko Aug 09 '19 at 16:04
  • I will add an explanation. – The fourth bird Aug 09 '19 at 16:04
  • 1
    @AndreyDeineko I have added an explanation about the pattern. – The fourth bird Aug 09 '19 at 16:20
  • 1
    This regex passes all the tests in my answer. – Cary Swoveland Aug 09 '19 at 19:11
  • @Thefourthbird do you by chance know how to now convert this regexp into one supported by Postgres? I'd like to add it as a check constraint – Andrey Deineko Aug 11 '19 at 13:20
  • 1
    @AndreyDeineko You could use [regexp_matches](https://www.postgresql.org/docs/9.3/functions-matching.html). It will return the value of the capturing group when it is present. In this case to keep the backreference you could wrap the whole pattern in another group, which will then be group 1 and use a backreference to the inner group 2. `^([A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\2[A-Za-z0-9]+)*)?$)')` [Regex demo](https://regex101.com/r/J3BXWe/1) and a [postgress demo](https://rextester.com/DPYHP26741). – The fourth bird Aug 11 '19 at 14:00
  • @AndreyDeineko In the test it appears to be valid right? https://rextester.com/EBZY48543 Which version of postgresql do you use? – The fourth bird Aug 12 '19 at 08:55
  • @Thefourthbird `PostgreSQL 10.6` I don't understand it, in the psql console the string is valid (matching returns true), but when added as check constraint this one becomes invalid... – Andrey Deineko Aug 12 '19 at 09:00
  • @Thefourthbird please nevermind. I don't know why but after resetting the db the check works properly - I needed to escape the back slash: `CHECK (name ~ '^([A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:([-_])[A-Za-z0-9]+(?:\\2[A-Za-z0-9]+)*)?$)');` – Andrey Deineko Aug 12 '19 at 09:11
  • Does it work when you write out both alternatives with `-` or `_`? [Regex demo](https://regex101.com/r/MZjSkQ/1) and [Postgresql demo](https://rextester.com/KNLZL11457) – The fourth bird Aug 12 '19 at 09:16
  • @AndreyDeineko No problem at all, glad it works. Good luck! – The fourth bird Aug 12 '19 at 09:17
1

You could add (?!.*(\-\-|__|_.*\-|\-.*_).*) before the middle capture group to reject consecutive hyphens or underscores and any mix of the two delimiter types, and (?=.*[a-zA-Z].*) before everything to require at least one letter. So the whole thing would look like:

(?=.*[a-zA-Z].*)([A-Za-z0-9]+)(?!.*(\-\-|__|_.*\-|\-.*_).*)([A-Za-z0-9\-\_]+)([A-Za-z0-9]+)
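
The pattern above has no anchors, so in Ruby `match?` succeeds as soon as any substring fits. The sketch below, with hypothetical names `loose_re` and `strict_re`, wraps the same pattern in `\A` and `\z` to show the difference. Because the three groups each need at least one character, the anchored form also requires a string of at least three characters.

```ruby
# loose_re is the pattern exactly as written above; strict_re adds string anchors.
loose_re  = /(?=.*[a-zA-Z].*)([A-Za-z0-9]+)(?!.*(\-\-|__|_.*\-|\-.*_).*)([A-Za-z0-9\-\_]+)([A-Za-z0-9]+)/
strict_re = /\A(?:#{loose_re.source})\z/

s = "_a1_b_2_hello"
puts s.match?(loose_re)       # true:  the substring starting at "a1" matches
puts s.match?(strict_re)      # false: the whole string must match

puts "abs".match?(strict_re)  # true
puts "ab".match?(strict_re)   # false: each of the three groups needs a character
```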
depperm
0

Solution using lookahead/lookbehind assertions

^[a-z\d](?!.*--.*)(?!.*__.*)(?!.*-.*_)(?!.*_.*-)[\w-]*(?<=[^_-])$

Click here to see demo

  • ^[a-z\d] - Start with any letter or digit
  • (?!.*--.*)(?!.*__.*) - Lookahead to make sure there's no __ or --
  • (?!.*-.*_)(?!.*_.*-) - Lookahead to make sure there's no - followed by _ or vice versa
  • [\w-]* - optionally match any letter, digit, _ or a -
  • (?<=[^_-]) - Lookbehind to make sure it doesn't end with a - or _
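
A minimal sketch of the same pattern in Ruby, under two assumptions: the `i` flag is added so capital letters are accepted, and the names `line_re` and `string_re` are hypothetical. In Ruby, `^` and `$` anchor to lines rather than to the whole string, so `\A` and `\z` are the safer choice for validation. The pattern as given does not check for a letter, so a digit-only string is accepted.

```ruby
# line_re is the pattern as written (plus /i); string_re swaps ^ and $ for \A and \z.
line_re   = /^[a-z\d](?!.*--.*)(?!.*__.*)(?!.*-.*_)(?!.*_.*-)[\w-]*(?<=[^_-])$/i
string_re = /\A[a-z\d](?!.*--.*)(?!.*__.*)(?!.*-.*_)(?!.*_.*-)[\w-]*(?<=[^_-])\z/i

puts "a-b-c".match?(string_re)    # true
puts "a-_b".match?(string_re)     # false: mixes - and _

# Line anchors let a valid first line hide invalid content after a newline.
puts "abc\n__".match?(line_re)    # true
puts "abc\n__".match?(string_re)  # false

# No lookahead requires a letter, so digits alone pass.
puts "123".match?(string_re)      # true
```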

Sample data:

# Start with a letter
a
ab
abc
abc123
abc123abc

# Has underscores
a_b
a_b_c

# Has dashes
a-b
a-b-c

# Can't start or end with an underscore or dash
_abc
abc_
abc-
-abc

# Can't contain -- or __
a__
a__b
a__b__c
d--
d--e
a--b--c

# Can only use _ or - but not both

a-_b
a-_b-_c
a-_b_-d
a-_b_____f--_-_--__
Patrick C