I am looking for a regex that matches strings (i.e., passwords) that have at least 8 characters, at least one upper case character, at least one lower case character, and at least one number.

A regex that works (with the help of here) would be:

(^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,}$)

This regex uses positive lookahead (?=). This is an expensive operation. Is it possible to make this regex without using positive lookahead?

user2609980
  • **This is an expensive operation** What makes you say so? You can always accept `^[0-9a-zA-Z]{8,}$` from users and do the validation using code. – anubhava Sep 19 '14 at 14:59
  • it's expensive anyway to use regex. i don't know who started this. this work is not suitable for regex. it's much faster and more accurate to use code to test it instead of relying on the regex engine and hoping it does the best. – Jason Hu Sep 19 '14 at 15:03
  • @anubhava The fact that everything is kept in buffer. Someone explained this via the [Mastering Regular Expression Third Edition](http://dl.e-book-free.com/2013/07/mastering_regular_expressions_third_edition.pdf) book. And of course this is faster in code, but that is not the question. Is it possible to do this with a regex without lookahead? – user2609980 Sep 19 '14 at 15:09
  • We shouldn't try to solve a problem on grounds like that. Why is creating some tiny buffer causing any problem for you? Are you running out of memory? Even using regular expressions itself will incur some small cost, so why even use regex? – anubhava Sep 19 '14 at 15:13
  • It is kind of like a puzzle @anubhava. But I just heard that this is actually *not* possible, you have to keep some state, either with lookahead or in code. – user2609980 Sep 19 '14 at 15:41
  • it's indeed impossible given you want to count the number of characters. i don't think it's a regular language. actually if you have to use more than one lookahead, that means you need more than one regex to completely describe it. the password matching your requirements should not be regular. – Jason Hu Sep 19 '14 at 16:19
  • @HuStmpHrrr There's one thing about "perl-like" regex... It may not be regular either :-) – Mariano Oct 21 '15 at 21:50
  • @Mariano indeed it's not regular. extensive use of pcre like this is almost cheating since it relies on something beyond the capability of commonly recognised understanding of regular expression. but anyway, nice hack. – Jason Hu Oct 22 '15 at 00:16

1 Answer

> at least 8 characters, at least one upper case character, at least one lower case character, and at least one number

> It is kind of like a puzzle

Ok, I'm going to take this as a puzzle, provided it's understood that:

  • Coding without regex would be more efficient.
  • A lookahead is NOT significantly expensive compared to the cost of using regex on its own.

And that this solution may be even more expensive than using a lookahead.
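To make the first caveat concrete, here is a minimal sketch of the same four rules checked in plain code, without any regex. The function name `is_valid_password` is hypothetical, and the sketch assumes ASCII ranges to mirror `[A-Z]`, `[a-z]` and `[0-9]`; like the regex below, it allows any other characters as long as the four requirements are met.

```python
def is_valid_password(candidate: str) -> bool:
    """Return True if candidate has 8+ characters, an ASCII uppercase
    letter, an ASCII lowercase letter and an ASCII digit."""
    # Each any() stops at the first character that satisfies the rule.
    has_upper = any("A" <= ch <= "Z" for ch in candidate)
    has_lower = any("a" <= ch <= "z" for ch in candidate)
    has_digit = any("0" <= ch <= "9" for ch in candidate)
    return len(candidate) >= 8 and has_upper and has_lower and has_digit
```

Each rule is a single linear scan, so no backtracking is involved.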

Description

  1. We can use a subpattern like

    \A(?:[A-Z]|[a-z]|[0-9]|.){8,}
    

    to check there are at least 8 characters in the subject, while providing the 4 options (uppercase, lowercase, digit, or some other character).

  2. Then, we'll wrap each of the first 3 required options in a named capturing group, so we can later test whether it took part in the match:

    \A(?:(?<upper>[A-Z])|(?<lower>[a-z])|(?<digit>[0-9])|.){8,}
    
  3. And finally, we'll use an IF clause to check that each group was captured:

    (?(upper)  (?#check 2nd condition)  |  (?#make it fail)  )
    

    using (?!) to make it fail if any of the conditions isn't met:

    (?(upper)(?(lower)(?(digit)|(?!))|(?!))|(?!))
    

Regex

\A                        #beginning of string
(?>                       #MAIN iteration (atomic only for efficiency)
    (?<upper>[A-Z])       #  an uppercase letter
  |                       # or
    (?<lower>[a-z])       #  a lowercase letter
  |                       # or
    (?<digit>[0-9])       #  a digit
  |                       # or
    .                     #  anything else
){8,}?                    #REPEATED 8+ times (lazy)
                          #
                          #CONDITIONS:
(?(upper)                 # 1. There must be at least 1 uppercase
    (?(lower)             #    2. If (1), there must be 1 lowercase
        (?(digit)         #       3. If (2), there must be 1 digit
          | (?!)          #          Else fail
        )                 #
      | (?!)              #       Else fail
    )                     #
  | (?!)                  #    Else fail
)                         #

One-liner:

\A(?>(?<upper>[A-Z])|(?<lower>[a-z])|(?<digit>[0-9])|.){8,}?(?(upper)(?(lower)(?(digit)|(?!))|(?!))|(?!))
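As a rough way to try this outside PCRE, the one-liner can be ported to Python's `re` module. This is a sketch under a few assumptions rather than the answer's original: Python spells named groups `(?P<name>...)`, the atomic group is swapped for a plain `(?:...)` because atomic groups only exist in Python 3.11+, and the names `CONDITIONAL`, `LOOKAHEAD`, `check` and `check_lookahead` are made up for illustration. The lookahead pattern is included only to compare results.

```python
import re

# Port of the one-liner: a lazy 8+ repetition that records which kinds of
# character were seen, followed by nested conditionals that fail via (?!)
# unless all three named groups took part in the match.
CONDITIONAL = re.compile(
    r"\A(?:(?P<upper>[A-Z])|(?P<lower>[a-z])|(?P<digit>[0-9])|.){8,}?"
    r"(?(upper)(?(lower)(?(digit)|(?!))|(?!))|(?!))"
)

# Lookahead-based pattern with the same rules, for comparison only.
LOOKAHEAD = re.compile(r"\A(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{8,}")


def check(candidate: str) -> bool:
    """True if the conditional (lookahead-free) pattern matches."""
    return CONDITIONAL.match(candidate) is not None


def check_lookahead(candidate: str) -> bool:
    """True if the lookahead pattern matches."""
    return LOOKAHEAD.match(candidate) is not None


if __name__ == "__main__":
    for sample in ["Abcdefg1", "abcdefgh1A", "abcdefgh", "ABCDEFG1", "Ab1"]:
        print(sample, check(sample), check_lookahead(sample))
```

Because the repetition is lazy, the conditions are first tested after exactly 8 characters, and the engine only consumes more characters when a required group is still unset.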

regex101 demo

Mariano