17

I've seen the following regular expression around the web.

(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$

It validates only if the string:

   * contains at least 1 upper case letter
   * contains at least 1 lower case letter
   * contains at least 1 digit or special character
   * is at least 8 characters long

I'd like to know how to convert this regular expression so that it checks that the string:

* contains at least 2 upper case letters
* contains at least 2 lower case letters
* contains at least 2 digits
* contains at least 2 special characters
* is at least 8 characters long

Of course, if it contains at least 2 each of upper case letters, lower case letters, digits and special characters, the 8-character minimum is already implied and I wouldn't need the length check.

Special characters include:

`~!@#$%^&*()_-+=[]\|{};:'".,/<>?

Peter Mortensen
Jason
  • Please define *special character*. Do you mean only: `!@#$%^&*()-_=+[{]};:'",<.>/?` or maybe something more? – Crozin Apr 14 '10 at 13:55
  • updated the question with the charset of special characters – Jason Apr 14 '10 at 14:11
  • Don't you hate those web sites that restrict password complexity "for security reasons"? You want to set ".evmhcfcyK" (easy to remember because it comes from a sentence) and you end up with "abcd1234" written on a Post-it note. – Álvaro González Apr 14 '10 at 14:18

3 Answers

66

I have to agree with Alan. If the existing regex is so complicated, why try and do it in just one regex?

Just break it down into approachable simple steps. You have already done that.

Now write four regexes, one per rule, combine them with some basic logic, and check the length of the string. Done.

Which would you rather debug, this:

(?=^(?:[^A-Z]*[A-Z]){2})(?=^(?:[^a-z]*[a-z]){2})(?=^(?:\D*\d){2})(?=^(?:\w*\W){2})^[A-Za-z\d\W]{8,}$ (which does not work btw...)

or this:

function valid_pass($candidate) {
   $r1='/[A-Z]/';  // uppercase letters
   $r2='/[a-z]/';  // lowercase letters
   $r3='/[!@#$%^&*()\-_=+{};:,<.>]/';  // whatever you mean by 'special char'
   $r4='/[0-9]/';  // digits

   // preg_match_all returns the number of matches; each rule needs at least 2
   if(preg_match_all($r1,$candidate, $o)<2) return FALSE;
   if(preg_match_all($r2,$candidate, $o)<2) return FALSE;
   if(preg_match_all($r3,$candidate, $o)<2) return FALSE;
   if(preg_match_all($r4,$candidate, $o)<2) return FALSE;

   // minimum overall length
   if(strlen($candidate)<8) return FALSE;

   return TRUE;
}

Why folks feel they have to write a regex that no one can understand just so they can do it in one go is beyond me...


Ok ok -- if you really want a single regex, learn about lookaheads to validate your rules.

This monster does what you asked in one go (it is written in free-spacing mode, so it needs the x flag):

^                                        # start of line
(?=(?:.*[A-Z]){2,})                      # 2 upper case letters
(?=(?:.*[a-z]){2,})                      # 2 lower case letters
(?=(?:.*\d){2,})                         # 2 digits
(?=(?:.*[!@#$%^&*()\-_=+{};:,<.>]){2,})  # 2 special characters
(.{8,})                                  # length 8 or more
$                                        # EOL 
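For anyone who wants to try the free-spacing pattern quickly, here is a minimal sketch. It is written in Python rather than the PHP used above, purely so that it is self-contained; `re.VERBOSE` is Python's name for free-spacing mode. The names `PASSWORD_RE` and `is_valid` are made up for illustration, and the special-character class is the one from this answer, not the asker's full list.

```python
import re

# The pattern from the answer, compiled with re.VERBOSE (free-spacing mode).
# PASSWORD_RE and is_valid are illustrative names, not part of the answer.
PASSWORD_RE = re.compile(r"""
    ^                                         # start of string
    (?=(?:.*[A-Z]){2,})                       # at least 2 upper case letters
    (?=(?:.*[a-z]){2,})                       # at least 2 lower case letters
    (?=(?:.*\d){2,})                          # at least 2 digits
    (?=(?:.*[!@#$%^&*()\-_=+{};:,<.>]){2,})   # at least 2 special characters
    .{8,}                                     # length 8 or more
    $                                         # end of string
""", re.VERBOSE)


def is_valid(candidate: str) -> bool:
    """Return True if the candidate satisfies all five rules."""
    return PASSWORD_RE.search(candidate) is not None


if __name__ == "__main__":
    for pw in ["aaAA11!!", "aaAA11!![[", "aaAA11![[", "abcd1234"]:
        print(repr(pw), is_valid(pw))
```

The `#` inside the character class is taken literally even in free-spacing mode, which is why the comments can sit on the same lines as the pattern.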


dawg
  • I like this solution; nice and clear what is going on while being reasonably concise. +1 – andrhamm Oct 17 '11 at 03:33
  • @andrhamm Is there any way how to connect the first two lines? Upper OR lower characters not upper AND lower? – Byakugan May 25 '12 at 13:11
  • @Byakugan try $r1='/[A-Za-z]/'; //Uppercase or lowercase – andrhamm May 31 '12 at 23:45
  • Great answer, helped me out a lot too. But what if anything NOT in these r1...r4 char lists, would be considered an invalid character, and therefore rendering the candidate var invalid, how to go about that? Summing up all preg_match_all and see if it equals the length? – Florian Mertens Jan 22 '13 at 20:57
  • @Florian Mertens: (As I went to answer your comment, I realized I had a staggeringly bad bug: in $r3, the sequence `)-_` is not what was intended!) As long as the tests are satisfied, there can be extra characters. Try it. There need to be 2 `a-z`; 2 `A-Z`; 2 'special characters' that are in the braces; 2 numbers and the length must be equal to or greater than 8. So the password `aaAA11!![[` would pass even though I did not include `[` or `]` in the definition of $r3. However, `aaAA11![[` does not pass since '[' is not included in r3 and there are not two 'special characters' – dawg Jan 23 '13 at 03:49
31

The best way to adapt that regex is to chuck it out and write some code instead. The required regex would be so long and complicated, you wouldn't be able to read it two hours after you wrote it. The equivalent PHP code will be tedious, but at least you'll be able to understand what you wrote.

This isn't meant as a slam on you, by the way. Regexes are just barely suitable for password-strength validation in most cases, but your requirements are more complicated than usual, and it's just not worth it. Also, that regex you posted is crap. Never trust regexes you find floating around the web. Or any code, for that matter. Or, heck, anything. :-/

Alan Moore
  • I agree, it's best to run separate checks, not one magic regex. This way you can easily modify specific parts or add/remove them. Personally I don't think forcing a strong password is a good user experience; those JS-based "password strength" bars are much better because you're both educating the user and mildly punishing them for using a bad password, but if they really want to use one, they can. – TravisO Apr 14 '10 at 14:19
  • ok ok .. you all persuade me .. I won't try the regex since it clearly is unreadable. thanks everyone! – Jason Apr 14 '10 at 14:30
  • @TravisO: Yeah, I like those password-strength bars, too. – Alan Moore Apr 14 '10 at 14:57
  • As a side note: to read complicated regex I use RegexBuddy ( http://www.regexbuddy.com/ ), it will give you a tree with good explanation of what that part is doing. You should check it out. – Radu Maris Apr 25 '12 at 15:01
16

If you really want to use a regular expression, try this:

(?=^(?:[^A-Z]*[A-Z]){2})(?=^(?:[^a-z]*[a-z]){2})(?=^(?:\D*\d){2})(?=^(?:\w*\W){2})^[A-Za-z\d\W]{8,}$

Some explanation:

  • (?=^(?:[^A-Z]*[A-Z]){2}) tests for two repetitions of [^A-Z]*[A-Z], that is, a sequence of zero or more characters other than uppercase letters followed by one uppercase letter
  • (?=^(?:[^a-z]*[a-z]){2}) (same as above with lowercase letters)
  • (?=^(?:\D*\d){2}) (same as above with digits)
  • (?=^(?:\w*\W){2}) (same as above with non-word characters, but you may replace \W with a character class of whatever special characters you want)
  • ^[A-Za-z\d\W]{8,}$ tests the length of the whole string, which must consist only of characters from the union of all the other character classes.
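As a rough illustration of the "replace \W with a character class" suggestion, the sketch below swaps in the special characters listed in the question. It uses Python only to stay self-contained; `SPECIALS`, `ORIGINAL_RE` and `STRICT_RE` are hypothetical names, and widening the final class to include the same set is an assumption about what the asker wants. One visible difference: `_` is a word character, so the `\W` version never counts it as special, while the explicit class does.

```python
import re

# Special characters copied from the question (assumed to be the full list).
SPECIALS = "`~!@#$%^&*()_-+=[]\\|{};:'\".,/<>?"
S = re.escape(SPECIALS)  # escaped so the characters are safe inside [...]

# The pattern exactly as given in the answer.
ORIGINAL_RE = re.compile(
    r"(?=^(?:[^A-Z]*[A-Z]){2})(?=^(?:[^a-z]*[a-z]){2})"
    r"(?=^(?:\D*\d){2})(?=^(?:\w*\W){2})^[A-Za-z\d\W]{8,}$"
)

# Same structure, with \W replaced by the explicit class in both places.
STRICT_RE = re.compile(
    r"(?=^(?:[^A-Z]*[A-Z]){2})"
    r"(?=^(?:[^a-z]*[a-z]){2})"
    r"(?=^(?:\D*\d){2})"
    + "(?=^(?:[^" + S + "]*[" + S + "]){2})"
    + "^[A-Za-z\\d" + S + "]{8,}$"
)

if __name__ == "__main__":
    for pw in ["aaAA11!!", "aaAA11__", "aaAA11 \t"]:
        print(repr(pw),
              bool(ORIGINAL_RE.search(pw)),
              bool(STRICT_RE.search(pw)))
```

With the explicit class, whitespace no longer counts as a special character, and anything outside letters, digits and the listed specials makes the whole string fail.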
Gumbo
  • Great answer with a nice explanation; I'm voting this up even though I don't recommend your answer. In this case I think it's becoming too unreadable and standard code would be a better choice. The only reason I would use a regex is if extreme performance were an issue and I needed to check millions of passwords. – TravisO Apr 14 '10 at 14:27
  • Actually, if performance were a factor, it would be another reason *not* to use a regex. Otherwise I agree: +1 for the definitive regex solution (in case you really have to go that route). – Alan Moore Apr 15 '10 at 03:04
  • +1 But instead of providing one long unreadable regex in native format (and a separate explanation), it is much better (and less work) to write the regex in free-spacing mode from the get-go, complete with proper indentation and generous comments. An additional benefit is that a verbose self-documenting free-spacing mode regex is much more maintainable in the future. – ridgerunner Sep 04 '13 at 22:36