
I'm trying to write a regex for the minimum requirements of a password: at least 6 characters, with 1 uppercase letter, 1 lowercase letter, and 1 number. Seems easy enough? I have no experience with regexes that "look ahead", so I would just do:

if (!/[A-Z]/.test(pwStr) || !/[a-z]/.test(pwStr) || !/[0-9]/.test(pwStr) ||
    pwStr.length < 6) {
    // was not successful
}

But I'd like to optimize this to one regex and level up my regex skillz in the process.

Nick Rolando
  • "optimize"? A regular expression which is run once or twice per user? Premature optimization is the root of all evil - one of the most important things to learn as a programmer is that making code more complex imposes a cost on anyone who must maintain it. – Borealid Mar 08 '12 at 00:07
  • Try this - http://stackoverflow.com/questions/7844359/password-regex-with-min-6-chars-at-least-one-letter-and-one-number-and-may-cont ; Hope it helps. function checkPwd(str) { if (str.length < 6) { return("too_short"); } else if (str.length > 50) { return("too_long"); } else if (str.search(/\d/) == -1) { return("no_num"); } else if (str.search(/[a-zA-Z]/) == -1) { return("no_letter"); } else if (str.search(/[^a-zA-Z0-9\!\@\#\$\%\^\&\*\(\)\_\+]/) != -1) { return("bad_char"); } return("ok"); } – Tats_innit Mar 08 '12 at 00:08
  • If you _really_ want to get regex _skills_, sit down and read [Mastering Regular Expressions (3rd Edition)](http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124 "By Jeffrey Friedl. Best book on Regex - ever!"). In the meantime, you can get off to a very good start by following the tutorial at: [www.regular-expressions.info](http://www.regular-expressions.info/) – ridgerunner Mar 08 '12 at 01:36
  • RegEx is powerful. Absolutely you should optimize it. Also, learning to use it well is hugely beneficial. In JavaScript, // indicates a regEx literal. You can save it directly to vars the same way you would a [] or a {}. Just name that var well, because it's definitely not always easy to interpret another dev's lengthy RegEx formula. In RegEx, the key to performance tends to be in being highly explicit. The less verbose RegEx statements are rarely the most efficient. – Erik Reppen Mar 08 '12 at 06:34
  • More importantly, I have 4 cats. I haven't actually tried copying and pasting that string of '************' on my login screen in the morning to see how long it actually gets but it takes a second to ctrl+a and delete sometimes. Sounds silly for just a password but I'd be annoyed if an app tanked because somebody wasn't considerate enough to anchor their regEx to avoid parsing the entirety of kitty-Tolstoy novels in their password fields. – Erik Reppen Mar 08 '12 at 07:23

4 Answers

^.*(?=.{6,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!#$%&? "]).*$
  • ^.*
    Start of Regex
  • (?=.{6,})
    Passwords will contain at least 6 characters in length
  • (?=.*[a-zA-Z])
    Passwords will contain at least 1 upper and 1 lower case letter
  • (?=.*\d)
    Passwords will contain at least 1 number
  • (?=.*[!#$%&? "])
    Passwords will contain at least one of the given special characters
  • .*$
    End of Regex

You can test this regex at http://rubular.com/

AMIC MING
  • This regex would match the string `xxxxxxxxxxxxAa$0` which is longer than 10 characters - see http://rubular.com/r/9N7X3r4HdP – Gareth Mar 08 '12 at 00:12
  • and if you do anchor it then it becomes very nontrivial if you wanted to change the requirement to e.g. '2 special characters'. To cut a long story short, don't try to use one regular expression for this – Gareth Mar 08 '12 at 00:16
  • Thanks Amit. Could you help me understand this, or provide a resource that will help? For example, what does `(?=^.{6,}$)` mean? – Nick Rolando Mar 08 '12 at 00:39
  • @Shredder - I just updated the answer, please check. I also changed the regex; previously I forgot to add the special character validation – AMIC MING Mar 08 '12 at 01:14
  • Oops. Sorry. Thought I edited that. [a-zA-Z] is like saying anything a-z OR A-Z, but he wants one of each. They have to be separate lookaheads. – Erik Reppen Mar 08 '12 at 06:36
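
To illustrate the point in the last comment, here is a minimal JavaScript sketch. The names `hasLetter` and `hasBothCases` are made up for this example and do not come from the answer:

```javascript
// A single lookahead with [a-zA-Z] is satisfied by a letter of either case.
const hasLetter = /^(?=.*[a-zA-Z])/;
// Two separate lookaheads require one lowercase AND one uppercase letter.
const hasBothCases = /^(?=.*[a-z])(?=.*[A-Z])/;

console.log(hasLetter.test("abc123"));    // true: lowercase alone is enough
console.log(hasBothCases.test("abc123")); // false: no uppercase letter
console.log(hasBothCases.test("aBc123")); // true: both cases present
```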

Assuming that a password may consist of any characters, must be at least six characters long, and must contain at least one upper case letter, one lower case letter and one decimal digit, here's the one I'd recommend (commented version using Python syntax):

import re

re_pwd_valid = re.compile(r"""
    # Validate password 6 char min with one upper, lower and number.
    ^                 # Anchor to start of string.
    (?=[^A-Z]*[A-Z])  # Assert at least one upper case letter.
    (?=[^a-z]*[a-z])  # Assert at least one lower case letter.
    (?=[^0-9]*[0-9])  # Assert at least one decimal digit.
    .{6,}             # Match password with at least 6 chars
    $                 # Anchor to end of string.
    """, re.VERBOSE)

Here it is in JavaScript:

re_pwd_valid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;
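
As a rough usage sketch, the JavaScript pattern above could be wrapped in a helper and tried against a few inputs. `isValidPassword` and `rePwdValid` are hypothetical names chosen for this example:

```javascript
// The pattern from the answer above, unchanged.
const rePwdValid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;

// Hypothetical helper: true when the password meets all four requirements.
function isValidPassword(pwStr) {
  return rePwdValid.test(pwStr);
}

console.log(isValidPassword("Abc123")); // true
console.log(isValidPassword("abc123")); // false: no upper case letter
console.log(isValidPassword("ABC123")); // false: no lower case letter
console.log(isValidPassword("Abcdef")); // false: no digit
console.log(isValidPassword("Ab1"));    // false: shorter than 6 characters
```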

Additional: If you ever need to require more than one of the required chars, take a look at my answer to a similar password validation question

Edit: Changed the lazy dot star to greedy char classes. Thanks Erik Reppen - nice optimization!

ridgerunner
  • I thought he set a max of 10 before but I don't see it anymore. If there were a max, another optimization would be to set the * to {0,-1}. I think the lookaheads ignore your '$' since they're more like an extra hunk of logic tossed in that starts at the current position but resets the parser after they've matched. They don't care about the stuff your remaining regEx bits care about. – Erik Reppen Mar 08 '12 at 07:05

In my experience, the more you can separate out regexes, the better the code will read. You could combine the regexes with positive lookaheads (which I see was just done), but... why?

Edit:

Ok, ok, so if you have some configuration file where you could pass a string to compile into a regex (which I've seen done and have done before), I guess it is worth the hassle. But otherwise, even if the answers provided are corrected to match what you need, I'd still advise against it unless you intend to create such a thing. Separate regexes are just so much nicer to deal with.
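
For what it's worth, here is one way the separate-checks approach might look in JavaScript. This is only a sketch: the `passwordRules` table and `failedRules` function are hypothetical names, and the rules are the four from the question.

```javascript
// Hypothetical rule table: each entry pairs a message with a simple test.
const passwordRules = [
  { message: "at least 6 characters", test: (s) => s.length >= 6 },
  { message: "an uppercase letter", test: (s) => /[A-Z]/.test(s) },
  { message: "a lowercase letter", test: (s) => /[a-z]/.test(s) },
  { message: "a digit", test: (s) => /[0-9]/.test(s) },
];

// Returns the message of every rule the password fails (empty array if valid).
function failedRules(pwStr) {
  return passwordRules
    .filter((rule) => !rule.test(pwStr))
    .map((rule) => rule.message);
}

console.log(failedRules("Abc123")); // []
console.log(failedRules("abc"));    // the length, uppercase and digit messages
```

A side benefit of keeping the checks separate is that the caller can tell the user which requirement was missed, which a single combined regex cannot do.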

JayC

I haven't tested thoroughly but here's a more efficient version of Amit's. I think his also allowed unspecified characters into the mix (which wasn't technically listed as a rule). This one won't go berserk on you if you accidentally target a large hunk of text, it will fail sooner on strings that are too long and it only allows the characters in the final class.

'.' should be used sparingly. Think of the looping it has to do to determine a match with all the characters it can represent. It's much more efficient to use negated classes.

`^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^@#$%]{0,9}[@#$%])[0-9a-zA-Z@#$%]{6,10}$`

There's nothing wrong with trying to find the ideal regEx. But split it up when you need to.
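
A quick sketch of how that pattern behaves in JavaScript, with the trailing `$` included. `reStrict` is a name invented for this example:

```javascript
// The pattern from this answer, ending in the `$` anchor.
const reStrict = /^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^@#$%]{0,9}[@#$%])[0-9a-zA-Z@#$%]{6,10}$/;

console.log(reStrict.test("Abc12#"));       // true: all four kinds, 6 characters
console.log(reStrict.test("Abc123"));       // false: no @, #, $ or %
console.log(reStrict.test("Abc 1#"));       // false: space is not in the final class
console.log(reStrict.test("Abc12#Abc12#")); // false: 12 characters exceeds the max of 10
```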

RegEx tends to be explained poorly. I'll add a breakdown:

a - a single 'a' character
ab - a single 'a' character followed by a single b character
a* - 0 or more 'a' characters
a+ - one or more 'a' characters
a+b - one or any number of a characters followed by a single b character.
a{6,} - at least 6 'a' characters (would match more)
a{6,10} - 6-10 'a' characters
a{10} - exactly 10 'a' characters iirc - not very useful

^ - beginning of a string - so ^a+ would not match 'baaaa'
$ - end of a string - b$ would not find a match in 'aaaba'
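
These can be checked directly in a JavaScript console. A small sketch, with variable names chosen only for this example:

```javascript
const oneOrMoreAThenB = /a+b/;
const startsWithA = /^a+/;
const endsWithB = /b$/;
const sixToTenAs = /^a{6,10}$/;

console.log(oneOrMoreAThenB.test("aaab")); // true
console.log(startsWithA.test("baaaa"));    // false: the string starts with 'b'
console.log(endsWithB.test("aaaba"));      // false: the string ends with 'a'
console.log(sixToTenAs.test("aaaaaaa"));   // true: seven 'a' characters
console.log(sixToTenAs.test("aaaaa"));     // false: only five
```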

[] signifies a character class. You can put a variety of characters inside it, and the character at the current position is checked against all of them. By itself it matches exactly one character. It can be modified by + and * as above.

[ab]+c - one or more 'a' or 'b' characters followed by a single 'c' character
[a-zA-Z0-9] - any letter, any digit - there are a bunch of \<some key> shorthands representing sets, like \d for digits. \w is basically [a-zA-Z0-9_]

note: '\' is the escape character inside character classes. [a\-z] means 'a' or '-' or 'z', rather than anything from a to z, which is what [a-z] means
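
A short JavaScript sketch of the character-class behaviour described above (the variable names are made up for the example):

```javascript
const abThenC = /^[ab]+c$/;        // one or more of 'a'/'b', then a 'c'
const literalHyphen = /^[a\-z]+$/; // only 'a', '-' and 'z' are allowed
const letterRange = /^[a-z]+$/;    // any lowercase letter from a to z

console.log(abThenC.test("abbac"));     // true
console.log(abThenC.test("abdc"));      // false: 'd' is not in the class
console.log(literalHyphen.test("a-z")); // true
console.log(literalHyphen.test("m"));   // false: 'm' is not 'a', '-' or 'z'
console.log(letterRange.test("m"));     // true
console.log(letterRange.test("a-z"));   // false: '-' is not a letter
```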

[^<stuff>] - a character class with the caret in front means any character except the <stuff> listed - this is critical to performance in regEx matches hitting large strings.

. - wildcard character matching any character except line breaks such as \n (unless a dot-all flag is set). Not a big deal in very small sets of characters but avoid using it.

(?=<regex stuff>) - a lookahead. Doesn't move the parser further down the string if it matches. If a lookahead fails, the whole match fails. If it succeeds, you go back to the same character before it. That's why we can string a bunch together to search if there's at least one of a given character.
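
To see that a lookahead does not move the position, you can look at what `exec` reports as the matched text. This is just an illustrative sketch with made-up variable names:

```javascript
// The lookahead checks for a digit somewhere ahead, then [a-z]+ matches
// starting from the same position the lookahead started at.
const consumed = /^(?=.*\d)[a-z]+/.exec("abc1");
console.log(consumed[0]); // "abc": the digit was inspected but not consumed

// If the lookahead fails, the whole match fails.
const noDigit = /^(?=.*\d)[a-z]+/.exec("abcd");
console.log(noDigit); // null

// Lookaheads alone consume nothing, so stacking them yields an empty match.
const onlyLookaheads = /^(?=.*\d)(?=.*[A-Z])/.exec("x1Y");
console.log(onlyLookaheads[0].length); // 0
```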

So:

^ - at the beginning followed by whatever is next

(?=[^0-9]{0,9}[0-9]) - look for a digit from 0-9 preceded by 0 to 9 instances of anything that isn't 0-9 - next lookahead starts at the same place

etc. on the lookaheads

[0-9a-zA-Z@#$%]{6,10} - 6-10 of any letter, number, or @#$% characters

The trailing '$' is needed: {6,10} only limits how many characters the match consumes, so without the anchor a longer string would still match on its first 10 characters

Erik Reppen
  • +1 for the _don't use the dot unless you have to_ recommendation and the associated optimization. Note that if you wish to reject strings longer than 10, your expression still needs a trailing `$` or `\z` anchor at the end. Also you have an extraneous `E` where a `#` should be in the fourth lookahead assertion. Thanks for the tip! – ridgerunner Mar 08 '12 at 02:07
  • Thanks. I made another mistake, which was to not anchor each lookahead. – Erik Reppen Mar 08 '12 at 05:16
  • Actually I think I was right the first time. Anchoring the start of the string does the trick since the lookaheads use your current position as a ref point. – Erik Reppen Mar 08 '12 at 05:36
  • @ridgeRunner - I don't think the '$' is necessary since I've anchored to the beginning and nothing should parse beyond 10 chars. – Erik Reppen Mar 08 '12 at 06:40
  • Yes, your expression correctly matches up to 10 chars max, but it fails to check that there are no chars beyond that. This is why the `$` is required at the end - go ahead and test it. You do test your regexes right? I recommend using [RegexBuddy](http://www.regexbuddy.com/) for composing, testing and debugging regular expressions. Cheers! – ridgerunner Mar 08 '12 at 14:38
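
Taking up the suggestion in the last comment to test it, here is a small JavaScript sketch comparing the final character class with and without the end anchor. The lookaheads are left out to keep it short, and the variable names are invented for this example:

```javascript
const withoutEnd = /^[0-9a-zA-Z@#$%]{6,10}/;
const withEnd = /^[0-9a-zA-Z@#$%]{6,10}$/;

const tooLong = "Abc123#Abc123#"; // 14 characters

console.log(withoutEnd.test(tooLong));    // true: the first 10 characters satisfy it
console.log(withoutEnd.exec(tooLong)[0]); // "Abc123#Abc"
console.log(withEnd.test(tooLong));       // false: nothing may follow the 10th character
```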