6

I'm trying to learn more advanced regular expressions for a password validator I'm working on, because I think regular expressions are the best way to do it. I am using Java as my programming language.

For my pattern, people suggested (?=.*?[A-Z]) to say "at least one upper case letter in the string". I have tried searching for it, but nothing makes clear how the ?=.*? part makes sure at least one is there.

Here is the whole pattern: ^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!@$%^&*-]).{8,}$

From what I understand:

  1. ? means optional and occurs once
  2. = means, well, I don't know yet
  3. . is a wildcard
  4. [A-Z] is the range of uppercase letters from A-Z

TL;DR: How does (?=.*?[A-Z]) make sure at least one uppercase letter is included? Is there an in-depth explanation?

SirPent
  • http://www.rexegg.com/regex-lookarounds.html This may help. It also explains some password validation with regex – quackenator May 11 '17 at 15:28

2 Answers

10
  • (?= is the start of a look-ahead group — the question mark does not mean the same as a ? elsewhere
  • .*? is a non-greedy match against anything or nothing. The question-mark here also does not mean 'optional'.
  • [A-Z] is a character set containing the upper case ASCII letters A through to Z.
  • ) is the end of the look-ahead group

So the net result is:

"Look ahead and see if, after maybe some characters, there is an upper case letter."

Your full expression, ^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!@$%^&*-]).{8,}$, can be read as:

"Match if the string contains an upper case letter, and a lower case letter, and a digit, and one of the special characters #?!@$%^&*-, and there are at least 8 characters in total."

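Since the question is about Java, here is a minimal sketch of how the full pattern could be tried out. The class name PasswordCheck, the helper isValid, and the sample passwords are made up for illustration; only Pattern and its matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

class PasswordCheck {
    // The pattern from the question, unchanged. No backslashes are needed,
    // so it can be pasted into a Java string literal as-is.
    private static final Pattern PASSWORD = Pattern.compile(
        "^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!@$%^&*-]).{8,}$");

    // Hypothetical helper: true only when all four lookaheads succeed
    // and the string is at least 8 characters long.
    static boolean isValid(String candidate) {
        return PASSWORD.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Passw0rd!"));  // true: upper, lower, digit, special, 9 chars
        System.out.println(isValid("password1!")); // false: no upper case letter
        System.out.println(isValid("Pw0!"));       // false: fewer than 8 characters
    }
}
```

Each lookahead is checked from the start of the string and consumes nothing, so the order of the four groups does not change the result.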
Nicholas Shanks
  • 3
    I understand the grouping of the ?= lookahead part, but `.*?` is still quite unclear to me. Is this the part that makes it mandatory? – SirPent May 11 '17 at 15:33
  • If the character class `[A-Z]` fails to match, then there is no uppercase character in the string. `.*?` matches any character, between zero and unlimited times, as few times as possible, expanding as needed (lazy). It is needed so the pattern can expand until it finds an uppercase char or fails – quackenator May 11 '17 at 15:37
  • 2
    `.*?` means "ignore zero or more of any character, trying to ignore as few as possible whilst still matching the rest of this expression". You have to view it in the context of the rest of the group, the `[A-Z]`. If you're matching against the string "F", then `.*?` will match the empty string, "", and your regex will succeed. If matching against "FG", it will also match the empty string, and the [A-Z] will match the F. Compare against `.*`, which would instead match the F itself, and [A-Z] would match the "G". – Nicholas Shanks May 11 '17 at 15:37
  • Oh man Thank you. Now this clarifies everything I need. – SirPent May 11 '17 at 15:37
  • Input = 'aaaHelpmeHellpaaa'. With the regex 'H.*p' you get 'HelpmeHellp' (greedy); with the regex 'H.*?p' you get 'Help' and 'Hellp' (not greedy) – abduljalil Sep 15 '21 at 08:39
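
The greedy versus lazy behaviour described in the comments above can be checked in Java with a small sketch like this one. The class name GreedyVsLazy and the helper findAll are hypothetical names chosen for the example; the inputs are the ones used in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: collects every match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "aaaHelpmeHellpaaa";
        System.out.println(findAll("H.*p", input));     // [HelpmeHellp]  greedy: runs to the last p
        System.out.println(findAll("H.*?p", input));    // [Help, Hellp]  lazy: stops at the first p
        System.out.println(findAll(".*[A-Z]", "FG"));   // [FG]    greedy: [A-Z] ends up on the G
        System.out.println(findAll(".*?[A-Z]", "FG"));  // [F, G]  lazy: [A-Z] takes the first letter it can
    }
}
```

Inside the lookahead the choice between `.*` and `.*?` does not change whether the password is accepted, only how the engine searches for the upper case letter.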
3

The regex uses a feature called positive lookahead, which is one of the regex lookarounds:

  • Positive lookahead: (?=...). Ex: a(?=b) matches a if followed by b
  • Negative lookahead: (?!...). Ex: a(?!b) matches a if not followed by b
  • Positive lookbehind: (?<=...). Ex: (?<=a)b matches b if preceded by a
  • Negative lookbehind: (?<!...). Ex: (?<!a)b matches b if not preceded by a
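
As a rough illustration of the four lookaround forms in Java, the sketch below replaces each match with an X so you can see which character was matched. The class name LookaroundDemo, the helper mark, and the sample strings are assumptions made up for this example.

```java
class LookaroundDemo {
    // Hypothetical helper: replaces every match of regex with "X"
    // so the matched position is visible in the result.
    static String mark(String regex, String input) {
        return input.replaceAll(regex, "X");
    }

    public static void main(String[] args) {
        System.out.println(mark("a(?=b)", "ab ac"));   // Xb ac  (a followed by b)
        System.out.println(mark("a(?!b)", "ab ac"));   // ab Xc  (a not followed by b)
        System.out.println(mark("(?<=a)b", "ab cb"));  // aX cb  (b preceded by a)
        System.out.println(mark("(?<!a)b", "ab cb"));  // ab cX  (b not preceded by a)
    }
}
```

In every case only the a or the b is replaced: the text inside the lookaround is checked but never becomes part of the match.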

For your whole regex, you can easily see the pattern in this diagram:

[Diagram of the full pattern; image not available]

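As for (?=.*?[A-Z]), it is used right after the ^. So ^(?=.*?[A-Z]) means: starting at the beginning of the line, check that an uppercase character appears somewhere ahead. A lookahead consumes no characters, which is why the .{8,}$ that follows can still match the whole line from its start.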

Federico Piazza