26

I saw the phrase

^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_#@%\*\-]{8,24}$

in a regex that was used as a password-checking mechanism. I have read a few courses on regular expressions, but I have never seen the combination ?=. explained.

I want to know how it works. In the example it checks for at least one capital letter, one lowercase letter and one digit. I guess it's something like an "if".

The Guy with The Hat
Izzy
  • 6
    [Positive lookahead](http://www.regular-expressions.info/lookaround.html) – devnull Mar 16 '14 at 15:35
  • 1
    If you want to know what regular expression characters mean, enter the regexp at regexr.com. Then hover the mouse over the characters and it will display the meaning in a tooltip. – Barmar Mar 16 '14 at 15:37
  • 2
    You might also find an explanation at http://regex101.com/ – devnull Mar 16 '14 at 15:38
  • 3
    Note in particular that the `.` is unrelated to the `(?=`. Your regex starts with `(?=` (ensure that you can see, but don't consume) followed by `.*` (zero or more of any character). – Phrogz Mar 16 '14 at 15:38
  • 1
    Possible duplicate of [What does ?= mean in a regular expression?](https://stackoverflow.com/questions/1570896/what-does-mean-in-a-regular-expression) – Ruslan Jun 10 '17 at 07:06

3 Answers

30

(?=regex_here) is a positive lookahead. It is a zero-width assertion, meaning that it matches a location that is followed by the regex contained within (?= and ). To quote from the linked page:

lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called "assertions". They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.

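The . is not part of the lookahead syntax itself. It is an ordinary token inside the lookahead that matches any single character that is not a line terminator, so .* simply skips over zero or more characters before the class that follows it.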
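As a minimal sketch of the zero-width behaviour, here is the pattern from the question run through Python's `re` module. The choice of Python and the names `PASSWORD` and `is_valid` are assumptions made for illustration; the question does not say which regex engine was used.

```python
import re

# A lookahead on its own succeeds without consuming any characters:
# the match is empty and stays at position 0.
m = re.match(r"(?=.*[0-9])", "abc1")
lookahead_span = m.span()

# The pattern from the question: three lookaheads that each start at
# position 0, followed by the part that actually consumes the string.
PASSWORD = re.compile(
    r"^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9_#@%\*\-]{8,24}$"
)

def is_valid(candidate: str) -> bool:
    """Return True if candidate satisfies every lookahead and the final class."""
    return PASSWORD.match(candidate) is not None
```

Because every lookahead is evaluated from the same position, the order of the three checks does not change the result; only the final character class and its {8,24} quantifier consume input.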

The Guy with The Hat
4

I am new to regex, but this is what I understand about the above regex:

1- ?= is a positive lookahead, i.e. it looks ahead from the current position and checks whether the pattern that follows it, such as .*[A-Z], can match there.

2- .* allows 0 or more characters before the character you are looking for, i.e. it lets the lookahead scan all the way to the end of the input string to find a match. In short, * is a quantifier that means "0 or more".

For instance, if you replaced * with ? in the [A-Z] part, the expression would only match if the 1st or 2nd character is a capital letter. If you replaced it with +, the expression would only match if some character other than the first is a capital letter.
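The effect of swapping the quantifier can be checked with a short sketch. The helper name `capital_ok` is hypothetical, and Python's `re` module is assumed as the engine.

```python
import re

def capital_ok(quantifier: str, text: str) -> bool:
    """Test ^(?=.<quantifier>[A-Z]) against the start of text.

    quantifier is expected to be one of "*", "?" or "+".
    """
    pattern = r"^(?=." + quantifier + r"[A-Z])"
    return re.match(pattern, text) is not None

# "*": a capital anywhere is enough
print(capital_ok("*", "abcD"))
# "?": the capital must be the 1st or 2nd character
print(capital_ok("?", "abCd"))
# "+": at least one character must come before the capital
print(capital_ok("+", "Abcd"))
```

Only the first call prints True: with ? the capital in third position is out of reach, and with + a capital in first position has nothing before it.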

1

^ asserts position at start of the string

Positive Lookahead (?=\D*\d): assert that the regex below matches
  • \D matches any character that's not a digit (equivalent to [^0-9])
  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • \d matches a digit (equivalent to [0-9])

Positive Lookahead (?=[^a-z]*[a-z]): assert that the regex below matches
  • [^a-z] matches a single character not present in the list; a-z is a single character in the range between a (index 97) and z (index 122) (case sensitive)
  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • [a-z] matches a single character present in the list; a-z is a single character in the range between a (index 97) and z (index 122) (case sensitive)

Positive Lookahead (?=[^A-Z]*[A-Z]): assert that the regex below matches
  • [^A-Z] matches a single character not present in the list; A-Z is a single character in the range between A (index 65) and Z (index 90) (case sensitive)
  • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • [A-Z] matches a single character present in the list; A-Z is a single character in the range between A (index 65) and Z (index 90) (case sensitive)

. matches any character (except for line terminators)

{8,30} matches the previous token between 8 and 30 times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
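A rough sketch of this variant in Python follows. The full pattern is an assumption: it is assembled by concatenating the tokens explained above in the order they appear, and the names `STRICT` and `is_valid_strict` are hypothetical.

```python
import re

# Assumed pattern, built from the tokens described in this answer.
# Each lookahead uses a negated class (\D*, [^a-z]*, [^A-Z]*) instead of .*,
# so it stops at the first digit / lowercase / uppercase character it finds.
STRICT = re.compile(r"^(?=\D*\d)(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z]).{8,30}$")

def is_valid_strict(candidate: str) -> bool:
    """Return True if candidate has a digit, a lowercase letter, an uppercase
    letter, and is 8 to 30 characters long (any characters except line breaks)."""
    return STRICT.match(candidate) is not None
```

Unlike the pattern in the question, the consuming part here is . rather than a fixed character class, so punctuation such as ! is accepted and the upper length limit is 30 instead of 24.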