How to accept an ASCII character with Python re (regex)

Question

I have a regex that validates a password so that it contains an uppercase letter, a lowercase letter, a number, a special character, and at least 8 characters.

The regex is:

regex_password = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,}$"

I use it in this function:

import re

def password_validator(password):
    # REGEX PASSWORD: minimum 8 characters, 1 lowercase, 1 uppercase, 1 special character
    regex_password = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,}$"

    if not re.match(regex_password, password):
        raise ValueError("value is not a valid password")
    return password

However, a password that uses "²" as its special character raises an error, even though the same regex works in my JavaScript front-end validation and on various regex-testing sites.

The problem may be related to ASCII handling. How can I make Python accept this character in the regex?

For what it's worth, `\W` is already a character class, so the square brackets around it are superfluous. — tripleee, Dec 20 '22 at 11:15
The `²` char is from the `\p{No}` class. Most regex engines only match `\p{Nd}` with `\w`, but not in Python. Hence, `\W` does not match superscript numbers in `re`. In JavaScript, `\w` only matches `[a-zA-Z0-9_]`, so in Python 3, you can make the `\w` ASCII only aware with `re.A`. — Wiktor Stribiżew, Dec 20 '22 at 11:16
You could try `[\W²]` to include it or alternatively, you can use the `\S` character set, which matches any non-whitespace character, to include the `²` character. — Kylar, Dec 20 '22 at 11:16

score 2 · Accepted Answer · answered Dec 20 '22 at 11:17

From the documentation:

\W

Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_]. If the LOCALE flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore.

Other implementations may interpret \w as referring only to ASCII alphanumeric characters and the underscore by default, so \W by extension matches every non-ASCII alphanumeric character as well as every non-alphanumeric character.
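To make the difference concrete, here is a small sketch that probes "²" directly. The variable names are only illustrative; the behaviour shown is that of Python 3's `re` module on `str` patterns, where matching is Unicode-aware by default.

```python
import re

SUPERSCRIPT_TWO = "\u00b2"  # the "²" character from the question

# Default (Unicode) mode: "²" counts as a word character, so \W does not match it.
unicode_match = re.search(r"\W", SUPERSCRIPT_TWO)

# ASCII mode: \W behaves like [^a-zA-Z0-9_], so "²" is matched.
ascii_match = re.search(r"\W", SUPERSCRIPT_TWO, flags=re.ASCII)

# Confirms the reason: \w matches "²" in the default mode.
word_match = re.search(r"\w", SUPERSCRIPT_TWO)

print(unicode_match is None)   # True
print(ascii_match is not None) # True
print(word_match is not None)  # True
```

This is why a password whose only "special" character is "²" fails the `(?=.*[\W])` lookahead in Python but passes in engines where `\w` is ASCII-only.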

Possible solutions:

Spell it out:

regex_password = r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9_]).{8,}$"

Or use the re.ASCII flag:

if not re.match(regex_password, password, flags=re.ASCII):

Either one of these changes should give you the results you need.
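As a minimal sketch, the first option could be dropped into the validator from the question like this. The module-level `PASSWORD_RE` constant and the sample passwords are assumptions made for illustration, not part of the original code.

```python
import re

# Explicit ASCII class instead of \W, so the result does not depend on regex flags.
# PASSWORD_RE is a hypothetical name introduced for this example.
PASSWORD_RE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9_]).{8,}$")


def password_validator(password):
    # Minimum 8 characters, 1 lowercase, 1 uppercase, 1 special character
    if not PASSWORD_RE.match(password):
        raise ValueError("value is not a valid password")
    return password


# "²" is now treated as a special character.
print(password_validator("Abcdefg²"))

# A password without any special character is still rejected.
try:
    password_validator("Abcdefgh")
except ValueError as exc:
    print(exc)
```

Passing `flags=re.ASCII` to the original pattern should behave the same way for this input; spelling the class out simply keeps the intent visible in the pattern itself.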
