I'm using this to validate usernames. My code:
import re
regex = r'^[\w.@+-]+\Z'
result = re.match(regex,'名字')
In Python 2.7, it returns None. In Python 3.7, it returns a match for '名字'.
It's because \w is defined differently in Python 2.7 and Python 3.7.
In Python 2.7, we have:
When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_].
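To see what that default means in practice, here is a minimal sketch that spells out the 2.7 default class by hand, so it can be run on Python 3. The helper name is_ascii_username is made up for illustration; it is not part of the re module.

```python
import re

# Hypothetical helper: spells out the class that Python 2.7's \w meant by
# default ([a-zA-Z0-9_]) plus the extra characters .@+- from the question.
ASCII_USERNAME = re.compile(r'^[a-zA-Z0-9_.@+-]+\Z')

def is_ascii_username(name):
    """Return True if name uses only ASCII letters, digits and _.@+-."""
    return ASCII_USERNAME.match(name) is not None

print(is_ascii_username('john.doe@example'))  # True
print(is_ascii_username('名字'))               # False
```

This is why the original pattern returns None on 2.7: neither 名 nor 字 is in [a-zA-Z0-9_].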
However, in Python 3.7, we have:
For Unicode (str) patterns: Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.
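A quick way to check this on Python 3 is to run the same pattern with and without re.ASCII; a small sketch:

```python
import re

pattern = r'^[\w.@+-]+\Z'
name = '名字'

# Default on Python 3: \w is Unicode-aware for str patterns, so this matches.
unicode_match = re.match(pattern, name)
# With re.ASCII, \w falls back to [a-zA-Z0-9_], like Python 2.7's default.
ascii_match = re.match(pattern, name, re.ASCII)

print(unicode_match.group(0))  # 名字
print(ascii_match)             # None
```

So re.ASCII is the Python 3 switch that reproduces the 2.7 result, if you ever want ASCII-only usernames.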
So, if you want it to work in both versions, you can do something like this:
# -*- coding: utf-8 -*-
import re
regex = re.compile(r'^[\w.@+-]+\Z', re.UNICODE)
match = regex.match(u'名字')
if match:
    print(match.group(0))
else:
    print("not matched!")
output:
名字
I ran it under both Python 2.7 and Python 3.7, and both print 名字.
Note the differences:
I added # -*- coding: utf-8 -*- at the top of the script because Python 2.7 assumes ASCII source files; without the declaration it raises a SyntaxError like this:
Non-ASCII character '\xe5' on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
Instead of using result = re.match(pattern, string), I used regex = re.compile(pattern, flags) and match = regex.match(string) so that I could specify flags.
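Compiling isn't strictly required for that, though: re.match also takes a flags argument, in Python 2.7 as well as in 3.x. A minimal sketch of the one-call form:

```python
import re

# re.match(pattern, string, flags) accepts the same flags as re.compile,
# so the pattern can be matched with re.UNICODE without compiling it first.
match = re.match(r'^[\w.@+-]+\Z', u'名字', re.UNICODE)
print(match.group(0))  # 名字
```

Compiling is still the better choice when the same pattern is reused many times, e.g. for every username in a form handler.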
I used the re.UNICODE flag because without it, \w in Python 2.7 only matches [a-zA-Z0-9_].
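On Python 3 the flag is harmless but redundant: str patterns are compiled with re.UNICODE automatically. A small sketch to confirm that on Python 3 (it says nothing about 2.7, where the flag really matters):

```python
import re

# On Python 3, str patterns get re.UNICODE implicitly, so passing it
# explicitly changes nothing; it only matters for Python 2.7 compatibility.
implicit = re.compile(r'\w+')
explicit = re.compile(r'\w+', re.UNICODE)

print(bool(implicit.flags & re.UNICODE))  # True
print(implicit.match('名字').group(0))    # 名字
print(explicit.match('名字').group(0))    # 名字
```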
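I used u'名字' instead of '名字' because in Python 2.7 a plain string literal is a byte string, so '名字' would be matched byte by byte (its UTF-8 bytes) rather than as two Unicode characters. You need a Unicode literal for \w to see 名 and 字 as characters.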
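Python 2.7 can't be demonstrated in a Python 3 session, but Python 3's bytes type behaves in a roughly analogous way to a 2.7 plain string literal, so this sketch (run on Python 3) illustrates why the literal type matters:

```python
import re

# Rough Python 3 analogue of a 2.7 plain '名字' literal: UTF-8 encoded bytes.
raw = '名字'.encode('utf-8')   # 6 bytes, not 2 characters
print(len(raw))                # 6

# For bytes patterns, \w only matches ASCII [a-zA-Z0-9_], and matching runs
# byte by byte, so the UTF-8 bytes of 名字 are rejected.
print(re.match(rb'^[\w.@+-]+\Z', raw))  # None

# Decoding back to text restores character-level matching.
print(re.match(r'^[\w.@+-]+\Z', raw.decode('utf-8')).group(0))  # 名字
```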
Also, while answering your question, I found out that print("not matched!") works in Python 2.7 as well. That makes sense: in 2.7, print is a statement, and ("not matched!") is just a parenthesized expression, so the parentheses are effectively ignored. I didn't know that, so that was fun.
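If you only need to support Python 3 (3.4 or later), a slightly simpler variant is possible. This is a sketch under that assumption: fullmatch requires the whole string to match, so the ^ and \Z anchors can be dropped.

```python
import re

# Python 3.4+ only: fullmatch must consume the entire string, so no ^/\Z.
USERNAME = re.compile(r'[\w.@+-]+')

for name in ['名字', 'john_doe', 'bad name']:
    # The space in 'bad name' is not in the class, so it fails.
    print(name, bool(USERNAME.fullmatch(name)))
```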