3

I want to validate a username. Here is my code:

import re
regex = r'^[\w.@+-]+\Z'
result = re.match(regex,'名字')

In Python 2.7, it returns None.

In Python 3.7, it returns a match for '名字'.

Amir Shabani
qxang

1 Answer

4

It's because of the different definitions for \w in Python 2.7 versus Python 3.7.

In Python 2.7, we have:

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_].

(emphasis and hyperlink and formatting added)

However, in Python 3.7, we have:

For Unicode (str) patterns: Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.

(emphasis and formatting added)
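To see the Python 3 behavior described above, here is a minimal sketch that runs the question's pattern with and without the ASCII flag. The variable names are just for illustration, and it assumes Python 3, where string literals are Unicode by default.

```python
import re

pattern = r'^[\w.@+-]+\Z'

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_match = re.match(pattern, '名字')

# With re.ASCII, \w is restricted to [a-zA-Z0-9_], like the Python 2.7 default.
ascii_match = re.match(pattern, '名字', re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```

So passing re.ASCII in Python 3 should reproduce the None result from the question.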

So, if you want it to work in both versions, you can do something like this:

# -*- coding: utf-8 -*-
import re
regex = re.compile(r'^[\w.@+-]+\Z', re.UNICODE)
match = regex.match(u'名字')

if match:
    print(match.group(0))
else:
    print("not matched!")

output:
名字

Here's proof that it works in both versions:

[Image: works]

Note the differences:

  • I added # -*- coding: utf-8 -*- at the top of the script, because without it Python 2.7 raises a SyntaxError saying:

    Non-ASCII character '\xe5' on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

  • Instead of using result = re.match(pattern, string), I used regex = re.compile(pattern, flags) and match = regex.match(string) so that I can specify flags. (re.match(pattern, string, flags) would also work.)

  • I used re.UNICODE flag, because without it, in Python 2.7, it will only match [a-zA-Z0-9_] when using \w.

  • I used u'名字' instead of '名字', because in Python 2.7 a plain '名字' literal is a byte string (the encoded bytes of the text), not a sequence of Unicode characters, so you need a Unicode literal.
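If you want to reuse this, the points above could be wrapped in a small helper, roughly like the sketch below. The names USERNAME_RE and is_valid_username are made up for this example, and decoding byte strings as UTF-8 is an assumption about where your input comes from.

```python
# -*- coding: utf-8 -*-
import re

# Same pattern as above; re.UNICODE makes \w Unicode-aware in Python 2.7
# (in Python 3 that is already the default for str patterns).
USERNAME_RE = re.compile(r'^[\w.@+-]+\Z', re.UNICODE)


def is_valid_username(name):
    """Return True if name consists only of word characters and . @ + -"""
    # In Python 2.7 a plain '...' literal is a byte string, so decode it first.
    # Assumption: byte input is UTF-8; invalid bytes raise UnicodeDecodeError.
    if isinstance(name, bytes):
        name = name.decode('utf-8')
    return USERNAME_RE.match(name) is not None


print(is_valid_username(u'名字'))      # True
print(is_valid_username(u'bad name'))  # False (space is not allowed)
```

Compiling the pattern once at module level also avoids repeating the flags at every call site.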

Also, while answering your question, I found out that print("not matched!") works in Python 2.7 as well. That makes sense: with a single argument, the parentheses are just treated as grouping around the expression being printed. I didn't know that, so that was fun.

Amir Shabani