How do I use re.UNICODE in python 2.7?

Question

I am trying to use the re.UNICODE flag to match a string that may contain Unicode characters, but it doesn't seem to work. For example:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u"test test test", re.UNICODE)
[]

It works if I do not specify the unicode flag, but then obviously it will not work with unicode strings. What do I need to do to get this working?

Are you getting an error? If so, please give the entire error message by editing your post. — rst-2cv, May 27 '18 at 15:18
No error, it just returns an empty list. I copied directly from the interpreter. — faiuwle, May 27 '18 at 15:21
Have you tried decoding the strings into ascii and then matching with the regex that works? — rst-2cv, May 27 '18 at 15:25
Ok, I think I understand. The flag doesn't go with the findall when you use it with the compiled object, it only goes to the initial compile function. That works. — faiuwle, May 27 '18 at 15:29

khelwood · Accepted Answer · 2018-05-27T15:31:55.320

6

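The second argument to r.findall is not flags but pos, the index at which the search starts. re.UNICODE has the integer value 32, so the search begins past the end of your 14-character string and finds nothing. You don't need to specify flags again when you have already passed them to re.compile.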

>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u'test test test')
[u'test', u'test', u'test']
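
To make the pos behaviour concrete, here is a small sketch rather than anything definitive. It assumes you want it to run unchanged on both Python 2.7 and Python 3, so the pattern is written as u"(\\w+)" instead of ur"(\w+)" (Python 3 rejects the ur prefix). The names word_re, text, past_end, from_five and accented are only illustrative.

```python
import re

# Flags are given once, at compile time.
# u"(\\w+)" is used instead of ur"(\w+)" so the file also parses on Python 3.
word_re = re.compile(u"(\\w+)", re.UNICODE)

text = u"test test test"

# The compiled pattern already carries re.UNICODE; findall only needs the string.
words = word_re.findall(text)

# Here re.UNICODE is interpreted as pos, i.e. "start searching at index 32".
# The string has only 14 characters, so there is nothing left to match.
past_end = word_re.findall(text, re.UNICODE)

# A small pos shows what the second argument is really for:
# the search starts at index 5 and skips the first word.
from_five = word_re.findall(text, 5)

# With re.UNICODE, \w also matches non-ASCII letters.
accented = word_re.findall(u"caf\u00e9 na\u00efve")

print(int(re.UNICODE), len(text))
print(len(words), len(past_end), len(from_five), len(accented))
```

If you do want to pass flags per call, use the module-level function instead, e.g. re.findall(pattern, string, flags), which takes flags as its third argument.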

edited May 27 '18 at 15:31

answered May 27 '18 at 15:28

