
I am trying to use the re.UNICODE flag to match a string potentially containing unicode characters, but it doesn't seem to be working. E.g.:

Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u"test test test", re.UNICODE)
[]

It works if I do not specify the unicode flag, but then obviously it will not work with unicode strings. What do I need to do to get this working?

faiuwle

1 Answer


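The second argument to r.findall on a compiled pattern is not flags but pos, the index at which the search starts. re.UNICODE has the integer value 32, so the search begins past the end of your 14-character string and finds nothing. You don't need to pass the flags again when you have already passed them to compile.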

>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u'test test test')
[u'test', u'test', u'test']
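
To make the difference concrete, here is a minimal sketch comparing the calls side by side. It is written so that it also runs on Python 3 (hence r"..." rather than the Python 2-only ur"..." prefix), and the variable names are only illustrative.

```python
import re

pattern = re.compile(r"(\w+)", re.UNICODE)
text = u"test test test"

# re.UNICODE is an integer flag. Passed as the second argument of a
# compiled pattern's findall, it is interpreted as pos (the start index).
flag_value = int(re.UNICODE)   # 32
text_length = len(text)        # 14

# pos=32 is past the end of the string, so nothing can match.
wrong = pattern.findall(text, re.UNICODE)

# The flags were given to compile(), so only the string is needed here.
right = pattern.findall(text)

# pos used on purpose: start searching at index 5, skipping "test ".
skipped = pattern.findall(text, 5)

# The module-level re.findall does take flags, as its third argument.
module_level = re.findall(r"(\w+)", text, re.UNICODE)

print(wrong)
print(right)
print(skipped)
print(module_level)
```

So flags belong either in re.compile or in the module-level re.findall(pattern, string, flags), never in the compiled pattern's findall.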
khelwood