
I'm having trouble matching the underscore character in Python using regular expressions. Just playing around in the shell, I get:

>>> import re
>>> re.match(r'a', 'abc')
<_sre.SRE_Match object at 0xb746a368>
>>> re.match(r'_', 'ab_c')
>>> re.match(r'[_]', 'ab_c')
>>> re.match(r'\_', 'ab_c')

I would have expected at least one of these to return a match object. Am I doing something wrong?

Joshua
scottmsul

2 Answers

Use re.search instead of re.match if the pattern you are looking for is not at the start of the search string.

re.match(pattern, string, flags=0)

Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.

re.search(pattern, string, flags=0)

Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.
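Here is a minimal sketch of the difference on the question's own string. It assumes Python 3 `print()` syntax, and the printed values are shown in the comments:

```python
import re

text = "ab_c"

# re.match only tries the pattern at position 0, where the character is 'a',
# so it fails even though an underscore appears later in the string.
print(re.match(r"_", text))   # None

# re.search scans forward through the string and finds the underscore at index 2.
m = re.search(r"_", text)
print(m.start(), m.group())   # 2 _
```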

You don't need to escape _, or even use a raw string, because the underscore has no special meaning in a regular expression:

>>> re.search('_', 'ab_c')
<_sre.SRE_Match object; span=(2, 3), match='_'>
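To check that escaping makes no difference, here is a small sketch (Python 3 assumed) that runs all three spellings from the question through re.search. Each one finds the same underscore:

```python
import re

text = "ab_c"

# The three patterns tried in the question: a bare underscore,
# a one-character class, and an escaped underscore.
patterns = ["_", r"[_]", r"\_"]

for p in patterns:
    m = re.search(p, text)
    print(repr(p), m.span())

# Output:
# '_' (2, 3)
# '[_]' (2, 3)
# '\\_' (2, 3)
```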
Håken Lid

Try the following:

re.search(r'\_', 'ab_c')

Escaping the underscore character does no harm, although it isn't required. The real issue is that match only matches at the beginning of the string, as the documentation (https://docs.python.org/2/library/re.html) makes clear:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
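As an illustration of that rule, here is a short sketch. It assumes Python 3, and the strings are made up for the example. re.match succeeds when the underscore is the first character, or when the pattern itself consumes whatever comes before it:

```python
import re

# The underscore is at position 0 here, so re.match succeeds.
print(re.match(r"_", "_abc").span())       # (0, 1)

# To keep using re.match on "ab_c", the pattern has to account for the
# characters before the underscore; .*? lazily consumes "ab" first.
print(re.match(r".*?_", "ab_c").group())   # ab_
```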

You should use search in this case:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
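Note the phrase "first location". Here is a brief sketch (Python 3 assumed; the string "a_b_c" is only an example) that shows search stopping at the first underscore. It uses re.finditer when every occurrence is wanted, and anchors with ^ to show that search can emulate match:

```python
import re

text = "a_b_c"

# re.search stops at the first position where the pattern matches.
print(re.search(r"_", text).start())                  # 1

# re.finditer keeps scanning and yields every non-overlapping match.
print([m.start() for m in re.finditer(r"_", text)])   # [1, 3]

# Anchoring with ^ makes re.search behave like re.match (without re.MULTILINE).
print(re.search(r"^_", text))                         # None
```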

Zafi