I am writing a small regex to extract an email address from a string. With the pattern patt = r'[\w.-]+@[\w.-]+' it works fine, but with the pattern patt1 = r'[\w-.]+@[\w-.]+' it gives me an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

Code:

1st case:

>>> import re
>>> text = "hello@abc.com"
>>> patt = r'[\w.-]+@[\w.-]+'
>>> match = re.search(patt, text)
>>> match.group()
'hello@abc.com'

2nd case:

>>> text = "hello@abc.com"
>>> patt1 = r'[\w-.]+@[\w-.]+'
>>> match = re.search(patt1, text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

Any idea what I am doing wrong in the second regex?

Tomerikoo
Harsh Sharma
  • The problem is probably the hyphen inside the `[\w-.]` character class: because it sits between `\w` and `.`, Python interprets it as a character range. – 0xInfection Feb 24 '19 at 07:28

2 Answers

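A hyphen (-) needs to be first or last in the character class to be taken literally. Between two characters it has a special meaning: it indicates a range, like [A-Z] (all uppercase letters). At the beginning or end it has no special meaning. In [\w-.] the hyphen sits between \w and ., so the engine tries to build a range out of them and fails, because \w is a class of characters rather than a single character.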
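
The range behaviour can be checked directly. This is a minimal sketch, assuming Python 3, where the compile failure surfaces as re.error (the traceback in the question shows the Python 2.7 form of the same error). The helper name compiles is made up for illustration and is not part of the re module.

```python
import re

# A hyphen between two single characters defines a range.
assert re.findall(r'[a-c]', 'abcd-') == ['a', 'b', 'c']

# A hyphen in first or last position is a literal hyphen.
assert re.findall(r'[ac-]', 'abcd-') == ['a', 'c', '-']


def compiles(pattern):
    """Hypothetical helper: True if `pattern` compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r'[\w.-]+@[\w.-]+'))  # hyphen last, treated as a literal
print(compiles(r'[\w-.]+@[\w-.]+'))  # hyphen between \w and '.', rejected as a range
```

Only the position of the hyphen differs between the two patterns; the set of characters the asker wants to match is the same.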

Escaping it as \- also works in Python, but beware that it may not work in other implementations/languages.

Check out the accepted answer here: Regex - Should hyphens be escaped?

Not the exact same question as yours, but touches on similar information.

Nick

The dash (-) inside a character class ([]) indicates a character range, i.e. from-to. So if you want a literal -, you have three options:

  • put - at the start: [-foo]
  • put - at the end: [foo-]
  • escape - with \: [foo\-bar]
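
Applied to the pattern from the question, the three options might look like the sketch below. The sample string and the dictionary keys are made up for illustration; the patterns are the question's pattern with only the hyphen moved or escaped.

```python
import re

text = "contact: first-last@my-host.example.com"  # made-up sample input

patterns = {
    "hyphen first":   r'[-\w.]+@[-\w.]+',
    "hyphen last":    r'[\w.-]+@[\w.-]+',
    "hyphen escaped": r'[\w\-.]+@[\w\-.]+',
}

# Run each variant against the same text and collect what it matched.
results = {name: re.search(p, text).group() for name, p in patterns.items()}

for name, found in results.items():
    print(name, "->", found)
```

All three variants should find the same address, hyphens included, so the choice between them is mostly about readability and portability to other regex engines.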
heemayl