Consider this Python regex for finding phone numbers:
reg = re.compile(r".*?(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4}).*?", re.S)
The problem is that this also matches inside any run of 10 or more digits, so I need to ensure that the character preceding the phone number, if there is one, is not a digit.
This won't work because it breaks if the phone number is at the beginning of the string:
reg = re.compile(r".*?\D(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4}).*?", re.S)
This won't work because the prior .*? might end in a digit:
reg = re.compile(r".*?[\D]?(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4}).*?", re.S)
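To make the two failure modes concrete, here is a small check. The sample strings and the variable names are made up for illustration; only the patterns come from the attempts above.

```python
import re

# Shared core of the phone pattern from the attempts above
core = r"(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4})"

# Attempt 1: a mandatory non-digit must precede the number
reg_required = re.compile(r".*?\D" + core + r".*?", re.S)
# Attempt 2: the preceding non-digit is optional
reg_optional = re.compile(r".*?[\D]?" + core + r".*?", re.S)

# Hypothetical sample inputs
starts_with_phone = "914-232-2800 is the number"
long_digit_run = "order 12345678901234 shipped"

# Attempt 1 misses a number at the very start: nothing precedes it for \D to consume
missed = reg_required.match(starts_with_phone)

# Attempt 2 still accepts the first 10 digits of a longer run,
# because nothing stops the group from ending in the middle of the run
false_hit = reg_optional.match(long_digit_run)

print(missed)
print(false_hit.group(1))
```
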
What does work?
EDIT:
Martijn's regex breaks on match, even though it works for search:
>>> text = 'The Black Cat Cafe is located at 45 Main Street, Irvington NY 10533, in one of the \nRiver Towns of Westchester. ..... Our unique menu includes baked ziti pizza, \nchicken marsala pizza, margherita pizza and many more choices. ..... 914-232-2800 ...... cuisine, is located at 36 Main Street, New Paltz, NY 12561 in Ulster \nCounty.'
>>> reg = re.compile(r"(?<!\d)(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4})(?!\d)", re.S)
>>> reg.search(text).groups()[0]
'914-232-2800'
>>> reg.match(text) is None
True
>>> reg_dotan = re.compile(".*?(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4}).*?", re.S)
>>> reg_dotan.search(text).groups()[0]
'914-232-2800'
>>> reg_dotan.match(text) is None
False
In the application, I'm running the regex in a list comprehension:
have_phones = [d for d in descriptions if reg.match(d)]
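For reference, the difference seen in the session above comes from re.match only matching at the beginning of the string, whereas re.search scans the whole string. A minimal sketch of the list comprehension using search with the lookaround version of the pattern, assuming made-up sample descriptions:

```python
import re

# Lookbehind/lookahead reject a digit directly before or after the number
phone = re.compile(r"(?<!\d)(\(?\d{3}\D{0,3}\d{3}\D{0,3}\d{4})(?!\d)", re.S)

# Hypothetical sample data, not from the real application
descriptions = [
    "Call 914-232-2800 for reservations.",
    "Tracking number 12345678901234",
    "(914) 232 2800",
]

# search() looks for the pattern anywhere in each description
have_phones = [d for d in descriptions if phone.search(d)]
print(have_phones)
```

If match has to stay, another option is to keep the leading .*? in front of the lookbehind, so the pattern can still consume whatever text precedes the number.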