
An online tutorial I am reading says:

Optional Matching with the Question Mark

"Sometimes there is a pattern that you want to match only optionally. That is, the regex should find a match whether or not that bit of text is there. The ? character flags the group that precedes it as an optional part of the pattern. For example, enter the following into the interactive shell:"

>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'

My Problem:

I am trying to find matching phone numbers, in the form of either 123-456-7890 (without country code) or (111)-123-456-7890 (with country code).

Here is my Python regex code, which should return a list of matching phone numbers:

import re

phone_num_regex = re.compile(r'(\(\d{3}\)-)?\d{3}-\d{3}-\d{4}')
phone_num_list = phone_num_regex.findall('800-420-7240 (933)-415-863-9900 415-863-9950')

However, the phone_num_list I get is ['', '(933)-', ''] instead of what I wanted, which is ['800-420-7240', '(933)-415-863-9900', '415-863-9950'].

What is wrong with my code? I'm guessing it has something to do with the '?' (optional matching).

Brian
  • Try making the first group non-capturing: `(?:\(\d{3}\)-)?\d{3}-\d{3}-\d{4}` – The fourth bird Nov 29 '18 at 16:47
  • Note that you should generally prefer to split the text into individual phone numbers first (a plain `str.split` should do) and then match each number on its own. That lets you get a little more creative with your final regex. – Adam Smith Nov 29 '18 at 16:54
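
A minimal sketch of the split-first approach suggested in the comment above, assuming the numbers are separated by whitespace and contain none themselves. The names `phone_re`, `candidates`, and `numbers` are only illustrative:

```python
import re

text = '800-420-7240 (933)-415-863-9900 415-863-9950'

# Same pattern as in the question, with the optional prefix made non-capturing.
phone_re = re.compile(r'(?:\(\d{3}\)-)?\d{3}-\d{3}-\d{4}')

# Split on whitespace first, then require each piece to match in full.
candidates = text.split()
numbers = [c for c in candidates if phone_re.fullmatch(c)]

print(numbers)  # ['800-420-7240', '(933)-415-863-9900', '415-863-9950']
```

Because each piece is checked with `fullmatch`, a token with trailing junk (for example `415-863-99500`) would be rejected rather than partially matched.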

1 Answer

Your optional section is wrapped in a capturing group, so re.findall returns only what that group matched (an empty string wherever the optional part is absent) instead of the whole match.

If you use non-capturing groups instead, this won't happen.

re.compile(r'(?:\(\d{3}\)-)?\d{3}-\d{3}-\d{4}')
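
As a quick check, the sketch below runs both versions of the pattern against the string from the question. The variable names `capturing` and `non_capturing` are only illustrative:

```python
import re

text = '800-420-7240 (933)-415-863-9900 415-863-9950'

# Capturing group: findall returns only what the group matched,
# and an empty string where the optional group did not take part.
capturing = re.compile(r'(\(\d{3}\)-)?\d{3}-\d{3}-\d{4}')
print(capturing.findall(text))      # ['', '(933)-', '']

# Non-capturing group: no groups in the pattern, so findall
# returns the full text of each match.
non_capturing = re.compile(r'(?:\(\d{3}\)-)?\d{3}-\d{3}-\d{4}')
print(non_capturing.findall(text))  # ['800-420-7240', '(933)-415-863-9900', '415-863-9950']
```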

From the docs:

(...)
Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below. To match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(], [)].

 

(?:...)
A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

 

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

(emphasis mine)
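
If you do want to keep the capturing group (say, to read the country code separately), one possible alternative is re.finditer, which yields match objects rather than groups. This is only a sketch; `pattern`, `numbers`, and `prefixes` are names made up for the example:

```python
import re

text = '800-420-7240 (933)-415-863-9900 415-863-9950'
pattern = re.compile(r'(\(\d{3}\)-)?\d{3}-\d{3}-\d{4}')

# m.group() is the whole match; m.group(1) is the optional prefix,
# or None when that group did not participate in the match.
numbers = [m.group() for m in pattern.finditer(text)]
prefixes = [m.group(1) for m in pattern.finditer(text)]

print(numbers)   # ['800-420-7240', '(933)-415-863-9900', '415-863-9950']
print(prefixes)  # [None, '(933)-', None]
```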

Adam Smith