1

What regular expression would capture all of the phone numbers listed below? The one I tried only partially works, and I would like to match them all. Thanks for any suggestions or help.

Here are the numbers, along with the script I tried:

import re

content='''  
415-555-1234
650-555-2345
(416)555-3456
202 555 4567
4035555678
1 416 555 9292
+1 416 555 9292
'''
for phone in re.findall(r'\+?1?\s?\(?\d*\)?[\s-]\d*[\s-]\d*',content):
  print(phone)

The result I'm getting is:

415
-555-1234

650-555-2345
555-3456
202
 555 4567
4035555678

1 416 555
 9292

+1 416 555 9292
SIM
  • Try `r'\+?1?\s?(?:\(?\d+\)?)?[\s-]?\d+(?:[\s-]?\d+)?'` – Wiktor Stribiżew Nov 29 '17 at 17:35
  • Will you be searching for these numbers in a larger, "noisier" context or is the example of `content` similar to your real-world scenario? – Jordan Bonitatis Nov 29 '17 at 17:43
  • Also, see a [multiline demo](https://regex101.com/r/UN75Mr/1). – Wiktor Stribiżew Nov 29 '17 at 17:46
  • Does the answer to your question have anything to do with creating an appropriate expression, @Jordan Bonitatis? – SIM Nov 29 '17 at 17:49
  • If your context is exactly like above, I would suggest not using an expression at all and simply split `content` on `\n`, then strip non-numerals, leading 1's, etc. to normalize it. At that point, you can format it any way you like – Jordan Bonitatis Nov 29 '17 at 17:52
  • Thanks for your answer, @Wiktor Stribiżew. It matches the bracketed portion as well. The only remaining question is why the `-`-separated numbers are returned in broken parts, as in `415-555`, `-1234`. Thanks. – SIM Nov 29 '17 at 17:52
  • Thanks for your suggestion, @Jordan Bonitatis. I know that string manipulation is the better option in such cases, but I'm a beginner with regular expressions, so I'm trying my best to learn slightly more complicated things. Thanks again. – SIM Nov 29 '17 at 17:58
  • I see - I would suggest taking a look at this previous [SO Question](https://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation) – Jordan Bonitatis Nov 29 '17 at 17:59
  • Yes, this is it. The link to the `multiline demo` is what I was hoping for. Would you care to post it as an answer so that I can accept it? Thanks a trillion, @Wiktor Stribiżew. – SIM Nov 29 '17 at 18:03
  • @Topto Ok, added with explanations. I hope the regex details will help you tweak it further. – Wiktor Stribiżew Nov 29 '17 at 18:11
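
The non-regex route suggested in the comments above (split `content` on `\n`, strip non-numerals and a leading 1) could be sketched roughly as follows. The helper name `normalize_phone` is hypothetical, and the sketch assumes every valid entry is a 10-digit North American number with an optional leading country code 1:

```python
import re

content = '''
415-555-1234
650-555-2345
(416)555-3456
202 555 4567
4035555678
1 416 555 9292
+1 416 555 9292
'''

def normalize_phone(line):
    """Hypothetical helper: reduce one line to 10 digits, or None if it does not fit."""
    digits = re.sub(r'\D', '', line)        # drop everything that is not a digit
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]                 # drop the leading country code 1
    return digits if len(digits) == 10 else None  # assumption: 10-digit numbers only

# Blank lines normalize to None and are filtered out.
numbers = [n for n in map(normalize_phone, content.split('\n')) if n]

for number in numbers:
    print(number)
```

Once every entry is a bare 10-digit string, it can be reformatted in whatever style is needed.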

2 Answers

3

I suggest making some parts of the regex obligatory (like the digit patterns, by replacing * with +); otherwise it might match meaningless parts of the text. Also, note that \s matches any whitespace, including newlines, while you most probably want each match to stay on a single line.
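
As a small illustration of both points (the sample strings here are made up for the demonstration and are not taken from the question):

```python
import re

# \d* may match zero digits, so it "succeeds" with an empty string at every position.
star_matches = re.findall(r'\d*', 'ab')
# \d+ requires at least one digit, so it finds nothing in the same text.
plus_matches = re.findall(r'\d+', 'ab')

# \s also matches a newline, so digits on two separate lines are joined into one match.
across_lines = re.findall(r'\d+\s\d+', '415\n650')
# A literal space cannot cross the line break.
same_line = re.findall(r'\d+ \d+', '415\n650')

print(star_matches)   # ['', '', '']
print(plus_matches)   # []
print(across_lines)   # ['415\n650']
print(same_line)      # []
```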

You might try

\+?1? ?(?:\(?\d+\)?)?(?:[ -]?\d+){1,2}

See the regex demo

Details

  • \+? - an optional plus
  • 1? - an optional 1
  • " ?" (a space followed by ?) - an optional space
  • (?:\(?\d+\)?)? - an optional sequence of an optional (, then 1+ digits, and then an optional )
  • (?:[ -]?\d+){1,2} - 1 or 2 occurrences of:
    • [ -]? - an optional space or -
    • \d+ - 1+ digits
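
For reference, here is a minimal sketch of the pattern above applied to the `content` string from the question. The function name `find_phone_numbers` is only an illustrative choice:

```python
import re

content = '''  
415-555-1234
650-555-2345
(416)555-3456
202 555 4567
4035555678
1 416 555 9292
+1 416 555 9292
'''

# The pattern has no capturing groups, so findall returns whole matches as strings.
PHONE_RE = re.compile(r'\+?1? ?(?:\(?\d+\)?)?(?:[ -]?\d+){1,2}')

def find_phone_numbers(text):
    """Illustrative helper: return every match of PHONE_RE in text."""
    return PHONE_RE.findall(text)

for phone in find_phone_numbers(content):
    print(phone)
```

Because the separators are limited to a literal space or `-`, a match cannot run across a line break. The pattern is still quite permissive about digit counts, so it may need tightening (for example with `\d{3}` and `\d{4}`) for noisier input.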
Wiktor Stribiżew
2

I think this regex will work in your case:

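import re

content = '''
    415-555-1234
    650-555-2345
    (416)555-3456
    202 555 4567
    4035555678
    1 416 555 9292
    +1 416 555 9292
    '''
# The pattern contains two capturing groups, so findall returns a tuple per match;
# element 0 is the outer group, i.e. the whole phone number.
for phone in re.findall(r'(([+]?\d\s\d?)?\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})', content):
    print(phone[0])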
Nazir Ahmed