I'm trying to extract phone numbers from some text.

The problem is that every match comes back as 4 values, when I only want the full match of the expression. For example, I get:

Match 1
1.  054-434-4321
2.  054
3.  -
4.  -

Match 2
1.  (03) 502 9571
2.  (03)
3.  
4.  

As you can see, I only need the first item in each of these lists.

Here is my code:

import re

text = ("You can reach me at 054-434-4321, or my office at (03) 502 9571 or (050) 223 957. "
        "Send me a fax at 03 502 7422. We finally made the sale for all 977 giraffes. "
        "They wanted 225 957 dollars for it")

# One outer group plus three inner groups: findall() returns a 4-tuple per match.
phone_pattern = re.compile(r'((\d{2,3}|\(\d{2,3}\))(-| )\d{3}(-| )\d{3,4})')
phone_results = phone_pattern.findall(text)
print(f'extracted {len(phone_results)} results : {phone_results}')

This is the regex:

((\d{2,3}|\(\d{2,3}\))(-| )\d{3}(-| )\d{3,4})

I've tried wrapping the whole expression in parentheses in order to group the results, but that did not help.

Wiktor Stribiżew
NyaSol

2 Answers

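When a pattern contains capturing groups, findall() returns the groups instead of the whole match. Make the subgroups non-capturing with (?:...) so that findall() returns only the full match.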

import re

text = """
You can reach me at 054-434-4321, or my office at (03) 502 9571 or (050) 223 957.
Send me a fax at 03 502 7422. We finally made the sale for all 977 giraffes.
They wanted 225 957 dollars for it.
"""

phone_pattern = re.compile(r'(?:\d{2,3}|\(\d{2,3}\))(?:-| )\d{3}(?:-| )\d{3,4}')

for result in phone_pattern.findall(text):
    print(result)

Output:

054-434-4321
(03) 502 9571
(050) 223 957
03 502 7422
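
To make the difference concrete, here is a minimal sketch comparing the two patterns on a shortened sample string (the sample and the variable names are illustrative assumptions, not part of the original post). It also shows a third option that may be useful when the groups are still needed elsewhere: iterate with finditer() and read group(0), which is always the full match.

```python
import re

# Shortened, illustrative sample text (an assumption for this sketch).
text = "Call 054-434-4321 or (03) 502 9571."

# Capturing groups: findall() returns one tuple of group values per match.
capturing = re.compile(r'((\d{2,3}|\(\d{2,3}\))(-| )\d{3}(-| )\d{3,4})')
as_tuples = capturing.findall(text)
print(as_tuples)

# Non-capturing groups: findall() returns the full match as a plain string.
non_capturing = re.compile(r'(?:\d{2,3}|\(\d{2,3}\))(?:-| )\d{3}(?:-| )\d{3,4}')
as_strings = non_capturing.findall(text)
print(as_strings)

# Keeping the groups: finditer() yields match objects, and group(0)
# is the whole match regardless of how many groups the pattern has.
full_matches = [m.group(0) for m in capturing.finditer(text)]
print(full_matches)
```

With the capturing pattern, each list entry is a tuple whose first element is the outer group; the other two approaches give a flat list of strings.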
AKX

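Alternatively, write each format as its own alternative without any capturing groups, so findall() returns the full match:

import re

text = """You can reach me at 054-434-4321, or my office at (03) 502 9571 or (050) 223 957.
Send me a fax at 03 502 7422. We finally made the sale for all 977 giraffes.
They wanted 225 957 dollars for it"""

# Raw string so the backslashes reach the regex engine unchanged.
tel_number = re.findall(r'\d+-\d+-\d+|\(\d+\)\s\d+\s\d+|\d+\s\d+\s\d+', text)
print(tel_number)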

Output:

['054-434-4321', '(03) 502 9571', '(050) 223 957', '03 502 7422']
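
One trade-off worth noting: \d+ accepts runs of digits of any length, so this pattern is looser than one with bounded quantifiers such as \d{2,3}. The sketch below compares the two on a made-up sentence (the sample text and variable names are assumptions for illustration only).

```python
import re

# Illustrative sample: "7 12 2024" is a date-like run, not a phone number.
sample = "Invoice 7 12 2024 was paid; call 03 502 7422."

# Unbounded quantifiers, as in the pattern above.
loose = re.compile(r'\d+-\d+-\d+|\(\d+\)\s\d+\s\d+|\d+\s\d+\s\d+')

# Bounded quantifiers with non-capturing groups.
strict = re.compile(r'(?:\d{2,3}|\(\d{2,3}\))(?:-| )\d{3}(?:-| )\d{3,4}')

loose_hits = loose.findall(sample)
strict_hits = strict.findall(sample)

print(loose_hits)   # includes the date-like run as well as the number
print(strict_hits)  # only the phone number
```

Which one fits better depends on how varied the surrounding text is; for the sample in the question both give the same four results.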
Zaraki Kenpachi