-3

I need to find a phone number in a given paragraph text, with the conditions as below.

The word Phone/Ph/tel/telephone should exist in the sentence where the phone number is present.

For ex: (consider the below paragraph.)

This is my Phone number and I am 25 years old, 999-888-7894 and I am looking for a regex script.

As you can see this paragraph has a phone number signified, and it has the word "Phone" in the sentence (31 characters before the phone number).

So i would like to detect this as a phone number if and only if it has the words Phone/Ph/tel/telephone 50 characters before or after the phone number.

I tried using lookaround in regex but did not work.

import re

phno = re.compile(r'(?<=Ph\s)(?<=Phone\s)(?<=tel\s)telephone(?<=telephone\s)\b([0-9]{3}[-][0-9]{3}[-][0-9]{4})\b',re.MULTILINE)

data = "This is my phone number and I am 25 years old, 999-888-7894 and I am looking for a regex script."

l = phno.findall(data)

print(l)

I am getting output empty list [ ] because the word 'Phone' is not detected by regex (I need it to detect 50 chars before or after phone number)

kenlukas
  • 3,616
  • 9
  • 25
  • 36
  • 1
    Possible duplicate of [Phone validation regex](https://stackoverflow.com/questions/8634139/phone-validation-regex) – Olvin Roght Sep 10 '19 at 11:04
  • 1
    Try it like this https://regex101.com/r/Q7M0ol/1 – The fourth bird Sep 10 '19 at 11:04
  • @Thefourthbird Thanks a lot. It worked. but I am getting an empty string in the output. [('999-888-7894', '')] . We should remove that empty string, – krishna rao gadde Sep 10 '19 at 11:11
  • @krishnaraogadde That is due to the alternation `|` which will capture the value in either group 1 or group 2. One option is to check code wise if group 1 or 2 contains a value. – The fourth bird Sep 10 '19 at 11:17
  • @Thefourthbird when we have multiple phone numbers in the paragraph, this code doesn't work. can we improve it anyhow ? For ex: data = "This is my Phone number and I am 25 years old, 999-888-7894 and I am looking for a regex script. however my new Phone number is 722-818-7994. well thanks for the regex that i have got from the site stackoverflow" ---> Gives output, only one phone number [('722-818-7994', '')] where the first phone number is missing. – krishna rao gadde Sep 10 '19 at 11:29
  • 1
    @krishnaraogadde I think it works. See https://ideone.com/ZNa10H – The fourth bird Sep 10 '19 at 11:32

2 Answers2

0
import re

data = """This is my phone number and I am 25 years old, 999-888-7894 and I am looking for a regex script.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  999-123-4567 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
And 555-555-1212 is my telephone."""

phno = re.compile(r'\b(?:phone|ph|telephone)\b.{0,49}\b(\d{3}[-]\d{3}[-]\d{4})\b|\b(\d{3}[-]\d{3}[-]\d{4})\b.{0,49}\b(?:phone|ph|telephone)\b', flags=re.I)

phones = [m.group(1) if m.group(1) else m.group(2) for m in phno.finditer(data)]
print(phones)

Prints:

['999-888-7894', '555-555-1212']

See demo

Booboo
  • 38,656
  • 3
  • 37
  • 60
  • As `\b` is a zero-length assertion it's a nonsense to put it in a lookbehind. – Toto Sep 10 '19 at 12:01
  • @Toto I am making sure 'telephone' is followed by \b plus 0 to 49 more characters. If it is 0 characters, then I am sure that the phone number is preceded by \b. But if it is from 1 to 49, I can only be sure by doing the lookbehind test. It would be wrong to hardcode a \b before the phone number because that would not match `phone 555-555-5555`, which would only have one `\b` between `phone` and `555-555-5555`. Give it some thought. – Booboo Sep 10 '19 at 12:06
  • The lookbehind is superfluous as well as the non capture group, instead of `\b(?:.{0,49})(?<=\b)` use `\b.{0,49}\b` – Toto Sep 10 '19 at 12:10
  • @Toto *I* gave it some more thought and have revised my regex. Thanks. – Booboo Sep 10 '19 at 12:18
-1

Assuming you only want to detect hyphen-separated US phone numbers containing area codes, you could use the following regex pattern with re.findall:

\b\d{3}-\d{3}-\d{4}\b

Script:

sentence = "This is my Phone number and I am 25 years old, 999-888-7894 and I am looking for a regex script."
numbers = re.findall(r'\b\d{3}-\d{3}-\d{4}\b', sentence)
print(numbers)

This prints:

['999-888-7894']
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • 1
    Downvoter: It is highly improbable that anything other than a phone number would match this regex. – Tim Biegeleisen Sep 10 '19 at 11:22
  • My phone number is 867-5309 – Tim Biegeleisen Sep 10 '19 at 11:53
  • I did not downvote you, but whoever did probably did so because the main point of the question was to ensure that the phone number was within 50 characters plus or minus of ceratain keywords such as phone, telehphone, etc. The OP already knew the regex for matching a phone number. – Booboo Sep 10 '19 at 12:13