Extract phone numbers from email using python 2.7 regex

Question

I'm trying to extract phone numbers from many email files. I wrote a regex to extract them, but it only returns results for one format.

import re

PHONERX = re.compile(r"(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})")

phonenumber = re.findall(PHONERX, content)

When I reviewed the data, I found that the phone numbers appear in many formats.

How can I extract all the phone numbers that have any of these formats:

800-569-0123
1-866-523-4176
(324)442-9843
(212) 332-1200
713/853-5620
713 853-0357
713 837 1749

This link is a sample of the dataset. The problem is that the regex sometimes also extracts numbers from the Message-ID and other numbers in the email: https://www.dropbox.com/sh/pw2yfesim4ejncf/AADwdWpJJTuxaJTPfha38OdRa?dl=0

Possible duplicate of [A comprehensive regex for phone number validation](http://stackoverflow.com/questions/123559/a-comprehensive-regex-for-phone-number-validation) — Peter Gibson, Apr 24 '17 at 05:11

Mazdak · Answer 1 · 2017-04-24T05:40:45.110

You don't need to list every possibility with a logical OR. You can use the following regex:

(?:\(\d+\)\s?\d*|\d+)([-\/ ]\d+){1,3}

see the Demo

To use it with re.findall(), make the group non-capturing:

(?:\(\d+\)\s?\d*|\d+)(?:[-\/ ]\d+){1,3}
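The difference between the two patterns can be seen in a minimal sketch. The sample text is made up for illustration; only the two regexes come from this answer.

```python
import re

# Made-up sample text (an assumption for illustration, not from the dataset).
text = "Call 800-569-0123 or (212) 332-1200."

# With a capturing group, re.findall returns only what the group captured
# (its last repetition), not the whole match.
capturing = re.compile(r"(?:\(\d+\)\s?\d*|\d+)([-\/ ]\d+){1,3}")
print(capturing.findall(text))      # ['-0123', '-1200']

# With a non-capturing group, re.findall returns the whole match.
non_capturing = re.compile(r"(?:\(\d+\)\s?\d*|\d+)(?:[-\/ ]\d+){1,3}")
print(non_capturing.findall(text))  # ['800-569-0123', '(212) 332-1200']
```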

edited Apr 24 '17 at 05:40

answered Apr 24 '17 at 05:10

Mazdak


I tried it but I didn't get the full phone number. The result I got looks like this: phonenumber=[' 14', '-7796', '-3490'] – Ash Apr 24 '17 at 05:30
@Ash That's because `re.findall` returns the captured groups when the pattern contains any. If you want the whole match, make the group non-capturing by adding `?:`. Check out the update. – Mazdak Apr 24 '17 at 05:39
I just updated the question with the code I'm using. I'll try yours again, thank you – Ash Apr 24 '17 at 05:42
Your code looks perfect in the demo, but I don't know why it didn't work the same for me. I think it's because the dataset consists of many email files. Thank you – Ash Apr 24 '17 at 06:17

Pedro Lobito · Accepted Answer · 2017-04-24T05:20:18.983

You may want to use:

\(?(?:1-)?\b[2-9][0-9]{2}\)?[-. \/]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b

This matches all your examples and ignores false positives such as:

113 837 1749
222 2222 22222

Regex Demo and Explanation

Python Demo
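A minimal sketch of how this regex might be applied with re.findall. The `content` string here is assembled from the examples in the question and the two false positives above; with the real dataset it would be the text of each email file.

```python
import re

# Regex from this answer, written as a raw string so the backslash
# sequences are passed to the regex engine unchanged.
PHONERX = re.compile(
    r"\(?(?:1-)?\b[2-9][0-9]{2}\)?[-. \/]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b"
)

# The formats listed in the question, followed by the two false positives.
content = """800-569-0123
1-866-523-4176
(324)442-9843
(212) 332-1200
713/853-5620
713 853-0357
713 837 1749
113 837 1749
222 2222 22222
"""

# The pattern has no capturing groups, so findall returns whole matches.
phonenumber = PHONERX.findall(content)
for number in phonenumber:
    print(number)
```

Only the seven formats from the question are printed; the last two lines of `content` are skipped.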

edited Apr 24 '17 at 05:20

answered Apr 24 '17 at 05:14

Pedro Lobito


Can you please define *no result*? both demos work as expected. Did you get any errors? – Pedro Lobito Apr 24 '17 at 05:28
Where should I use re.DOTALL | re.MULTILINE? PHONERX = re.compile("\(?(?:1-)?\b[2-9][0-9]{2}\)?[-. /]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b") phonenumber = re.findall(PHONERX,content, re.DOTALL | re.MULTILINE) – Ash Apr 24 '17 at 05:39
You can use the code from https://ideone.com/A8RQcC, it works as intended. – Pedro Lobito Apr 24 '17 at 05:44
Thank you for your answer. It worked perfectly in your example, but it didn't work for me and I don't know why. I think it's because I'm extracting from many email files – Ash Apr 24 '17 at 06:03
I may know what's going on. Can you post a sample of the email's source? Please also include the headers. On which telephone format did your regex work? – Pedro Lobito Apr 24 '17 at 10:35
OK, I'll update it now. My old regex also extracts numbers from the Message-ID in the header – Ash Apr 26 '17 at 03:20
I just updated the question and added the sample in the link – Ash Apr 26 '17 at 03:25
I've tested my regex with the new email source and it works as expected, it only extracts the phone #'s. – Pedro Lobito Apr 26 '17 at 08:21
Yes, I tested it in the regex demo and it worked, but it didn't work for me when I tried it on all the data. I just updated the link with more than 5 email files. If you can, please check whether it works with more than one email. Thanks a lot – Ash Apr 26 '17 at 10:17
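One possible cause of the mismatch, judging only from the code posted in the comments: the pattern there is an ordinary string literal, and in an ordinary Python string `\b` is the backspace character, not the regex word boundary. A minimal sketch of the difference, using a shortened pattern and a made-up sample line (both are assumptions for illustration):

```python
import re

# In an ordinary string literal "\b" is one character: backspace (0x08).
# In a raw string r"\b" is two characters, which the regex engine reads
# as a word boundary.
assert "\b" == "\x08"
assert len(r"\b") == 2

content = "Office: 713 853-0357"  # made-up sample line

# Shortened version of the accepted pattern, once non-raw and once raw.
non_raw = re.compile("\b[2-9][0-9]{2}[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b")
raw = re.compile(r"\b[2-9][0-9]{2}[-. ]?[2-9][0-9]{2}[-. ]?[0-9]{4}\b")

print(non_raw.findall(content))  # [] (it would need literal backspaces)
print(raw.findall(content))      # ['713 853-0357']
```

The re.DOTALL and re.MULTILINE flags should not be needed for this pattern, since it contains no `.`, `^`, or `$`. As far as I know, the re module also rejects a flags argument passed alongside an already compiled pattern (with a ValueError), so any flags belong in the re.compile call.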
