
I can't work out how to extract a phone number from HTML with a regex. I checked my regex here and it works, so it should find the number on this link.

I tried to parse the page like this:

import requests
import re

url = 'https://a101.ru'
r = requests.get(url)
html = r.text
result = re.findall('((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}', html)
print(result)

And I get this:
[(u'', u'', u''), (u'', u'', u'').....(u'+7 ', u'+7', u'(495) ')....(u'', u'', u'')]
Martin Evans
2manov

1 Answer

You could use a regex to pick out the `tel:` part of each `href`:

import re
import requests

# verify=False skips TLS certificate checks; drop it if the site's certificate validates
r = requests.get('https://a101.ru', verify=False)
print(re.findall(r'tel:(.*?)">', r.text))

For that page it finds four matches:

['+7(495)221-40-21', '+7(495)221-40-21', '+7(495)221-40-21', '+7(495)221-40-21']

Normally I would use BeautifulSoup to parse the file correctly and extract the information, but for very specific minor uses, regex could be used with care.
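
One thing that needs care: when a pattern contains capturing groups, `re.findall` returns the groups rather than the whole match, which is why the pattern in the question printed tuples such as `('+7 ', '+7', '(495) ')`. A minimal sketch of the difference, using a made-up sample string (`sample_text` is an assumption for illustration, not the real page content) and the question's pattern rewritten with non-capturing groups `(?:...)`:

```python
import re

# Hypothetical sample text standing in for the downloaded HTML.
sample_text = 'Call +7 (495) 221-40-21.'

# The pattern from the question: three capturing groups,
# so findall returns one tuple of group values per match.
capturing = r'((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}'

# The same pattern with non-capturing groups (?:...):
# findall now returns the full matched text.
non_capturing = r'(?:(?:8|\+7)[\- ]?)?(?:\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}'

groups_only = re.findall(capturing, sample_text)
whole_matches = re.findall(non_capturing, sample_text)

print(groups_only)    # tuples of group values
print(whole_matches)  # complete phone-number strings
```

On real HTML this pattern will probably also match other digit runs (IDs, prices, dates), so the results would still need filtering.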


You can obtain the same results with BeautifulSoup as follows:

from bs4 import BeautifulSoup
import requests
import re

r = requests.get('https://a101.ru', verify=False)
soup = BeautifulSoup(r.content, "html.parser")
print([tel['href'][4:] for tel in soup.find_all('a', href=re.compile(r'tel:'))])
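
To try the same idea without a network request or third-party packages, here is a rough sketch using the standard library's `html.parser` instead of BeautifulSoup. The class name `TelLinkParser` and the `sample_html` string are made up for illustration; this is a swapped-in technique, not what the code above does internally.

```python
from html.parser import HTMLParser


class TelLinkParser(HTMLParser):
    """Collects the number part of every <a href="tel:..."> link."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags can carry a tel: link.
        if tag != 'a':
            return
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name == 'href' and value and value.startswith('tel:'):
                self.numbers.append(value[len('tel:'):])


# Hypothetical HTML fragment standing in for the real page.
sample_html = (
    '<p><a href="tel:+7(495)221-40-21">Call us</a> '
    '<a href="/about">About</a></p>'
)

parser = TelLinkParser()
parser.feed(sample_html)
print(parser.numbers)
```

Only the link whose `href` starts with `tel:` is collected; the `/about` link is ignored.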
Martin Evans
  • Can I use this regex ('tel:(.*?)">') on any HTML page to extract phone numbers? – 2manov Feb 05 '19 at 15:57
  • The `tel:` prefix is what lets phones dial the number directly, but not all websites add it. If it is used, you can be fairly certain that it is a valid number. – Martin Evans Feb 05 '19 at 15:59