I'm trying to write a scraper for the OLX website (www.olx.pl) using requests and BeautifulSoup. Most of the data is easy to get, but the phone number is hidden (you have to click it first). I used Chrome's inspector to watch the "Network" tab while clicking it manually. The click triggers an AJAX request that carries this query string: "?pt=5d1480fbad0a1f2006e865bfdf7a6fb07f244b82e17ab0ea4c5eaddc43f9da391b098e1926642564ffb781655d55be270c6913f7526a08298f43b24c0169636b". This is the phoneToken, which can be found in the page source (it changes on every page load). I tried sending the same kind of request with the requests library, but the response contained "000 000 000". I can get the phone number with Selenium, but it is very slow to load.
My question is: is there a way to get around these phone tokens? Or, alternatively, how can I speed up Selenium so that scraping a phone number takes, say, 1-2 seconds?
Ad example: https://www.olx.pl/561666735
EDIT: The response now says that my IP address is blocked. This happens only with requests; the IP is not blocked when I load the page manually in a browser. Unfortunately I have changed the code since then and can no longer reproduce the '000 000 000' response. This is the relevant part of my code right now:
import requests

# headers is defined elsewhere in the full script; placeholder value shown here
headers = {"User-Agent": "Mozilla/5.0"}


def scrape_phone(ad_id):
    s = requests.Session()
    url = "https://www.olx.pl/{}".format(ad_id)
    response = s.get(url, headers=headers)
    page_text = response.text

    # getting short id (assumes a 5-character value right after 'id':')
    index_of_short_id = page_text.index("'id':'")
    short_id = page_text[index_of_short_id:index_of_short_id + 11].split("'")[-1]

    # getting phone token (first quoted string after "phoneToken")
    index_of_token = page_text.index("phoneToken")
    phone_token = page_text[index_of_token + 10:index_of_token + 150].split("'")[1]

    # requesting the phone number with the token, reusing the same session
    url = "https://www.olx.pl/ajax/misc/contact/phone/{}".format(short_id)
    data = {
        'pt': phone_token
    }
    response = s.post(url, data=data, headers=headers)
    print(response.text)


scrape_phone(540006276)
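
For comparison, here is a minimal sketch of the same extraction done with regular expressions instead of fixed slice offsets, plus a helper that puts the token in the query string ("?pt=...") the way it appeared in the Network tab. The helper names, the sample page fragment, and the idea that the endpoint expects the token as a GET query parameter rather than as POST form data are all assumptions, not something confirmed against the site. Nothing here sends a request.

```python
import re
from urllib.parse import urlencode

# Endpoint taken from the code above; whether it expects GET or POST is an assumption.
PHONE_ENDPOINT = "https://www.olx.pl/ajax/misc/contact/phone/{}"


def extract_short_id(page_text):
    """Return the value that follows 'id':' in the page source (hypothetical helper)."""
    match = re.search(r"'id':'([^']+)'", page_text)
    if match is None:
        raise ValueError("short id not found in page source")
    return match.group(1)


def extract_phone_token(page_text):
    """Return the first single-quoted string after 'phoneToken' (hypothetical helper)."""
    match = re.search(r"phoneToken[^']*'([^']+)'", page_text)
    if match is None:
        raise ValueError("phoneToken not found in page source")
    return match.group(1)


def build_phone_url(short_id, phone_token):
    """Build the AJAX URL with the token as a ?pt=... query parameter."""
    return PHONE_ENDPOINT.format(short_id) + "?" + urlencode({"pt": phone_token})


# Invented page fragment, only to show the expected input shape.
sample_page = "{'id':'AbC12','x':'y'} ... var phoneToken = 'deadbeef';"
sample_url = build_phone_url(
    extract_short_id(sample_page), extract_phone_token(sample_page)
)
print(sample_url)
```

With a real page, the resulting URL could be fetched with `s.get(...)` on the same session that loaded the ad, so that any cookies tied to the token are sent along. Whether that is enough to avoid the '000 000 000' response or the IP block is not something this sketch can confirm.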