
I am trying to scrape this site. I managed to do it with urllib and BeautifulSoup, but urllib is too slow. Since the URLs number in the thousands, I want to make the requests asynchronously, and grequests looks like a nice package for that.

Example:

import grequests
from bs4 import BeautifulSoup

pages = []
base = "https://www.spitogatos.gr/search/results/residential/sale/r100/m100m101m102m103m104m105m106m107m108m109m110m150m151m152m153m154m155m156m157m158m159m160m161m162m163m164m165m166m167m168m169m170m171m172m173m174m175m176m177m178m179m180m181m182m183m184m185m186m187m188m189m190m191m192m193m194m195m196m197m198m106001m125000m"
pages = [base]
for i in range(1, 999):
    pages.append(base + "/offset_{}".format(i * 10))

rs = (grequests.get(item) for item in pages)
a = grequests.map(rs)

The problem is that I don't know how to continue from here and use BeautifulSoup to get the HTML of every page. I would be glad to hear your ideas. Thank you!

dimosbele
  • I suggest you try [Scrapy](https://scrapy.org/). The framework is built on top of the Twisted asynchronous networking library and is faster than `bs4` and `urllib`. – vold May 02 '17 at 11:46

1 Answer


Refer to the script below, and check the linked source; it should help.

import grequests
from bs4 import BeautifulSoup

reqs = (grequests.get(link) for link in links)
# imap yields responses as they complete; size controls the pool's concurrency.
# (Passing a Pool object as the second positional argument would set the
# unrelated `stream` parameter instead, so use the `size` keyword.)
resp = grequests.imap(reqs, size=10)

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')
    # The selectors below are specific to the site scraped in the source article
    results = soup.find_all('a', attrs={"class": 'product__list-name'})
    print(results[0].text)
    prices = soup.find_all('span', attrs={'class': "pdpPriceMrp"})
    print(prices[0].text)
    discount = soup.find_all("div", attrs={"class": "listingDiscnt"})
    print(discount[0].text)

Source: https://blog.datahut.co/asynchronous-web-scraping-using-python/
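Applied to the question's `pages` list, the same pattern could be wrapped in a generator so the fetching and the BeautifulSoup parsing stay separate. This is only a sketch, not run against the live site: the `example.com` URLs are placeholders, `fetch_pages` and `on_error` are names I made up, and `exception_handler` is grequests' keyword for catching failed requests (which `imap` otherwise drops silently).

```python
def fetch_pages(pages, size=10):
    """Fetch all URLs concurrently and yield (url, html) pairs.

    Requires the third-party package grequests (pip install grequests).
    """
    import grequests  # imported inside so the module loads without it

    def on_error(request, exception):
        # Failed requests are passed here instead of being yielded by imap
        print("failed:", request.url, exception)

    reqs = (grequests.get(u) for u in pages)
    for r in grequests.imap(reqs, size=size, exception_handler=on_error):
        yield r.url, r.text

# Placeholder URLs, just to show the call shape
pages = ["https://example.com/page/{}".format(i) for i in range(1, 4)]
```

Each yielded `html` string can then be fed to `BeautifulSoup(html, 'lxml')` exactly as in the loop above, with selectors matching the target site.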

Neil