I am using Selenium and Python for a big project. I have to go through 320,000 webpages (320K) one by one, scrape details, then sleep for a second and move on.
Like below:
links = ["https://www.thissite.com/page=1","https://www.thissite.com/page=2", "https://www.thissite.com/page=3"]
for link in links:
browser.get(link )
scrapedinfo = browser.find_elements_by_xpath("*//div/productprice").text
open("file.csv","a+").write(scrapedinfo)
time.sleep(1)
The biggest problem: it is too slow! With a one-second sleep plus page-load time per page, 320,000 pages means this script will take days, maybe weeks.
- Is there a way to increase speed, such as visiting multiple links at the same time and scraping them all at once?
I have spent hours searching Google and Stack Overflow, and the only thing I found was multiprocessing, but I have been unable to apply it to my script.
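From what I read, it would look something like the sketch below. This is just my attempt, not a working solution: I am assuming each worker process needs its own browser (a WebDriver instance cannot be shared across processes), and the URLs and XPath are the same placeholders as above.

```python
import time
from multiprocessing import Pool
from selenium import webdriver

browser = None  # one browser per worker process

def init_worker():
    # Each worker opens its own Chrome instance when the pool starts
    global browser
    browser = webdriver.Chrome()

def scrape(link):
    browser.get(link)
    time.sleep(1)
    return [el.text for el in browser.find_elements_by_xpath("*//div/productprice")]

if __name__ == "__main__":
    links = ["https://www.thissite.com/page=%d" % i for i in range(1, 4)]
    # 4 parallel browsers; the browsers are not closed here, which is one
    # of the things I am unsure how to handle cleanly
    with Pool(processes=4, initializer=init_worker) as pool:
        results = pool.map(scrape, links)
    with open("file.csv", "a") as f:
        for row in results:
            f.write(",".join(row) + "\n")
```

Even if something like this works, I am not sure whether running multiple Chrome instances is the right approach, or whether threads or something else would be a better fit for this kind of job.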