I'm trying to scrape a review website (similar to Trustpilot). First, I collected a list of ~50k URLs (complaints) to scrape. Then I scrape specific data from each URL/complaint.
The problem is that my for loop keeps getting slower. It started out scraping one URL every 3 seconds, but it is now down to 20 s/iteration.
Could someone review my code and point out potential flaws?
Thanks
for url in tqdm(urls):
    driver.get(url)
    counta += 1  # running ID of the current complaint
    try:
        # Build a one-row DataFrame from the fields on the complaint page
        df_load = pd.DataFrame({'id': [counta],
                                'caption': [driver.find_element_by_xpath(
                                    '//*[@id="complain-detail"]/div/div[1]/div[2]/div/div[1]/div[2]/div[1]/h1').text],
                                'details': [driver.find_element_by_xpath(
                                    '//*[@id="complain-detail"]/div/div[1]/div[2]/div/div[1]/div[2]/div[1]/ul[1]').text],
                                'status': [driver.find_element_by_xpath(
                                    '//*[@id="complain-detail"]/div/div[1]/div[2]/div/div[1]/div[2]/div[3]/span[2]/strong').text],
                                'complaint': [driver.find_element_by_xpath(
                                    '//*[@id="complain-detail"]/div/div[1]/div[2]/div/div[2]/p').text]})
        # Prepend the new row to the accumulated results
        df = pd.concat([df_load, df])
    except Exception:
        # Any missing element lands here; log the ID and move on
        print(f'ID {counta} did not work')
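
One possible contributor to the slowdown is the `pd.concat` call inside the loop: each call copies every row accumulated so far, so the cost per iteration grows with the number of complaints already scraped. It is probably not the only factor, and the browser session itself may also be getting heavier over time. Below is a minimal sketch of the usual alternative, which collects plain dicts in a list and leaves DataFrame construction until after the loop. The names `collect_rows`, `fetch_fields` and `fake_fetch` are hypothetical and not part of the code above; `fetch_fields` stands in for `driver.get(url)` plus the four XPath lookups.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_rows(
    urls: Iterable[str],
    fetch_fields: Callable[[str], Dict[str, str]],
) -> Tuple[List[dict], List[int]]:
    """Scrape every URL and return (rows, failed_ids).

    fetch_fields is a hypothetical helper: given a URL it returns a dict
    such as {'caption': ..., 'details': ..., 'status': ..., 'complaint': ...}.
    In the real loop it would wrap driver.get(url) and the XPath lookups.
    """
    rows: List[dict] = []
    failed_ids: List[int] = []
    for row_id, url in enumerate(urls, start=1):
        try:
            fields = fetch_fields(url)
        except Exception:
            # Same behaviour as the original loop: note the failure, move on.
            failed_ids.append(row_id)
            continue
        # Appending to a list is cheap; no previously stored rows are copied.
        rows.append({'id': row_id, **fields})
    return rows, failed_ids


if __name__ == "__main__":
    # Stand-in for the Selenium lookups so the sketch runs without a browser.
    def fake_fetch(url: str) -> Dict[str, str]:
        if "bad" in url:
            raise ValueError("element not found")
        return {'caption': f'caption for {url}', 'status': 'open'}

    rows, failed = collect_rows(["u1", "bad-url", "u3"], fake_fetch)
    print(len(rows), failed)
```

Once the loop has finished, `df = pd.DataFrame(rows)` would build the whole frame in a single step. Note that this keeps rows in scraping order, whereas the loop above prepends each new row.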