I am trying to scrape a single data point from each URL in a list of dynamically loaded sites. I have implemented a scraper with Selenium, but it is too slow. I tried using Scrapy but realized that Scrapy does not execute JavaScript on its own, so it does not see the dynamically loaded content. I have seen documentation on using Splash with Scrapy, but the examples seem to cover the case where Splash renders one dynamic page and Scrapy parses the data from that one page, whereas I have a huge list of URLs. I am considering using multiprocessing, but I am unsure where to get started or whether it would work well with Selenium.
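from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def get_cost(url):
    # Each call starts its own browser (Chrome is assumed here), so quitting it below is safe.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait up to 4 seconds for the price element to appear in the DOM.
        element = WebDriverWait(driver, 4).until(
            EC.presence_of_element_located((By.XPATH, '/html/body/c-wiz[2]/div/div[2]/c-wiz/div/c-wiz/c-wiz/div[2]/div[2]/ul[1]/li[1]/div/div[2]/div/div[9]/div[2]/span'))
        )
        cost = element.get_attribute('textContent')
    except TimeoutException:
        # The element never showed up; record a placeholder instead of a price.
        cost = "-"
    finally:
        driver.quit()
    return cost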
This function takes a URL and returns the cheapest flight cost shown on the page. I am very new to web scraping, so I would appreciate some advice on the best way to move forward.
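For reference, here is a minimal sketch of the kind of parallel version I have in mind. It uses a thread pool from `concurrent.futures` rather than multiprocessing, on the assumption that each Selenium call spends most of its time waiting on the browser. The names `scrape_all`, `fetch`, and `max_workers=4` are my own placeholders, and `fetch` stands in for a function like `get_cost` that is assumed to open and quit its own browser on every call.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL on a pool of worker threads.

    fetch is a stand-in for a function like get_cost. It is assumed to
    create and quit its own browser, because a single WebDriver instance
    should not be shared between threads.
    """
    urls = list(urls)  # allow any iterable, and keep a stable order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        costs = list(pool.map(fetch, urls))
    return dict(zip(urls, costs))
```

I would call it as something like `scrape_all(url_list, get_cost, max_workers=8)`, where `url_list` is my list of URLs. I do not know whether this is the right approach, or how many browsers it is reasonable to run at once.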