I'm scraping a site with Scrapy and Splash. Everything works, but the spider collects less data than the target site shows, even with a wait of ten seconds. My conclusion is that some JavaScript has not finished loading when the spider collects the response. It would be great if the spider could wait until all the JavaScript has loaded, since the time the site takes to generate its data varies.
My most recent attempt used a wait of ten seconds:
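import scrapy
from scrapy_splash import SplashRequest


class TargetExSpider(scrapy.Spider):
    name = "tnmo_btex"
    start_urls = [
        'https://www.targetsite.com'
    ]

    def start_requests(self):
        for url in self.start_urls:
            # Render the page in Splash and wait a fixed 10 seconds before returning it
            yield SplashRequest(url=url, callback=self.parse, args={'wait': 10})

    def parse(self, response):
        # Table rows that the page's JavaScript fills in
        rows = response.xpath(".//tr[@class='ng-skyscope']")
        ...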
I would like Splash to wait for all the JavaScript to finish loading before the response is collected.
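To make the goal concrete, here is a minimal, untested sketch of one possible direction: instead of a fixed wait, send a Lua script to Splash's `execute` endpoint and have it poll until the number of matching rows stops changing. The helper name `build_splash_args`, the argument names `css`, `max_wait` and `poll`, and the CSS selector `tr.ng-skyscope` (derived from the XPath above) are my own assumptions, not anything provided by Scrapy or Splash. A stable row count is only a heuristic for "the JavaScript has finished", and `max_wait` would have to stay below the Splash render timeout.

```python
# Lua script for Splash's "execute" endpoint (a sketch, not tested against the real site).
# It polls the page until the number of elements matching args.css is non-zero and
# unchanged between two consecutive polls, or until args.max_wait seconds have passed.
LUA_WAIT_FOR_ROWS = """
function main(splash, args)
  assert(splash:go(args.url))
  local count = splash:jsfunc([[
    function (sel) { return document.querySelectorAll(sel).length; }
  ]])
  local waited, last = 0, -1
  while waited < args.max_wait do
    splash:wait(args.poll)
    waited = waited + args.poll
    local n = count(args.css)
    if n > 0 and n == last then break end
    last = n
  end
  return {html = splash:html()}
end
"""


def build_splash_args(css, max_wait=20.0, poll=0.5):
    """Build the args dict for a SplashRequest (hypothetical helper).

    css      -- CSS selector for the rows to wait for (assumed: "tr.ng-skyscope")
    max_wait -- upper bound on the polling time, in seconds
    poll     -- pause between two row counts, in seconds
    """
    return {
        "lua_source": LUA_WAIT_FOR_ROWS,
        "css": css,
        "max_wait": max_wait,
        "poll": poll,
    }
```

In `start_requests` this would presumably be used as `SplashRequest(url=url, callback=self.parse, endpoint='execute', args=build_splash_args("tr.ng-skyscope"))`. As far as I understand, scrapy-splash fills the response body from the `html` key the script returns, so `response.xpath` in `parse` should keep working.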
Thanks