
I'm scraping a site with Scrapy and Splash. Everything works, but the spider collects less data than the target site shows, even with a wait of ten seconds. My conclusion is that some JavaScript has not finished loading when the spider collects the response. It would be great if the spider could wait until all the JavaScript has loaded, since the time the site takes to generate its data varies.

My most recent attempt used a wait of ten seconds:

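import scrapy
from scrapy_splash import SplashRequest


class TargetExSpider(scrapy.Spider):
    name = "tnmo_btex"
    start_urls = [
        'https://www.targetsite.com'
    ]

    def start_requests(self):
        for url in self.start_urls:
            # Render through Splash with a fixed 10-second wait
            yield SplashRequest(url=url, callback=self.parse, args={'wait': 10})

    def parse(self, response):
        # Rows that the page's JavaScript is expected to have rendered
        rows = response.xpath(".//tr[@class='ng-skyscope']")
        ...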

I would love to have Splash wait for all the JavaScript to load before the response is collected.
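One direction that might achieve this is to replace the fixed wait with a Lua script that polls until the expected rows exist. The following is an untested sketch, not a confirmed fix. It assumes Splash's `execute` endpoint and `splash:select` (Splash 3.0 or later); the helper name `build_splash_args`, the CSS selector `tr.ng-skyscope` (derived from the XPath above), and the `max_wait` budget are illustrative assumptions.

```python
# Lua script for Splash's `execute` endpoint: poll for an element
# instead of sleeping for a fixed time. Assumes Splash >= 3.0.
WAIT_FOR_ROWS_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  local waited = 0
  -- Check every 0.5 s until the selector matches or the budget runs out
  while waited < args.max_wait do
    if splash:select(args.css) then break end
    splash:wait(0.5)
    waited = waited + 0.5
  end
  return {html = splash:html()}
end
"""


def build_splash_args(css, max_wait=20):
    """Build the `args` dict for a SplashRequest (hypothetical helper)."""
    return {
        "lua_source": WAIT_FOR_ROWS_LUA,
        "css": css,            # read in Lua as args.css
        "max_wait": max_wait,  # polling budget in seconds
        # Splash's own timeout must exceed the polling budget; the
        # server's configured maximum timeout still caps this value.
        "timeout": max_wait + 10,
    }


# Possible use inside start_requests (requires scrapy_splash):
#   yield SplashRequest(url, self.parse, endpoint="execute",
#                       args=build_splash_args("tr.ng-skyscope"))
```

Polling stops as soon as the first matching row appears, so if the site keeps adding rows afterwards, the condition would need to check for something that marks the end of loading instead.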

Thanks

Muhika Thomas

0 Answers