Occasionally a response comes back with unexpected HTML, so not all of the item fields are extracted. If I retry the request, it typically returns the expected HTML.

As a quick fix, I'm catching the error in the spider's parse method:

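# project/spiders/sample_spider.py

import logging

from scrapy import Request, Spider

from project.items import SampleItem  # assumed location of SampleItem

logger = logging.getLogger(__name__)


class SampleSpider(Spider):

    [...]

    def parse(self, response):
        try:
            item = SampleItem()

            item['sample_1'] = response.xpath('sample').extract()
            # [0] raises IndexError when the XPath matches nothing
            item['product_count_2'] = response.xpath('sample').extract()[0]

            yield item
        except IndexError:
            # Unexpected HTML: re-queue the same URL, bypassing the dupe filter
            logger.debug('Retrying %(url)s', {'url': response.url})
            yield Request(response.url, self.parse, dont_filter=True)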

I came across this post, which appears to describe a similar scenario, but it seems this type of error should be handled in an Item Pipeline. Any thoughts on the best way to implement this fix?
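
One thing the quick fix above does not guard against is a page that never returns the expected HTML, in which case the spider would re-queue the same URL indefinitely. A minimal sketch of a bounded retry counter is below. The names `MAX_PARSE_RETRIES`, `next_attempt`, and the `parse_retries` meta key are hypothetical and chosen only for illustration; they are not part of Scrapy.

```python
from typing import Optional

MAX_PARSE_RETRIES = 3  # hypothetical cap; tune to taste


def next_attempt(meta: dict, max_retries: int = MAX_PARSE_RETRIES) -> Optional[int]:
    """Return the next retry number, or None once the cap has been reached.

    `meta` is expected to be a dict such as response.meta; the
    'parse_retries' key is an assumption made for this sketch.
    """
    attempts = meta.get("parse_retries", 0)
    if attempts >= max_retries:
        return None
    return attempts + 1


# Possible use inside the `except IndexError` branch of parse():
#
#     attempt = next_attempt(response.meta)
#     if attempt is not None:
#         yield Request(response.url, self.parse, dont_filter=True,
#                       meta={'parse_retries': attempt})
#     else:
#         logger.warning('Giving up on %(url)s', {'url': response.url})
```

Keeping the counter in the request's `meta` means each URL tracks its own attempts, so one stubborn page does not affect retries for the others.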

astro not