Scrapy Spider retry request

Asked Feb 29 '16 at 22:35

Occasionally I get a response with unexpected HTML, and not all of the item fields are extracted. However, if I retry the request, it typically returns the expected HTML.

As a quick fix, I'm catching the error in the spider's parse method:

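# project/spiders/sample_spider.py
import logging

from scrapy import Request, Spider

# Assumes SampleItem is defined in project/items.py (the usual Scrapy layout).
from project.items import SampleItem

logger = logging.getLogger(__name__)


class SampleSpider(Spider):

    [...]

    def parse(self, response):
        try:
            item = SampleItem()

            item['sample_1'] = response.xpath('sample').extract()
            # extract() returns a list, so [0] raises IndexError when the XPath
            # matches nothing, i.e. when the page lacks the expected HTML.
            item['product_count_2'] = response.xpath('sample').extract()[0]

            yield item
        except IndexError:
            logger.debug('Retrying %(url)s', {'url': response.url})
            # dont_filter=True bypasses the duplicate filter, which would
            # otherwise drop a second request for an already-seen URL.
            # Note: nothing here limits how many times a URL is retried.
            yield Request(response.url, self.parse, dont_filter=True)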
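
The quick fix above re-yields the same URL every time the XPath comes up empty, so a page that keeps returning the wrong HTML would be retried indefinitely. A minimal sketch of capping that, assuming a counter carried in the request's meta dict; retry_meta, MAX_RETRIES and the sample_retry_count key are hypothetical names, not part of Scrapy:

```python
# Hypothetical helper: decides whether a response may be retried again by
# carrying a retry counter in the request's meta dict.
MAX_RETRIES = 3  # assumed cap; tune for the site
RETRY_KEY = 'sample_retry_count'  # hypothetical meta key


def retry_meta(meta, max_retries=MAX_RETRIES):
    """Return a copy of meta with the counter incremented,
    or None once the retry budget is exhausted."""
    count = meta.get(RETRY_KEY, 0)
    if count >= max_retries:
        return None
    new_meta = dict(meta)  # copy so the original request's meta is untouched
    new_meta[RETRY_KEY] = count + 1
    return new_meta


# Inside SampleSpider.parse, the except branch could then become:
#
#     except IndexError:
#         meta = retry_meta(response.meta)
#         if meta is None:
#             logger.warning('Giving up on %s', response.url)
#             return
#         logger.debug('Retrying %(url)s (attempt %(n)d)',
#                      {'url': response.url, 'n': meta[RETRY_KEY]})
#         yield response.request.replace(meta=meta, dont_filter=True)
```

Newer Scrapy releases (2.5 and later) also include scrapy.downloadermiddlewares.retry.get_retry_request, which does similar counting against the RETRY_TIMES setting; check the documentation for the installed version before relying on it.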

I came across this post, which appears to describe a similar scenario, but it seems this type of error should be handled in an Item Pipeline. Any thoughts on the best way to implement this fix?

edited May 23 '17 at 11:50 by Community

asked by astro not

Why does it fail, network error or what exactly? – Padraic Cunningham Feb 29 '16 at 23:16
@PadraicCunningham, it returns a 200 response, but for some reason it just doesn't return the expected HTML. – astro not Mar 01 '16 at 20:42
What html is returned? – Padraic Cunningham Mar 01 '16 at 20:46
The URLs being requested are search queries, and the response is a page stating that my search did not return any results. – astro not Mar 01 '16 at 20:50
The query is of course valid, as retrying the request returns the expected HTML. – astro not Mar 01 '16 at 20:52
It is strange that it succeeds intermittently. You could do as you suggest and create a pipeline that retries, but getting to the underlying reason would really be a better option. Can you share the site? – Padraic Cunningham Mar 01 '16 at 20:55
Amazon. Typically this is the offending URL: http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dfashion&field-keywords=%22+%22 – astro not Mar 01 '16 at 20:59
What are you trying to scrape? – Padraic Cunningham Mar 01 '16 at 21:04
"1-48 of 32,964,255 results for Clothing, Shoes & Jewelry" – astro not Mar 01 '16 at 21:07
Are you actually using that url? – Padraic Cunningham Mar 01 '16 at 21:14
Yes, that's typically the URL that needs to be retried. – astro not Mar 01 '16 at 21:30
The URL you posted did not actually work. – Padraic Cunningham Mar 01 '16 at 21:43
Hmm, try this http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dfashion&field-keywords=%22+%22 – astro not Mar 01 '16 at 22:22
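
The thread shows that the bad responses are 200 pages saying the (valid) search returned nothing, while a good page carries a summary line like "1-48 of 32,964,255 results for Clothing, Shoes & Jewelry". A minimal sketch of checking for that symptom directly instead of waiting for an IndexError; the regular expressions assume the wording quoted in the comments, and parse_result_count and should_retry are hypothetical helpers:

```python
import re

# Summary line quoted in the thread, e.g.
# "1-48 of 32,964,255 results for Clothing, Shoes & Jewelry".
# Assumes the page keeps this exact wording.
RESULT_COUNT_RE = re.compile(
    r'\d[\d,]*\s*-\s*\d[\d,]*\s+of\s+(\d[\d,]*)\s+results\s+for')

# Assumed wording of the spurious page; the thread only paraphrases it.
NO_RESULTS_RE = re.compile(r'did\s+not\s+return\s+any\s+results', re.IGNORECASE)


def parse_result_count(text):
    """Return the total result count from the summary line, or None if absent."""
    match = RESULT_COUNT_RE.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(',', ''))


def should_retry(text):
    """Retry when the summary line is missing or the page claims
    the (known-valid) query returned nothing."""
    return parse_result_count(text) is None or NO_RESULTS_RE.search(text) is not None
```

In a spider, text would be the decoded page body (for example response.text in current Scrapy versions), and a True result would feed into the same retry path as the IndexError branch.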

0 Answers