I have a problem with Scrapy. If a request fails (e.g. 404, 500), how can I make an alternative request? For example, two links can provide the price info; if one fails, request the other automatically.
You can write a middleware. – kev Jun 04 '13 at 06:36
2 Answers
Use "errback" in the Request like
errback=self.error_handler
where error_handler is a function (just like callback function) in this function check the error code and make the alternative Request.
see errback in the scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
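For example, a minimal sketch against a recent Scrapy release (the spider name, the two URLs, and parse_price are illustrative placeholders, not part of the original answer); requests returned from the errback are scheduled just like callback output:

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class PriceSpider(scrapy.Spider):
    name = "price_spider"

    # Placeholder URLs: substitute the two pages that carry the price info.
    primary_url = "http://example.com/item/price"
    fallback_url = "http://example.org/item/price"

    def start_requests(self):
        yield scrapy.Request(self.primary_url,
                             callback=self.parse_price,
                             errback=self.error_handler)

    def error_handler(self, failure):
        # Called on HTTP errors (404, 500, ...) as well as DNS failures,
        # timeouts and other download errors.
        if failure.check(HttpError):
            self.logger.info("Got status %s, trying the alternative URL",
                             failure.value.response.status)
        else:
            self.logger.info("Request failed (%s), trying the alternative URL",
                             failure.type)
        # Fall back to the second source for the same price info.
        yield scrapy.Request(self.fallback_url, callback=self.parse_price)

    def parse_price(self, response):
        # Extract the price from whichever page responded successfully.
        pass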

Just set handle_httpstatus_list = [404, 500] and check for the status code in the parse method. Here's an example:
from scrapy.http import Request
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    handle_httpstatus_list = [404, 500]
    name = "my_crawler"
    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        if response.status in self.handle_httpstatus_list:
            # the original request failed, so try the alternative URL
            return Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print response.url
        # parse the page and extract items
Also see:
- How to get the scrapy failure URLs?
- Scrapy and response status code: how to check against it?
- How to retry for 404 link not found in scrapy?
Hope that helps.
This doesn't cover total failures, e.g. DNS errors - only cases where a webserver responds. – HaveAGuess Jun 17 '14 at 01:05