
We have a fairly standard Scrapy project (Scrapy 0.24).

I'd like to catch specific HTTP response codes, such as 200, 500, 502, 503, 504 etc.

Something like this:

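import scrapy

class MySpider(scrapy.Spider):

    def parse(self, response):
        # process HTTP 200 responses
        pass

    def parse_500(self, response):
        # process HTTP 500 errors
        pass

    def parse_502(self, response):
        # process HTTP 502 errors
        pass

    # ... one method per status code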

How can we do that?

alecxe
Spaceman

1 Answer


By default, Scrapy only passes responses with 2xx status codes (200-299) to your callbacks; HttpErrorMiddleware filters out the rest.

To have Scrapy pass 500 and 502 responses to your callbacks, list those codes in the spider's handle_httpstatus_list attribute:

import scrapy

class MySpider(scrapy.Spider):
    handle_httpstatus_list = [500, 502]
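If the same codes should reach callbacks across the whole project rather than in one spider, HttpErrorMiddleware also reads the HTTPERROR_ALLOWED_CODES setting. A minimal settings.py fragment, assuming the project uses its standard settings module:

```python
# settings.py: let 500 and 502 responses reach spider callbacks project-wide
HTTPERROR_ALLOWED_CODES = [500, 502]
```

A single request can also opt in by setting the handle_httpstatus_list key in its meta dict.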

Then, in the parse() callback, check response.status:

def parse(self, response):
    if response.status == 500:
        pass  # handle HTTP 500 here
    elif response.status == 502:
        pass  # handle HTTP 502 here
    else:
        pass  # normal 2xx processing
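If you prefer the separate parse_500, parse_502, ... methods from the question, parse can look up a handler by status code and fall back to normal processing. This is a minimal sketch: the parse_ok and parse_<status> method names are hypothetical, the handlers return plain dicts only to make the routing visible, and a stand-in response object (SimpleNamespace) replaces a real Scrapy response so the snippet runs without Scrapy installed. In a real project the class would subclass scrapy.Spider.

```python
from types import SimpleNamespace


class StatusDispatchSpider:
    # In a real project: class StatusDispatchSpider(scrapy.Spider)
    handle_httpstatus_list = [500, 502]

    def parse(self, response):
        # Look for a method named parse_<status>, e.g. parse_500;
        # fall back to parse_ok for 2xx and any code without a handler.
        handler = getattr(self, "parse_%d" % response.status, self.parse_ok)
        return handler(response)

    def parse_ok(self, response):
        # Hypothetical handler for normal (2xx) responses
        return {"kind": "ok", "status": response.status, "url": response.url}

    def parse_500(self, response):
        # Hypothetical handler for HTTP 500 Internal Server Error
        return {"kind": "server_error", "status": response.status, "url": response.url}

    def parse_502(self, response):
        # Hypothetical handler for HTTP 502 Bad Gateway
        return {"kind": "bad_gateway", "status": response.status, "url": response.url}


if __name__ == "__main__":
    spider = StatusDispatchSpider()
    for status in (200, 500, 502):
        fake = SimpleNamespace(status=status, url="http://example.com/")
        print(spider.parse(fake))
```

With getattr, supporting another code means defining one more method, but each code must still appear in handle_httpstatus_list, or Scrapy drops the response before parse is ever called.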
alecxe
    Assume there are several parse functions in the spider. Adding a status check to each one seems less graceful. Is there a better way to do it? – LeonF Aug 14 '18 at 20:25
    [HttpErrorMiddleware](https://docs.scrapy.org/en/latest/topics/spider-middleware.html#module-scrapy.spidermiddlewares.httperror) – Mykola Vasilaki Sep 17 '19 at 02:39
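To avoid repeating the status check in several callbacks, one option is to wrap each callback in a decorator that routes non-2xx responses to a shared handler. This is a minimal sketch, not part of Scrapy: route_errors and handle_error are hypothetical names, and a stand-in response object replaces a real Scrapy response so it runs on its own. The listed codes must still be allowed through via handle_httpstatus_list.

```python
import functools
from types import SimpleNamespace


def route_errors(func):
    """Send non-2xx responses to self.handle_error instead of the callback."""
    @functools.wraps(func)
    def wrapper(self, response):
        if not 200 <= response.status < 300:
            return self.handle_error(response)
        return func(self, response)
    return wrapper


class MySpider:
    # In a real project: class MySpider(scrapy.Spider)
    handle_httpstatus_list = [500, 502]

    @route_errors
    def parse(self, response):
        return {"callback": "parse", "url": response.url}

    @route_errors
    def parse_detail(self, response):
        return {"callback": "parse_detail", "url": response.url}

    def handle_error(self, response):
        # Single place to deal with every status let through by handle_httpstatus_list
        return {"error": response.status, "url": response.url}


if __name__ == "__main__":
    spider = MySpider()
    print(spider.parse(SimpleNamespace(status=200, url="http://example.com/")))
    print(spider.parse_detail(SimpleNamespace(status=502, url="http://example.com/x")))
```

Because the wrapper simply returns whatever the callback or handler returns, it also works when those are generator functions, which is common in Scrapy callbacks.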