How can I make scrapy crawl break and exit when encountering the first exception?

Question

For development purposes, I would like to stop all scrapy crawling activity as soon as the first exception (in a spider or a pipeline) occurs.

Any advice?

score 13 · Answer 1 · answered Apr 23 '13 at 03:30

In a spider, you can simply raise the CloseSpider exception.

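from scrapy.exceptions import CloseSpider

def parse_page(self, response):
    # response.text is the decoded body; response.body is bytes on Python 3.
    if 'Bandwidth exceeded' in response.text:
        raise CloseSpider('bandwidth_exceeded')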

For other components (middlewares, pipelines, etc.), you can call close_spider manually, as akhter mentioned.

imwilsonxu

What is the ideal place to catch a spider's exceptions in Scrapy? Thanks – Raheel Jul 17 '17 at 14:34

score 13 · Answer 2 · answered Mar 08 '16 at 15:14

Since Scrapy 0.11, there is the CLOSESPIDER_ERRORCOUNT setting:

An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or non set), spiders won’t be closed by number of errors.

If it is set to 1, the spider will be closed on the first exception.
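
As a minimal sketch, assuming a standard project layout with a settings.py module, the setting could be enabled like this:

```python
# settings.py (assumed location; any Scrapy settings module should work)
# Close the spider as soon as the first error is counted.
CLOSESPIDER_ERRORCOUNT = 1
```

For a one-off development run, the same value can probably be passed on the command line instead, e.g. `scrapy crawl myspider -s CLOSESPIDER_ERRORCOUNT=1`, where `myspider` is a placeholder spider name.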

tokarev

Thanks! I had the same problem and this worked for me. – Mike Ottum Jun 28 '16 at 17:48

akhter wahab · Answer 3 · 2012-03-02T07:35:25.290

5

It depends purely on your business logic, but this will work for you:

crawler.engine.close_spider(self, 'log message')
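
A rough sketch of how that call might look inside an item pipeline, assuming the spider exposes its crawler as `spider.crawler`. The class name `StopOnErrorPipeline`, the `handle` helper, and the `pipeline_error` reason are hypothetical names used only for illustration:

```python
class StopOnErrorPipeline:
    """Hypothetical pipeline that stops the crawl on the first processing error."""

    def process_item(self, item, spider):
        try:
            return self.handle(item)
        except Exception:
            # Ask the engine to shut the spider down, then re-raise so the
            # original error still shows up in the log.
            spider.crawler.engine.close_spider(spider, 'pipeline_error')
            raise

    def handle(self, item):
        # Placeholder for real processing; assumed to raise on bad input.
        if 'title' not in item:
            raise ValueError('missing title')
        return item
```

Re-raising after the call keeps the traceback visible, which is usually what you want during development. The second argument is the close reason that Scrapy reports when the spider finishes.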
