21

For development purposes, I would like to stop all Scrapy crawling activity as soon as the first exception (in a spider or a pipeline) occurs.

Any advice?

Bo Persson
Udi

3 Answers

13

In a spider, you can simply raise the CloseSpider exception.

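from scrapy.exceptions import CloseSpider

def parse_page(self, response):
    # Stop the whole crawl as soon as the error page is detected.
    if 'Bandwidth exceeded' in response.text:
        raise CloseSpider('bandwidth_exceeded')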

For other components (middlewares, pipelines, etc.), you can call close_spider manually, as akhter mentioned.
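As a rough illustration of calling close_spider from a pipeline, here is a minimal sketch. The class name StopOnErrorPipeline, the handle method, and the "title" check are all hypothetical. It assumes the classic process_item(self, item, spider) signature and that the spider exposes its crawler as spider.crawler, which is the case when Scrapy creates the spider itself.

```python
class StopOnErrorPipeline:
    """Hypothetical pipeline that closes the spider when item processing fails."""

    def process_item(self, item, spider):
        try:
            return self.handle(item)
        except Exception as exc:
            # Ask the engine to shut the spider down, then re-raise so the
            # original failure still propagates to Scrapy.
            spider.crawler.engine.close_spider(spider, f"pipeline_error: {exc!r}")
            raise

    def handle(self, item):
        # Placeholder for the real processing logic (an assumption).
        if "title" not in item:
            raise ValueError("missing title")
        return item
```

The pipeline would still need to be enabled in ITEM_PIPELINES. Requests that are already in flight may finish before the spider actually stops.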

imwilsonxu
13

Since Scrapy 0.11, there is the CLOSESPIDER_ERRORCOUNT setting:

An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won’t be closed by number of errors.

If it is set to 1, the spider will be closed on the first exception.
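For reference, a minimal sketch of how this could look in a project's settings.py (assuming a standard Scrapy project layout):

```python
# settings.py
# Close the spider as soon as the first error is counted.
CLOSESPIDER_ERRORCOUNT = 1
```

The same setting can also be passed for a single run with the -s option, e.g. scrapy crawl myspider -s CLOSESPIDER_ERRORCOUNT=1, where myspider is a placeholder spider name. It is worth checking whether exceptions raised in pipelines count toward this limit in your Scrapy version, since the counter may only track errors raised in spider callbacks.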

tokarev
5

It depends purely on your business logic, but this will work for you:

crawler.engine.close_spider(self, 'log message')

Suggested Reading

And the worst solution is:

import sys

sys.exit("SHUT DOWN EVERYTHING!")
akhter wahab