
I wrote a crawler with Scrapy.

There is a function in the pipeline where I write my data to a database. I use the logging module to log runtime logs.

I found that when my string contains Chinese characters, logging.error() throws an exception. But the crawler keeps running!

I know this is a minor error, but if a critical exception occurs, I will miss it if the crawler keeps running.

My question is: is there a setting that forces Scrapy to stop when there is an exception?

scott huang

3 Answers


You can use CLOSESPIDER_ERRORCOUNT

An integer which specifies the maximum number of errors to receive before closing the spider. If the spider generates more than that number of errors, it will be closed with the reason closespider_errorcount. If zero (or not set), spiders won't be closed by the number of errors.

By default it is set to 0 (CLOSESPIDER_ERRORCOUNT = 0); you can change it to 1 if you want to exit on the first error.
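For example, in your project's settings.py (a minimal sketch; CLOSESPIDER_ERRORCOUNT is the documented setting, and the built-in CloseSpider extension that reads it is enabled by default):

    # settings.py
    # Close the spider as soon as the first error is logged,
    # with close reason 'closespider_errorcount'.
    CLOSESPIDER_ERRORCOUNT = 1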

UPDATE

As mentioned in the answers to this question, you can also use:

crawler.engine.close_spider(self, 'log message')

For more information, read:

Close spider extension
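As a rough sketch of what that call looks like from inside a spider callback (the spider, URL and close reason below are made up for illustration; self.crawler is set on spiders created through the normal Scrapy machinery):

    import scrapy

    class MySpider(scrapy.Spider):
        # Hypothetical spider, only to show where the call goes.
        name = 'my_spider'
        start_urls = ['http://example.com']

        def parse(self, response):
            title = response.css('title::text').get()
            if title is None:
                # Stop the whole crawl; the string becomes the close reason in the stats.
                self.crawler.engine.close_spider(self, 'no title found')
                return
            yield {'title': title}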

parik
  • I missed that one! Good option. – paul trmbrth Jun 08 '17 at 09:39
  • Hi parik, I think your answer is what I want. I added the following code to my spider, but it does not work; can you help me with this? EXTENSIONS = { # 'scrapy.extensions.telnet.TelnetConsole': None, 'scrapy.extensions.closespider.CloseSpider': 100, } CLOSESPIDER_ERRORCOUNT = 1 – scott huang Jun 09 '17 at 07:55
  • @scotthuang please update your question with what you tried and the error messages – parik Jun 09 '17 at 08:13
  • Hi parik, I found that it actually works! I tested it with a database exception. When I test this extension with an index-out-of-range exception, Scrapy stops. I will dig deeper to find out why. Thanks for your advice. – scott huang Jun 09 '17 at 08:56

In the process_item function of your pipeline, you have access to the spider instance.

To solve your problem, you could catch the exception when you insert your data, then cleanly stop your spider if you catch a certain exception, like this:

    def process_item(self, item, spider):
        try:
            # Insert your item into the database here
            pass
        except YourExceptionName:
            # Stop the whole crawl when this particular exception is raised
            spider.crawler.engine.close_spider(spider, reason='finished')
        return item
Adrien Blanquer

I don't know of a setting that would close the crawler on any exception, but you have at least a couple of options:

  • you can raise a CloseSpider exception in a spider callback, for example when you catch the exception you mention
  • you can call crawler.engine.close_spider(spider, 'some reason') if you have a reference to the crawler and spider objects, for example in an extension. See how the CloseSpider extension is implemented (it's not the same as the CloseSpider exception). You could hook this to the spider_error signal, for example; see the sketch below.
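A sketch of such an extension, assuming you want to close on the first callback error (the class name, log message and priority are made up; crawler.signals.connect, signals.spider_error and crawler.engine.close_spider are standard Scrapy APIs):

    from scrapy import signals

    class CloseOnErrorExtension(object):
        # Hypothetical extension: close the spider when any spider callback raises.

        def __init__(self, crawler):
            self.crawler = crawler
            # spider_error fires whenever a spider callback raises an exception.
            crawler.signals.connect(self.spider_error, signal=signals.spider_error)

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def spider_error(self, failure, response, spider):
            spider.logger.error("Closing spider because of %s", failure.value)
            self.crawler.engine.close_spider(spider, 'callback error')

You would then enable it in settings.py with something like EXTENSIONS = {'myproject.extensions.CloseOnErrorExtension': 500} (the module path and priority are placeholders).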
paul trmbrth