Ending scrapy runspider before full execution is done

Question

import scrapy


class PythonEventsSpider(scrapy.Spider):
    name = 'goodspider'
    start_urls = ['https://www.amazon.com/s?me=A33IZBYF4IBZTP&marketplaceID=ATVPDKIKX0DER']
    details = []

    def parse(self, response):
        base_url = "https://www.amazon.com"
        # code here
        next_page = base_url + response.xpath('//li[@class="a-last"]/a/@href').extract_first()
        print(next_page)
        if "page=3" not in next_page:
            yield scrapy.Request(url=next_page, callback=self.parse)
        else:
            # raise CloseSpider('bandwidth_exceeded')
            # exit("Done")
            pass  # placeholder: this is where the spider should stop

Hello, I would like to stop the program when it reaches page 3. The URL will be as follows: https://www.amazon.com/s?i=merchant-items&me=A33IZBYF4IBZTP&page=3&marketplaceID=ATVPDKIKX0DER&qid=1555628764&ref=sr_pg_3 I have tried some of the answers online, but they didn't work; the program kept running. What I want is to add a line or a function in the else statement that ends `scrapy runspider test.py -o test.csv`.

The documentation points at raising `CloseSpider`. What is the exact behaviour you see when you comment your `raise CloseSpider` line back in? https://docs.scrapy.org/en/latest/topics/exceptions.html#scrapy.exceptions.CloseSpider — Adam Burke, Apr 19 '19 at 01:20
See also https://stackoverflow.com/questions/27001586/scrapy-not-responding-to-closespider-exception and https://stackoverflow.com/questions/44566184/scrapy-spider-not-terminating-with-use-of-closespider-extension — Adam Burke, Apr 19 '19 at 01:49

score 0 · Answer 1 · answered Apr 19 '19 at 07:06

Raising CloseSpider still lets all the pending requests be processed.

So you have to set CONCURRENT_REQUESTS=1.

Umair Ayub


this will not work because I have `CONCURRENT_REQUESTS=10` – hadesfv Apr 19 '19 at 10:48
@hadesfv then there is no way to do it, because scrapy is asynchronous – Umair Ayub Apr 20 '19 at 05:38

score 0 · Answer 2 · answered Apr 22 '19 at 13:10

If you really want your script to completely stop at that point, you can terminate your script as you would do for any other Python script: use sys.exit().

However, this means that item processing and other parts of the internal workings of Scrapy won’t have a chance to run. If this is a problem for you, there is no other way beyond Umair’s response.
