@Sjaak Trekhaak describes a 'hack' in "How do I stop all spiders and the engine immediately after a condition in a pipeline is met?" that can potentially stop the spiders by setting a flag in the pipeline and then raising CloseSpider in the parse method. However, I have the following code in my pipeline (where pdate and lastseen are well-defined datetime objects):

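class StopSpiderPipeline(object):
    def process_item(self, item, spider):
        if pdate < lastseen:
            # Flag the spider so its parse method can stop the crawl.
            spider.close_down = True
        # Scrapy expects process_item to return the item (or raise DropItem).
        return item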

and in the spider:

def parse_item(self, response):
    if self.close_down:
        raise CloseSpider(reason='Already scraped')

I get the error exceptions.AttributeError: 'SyncSpider' object has no attribute 'close_down'. Where did I go wrong? This question was originally asked by @anicake but was never answered. Thanks.

eN_Joy

1 Answer

Is your spider's close_down attribute ever created? From the error, it looks like it isn't.

Try changing your check to if "close_down" in self.__dict__: or initializing the attribute with self.close_down = False in your spider's __init__() method, so that it exists before the pipeline ever sets it.
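A minimal sketch of the second option, with the flag defined up front so that parse_item can always read it. Several things here are assumptions for illustration only: the pipeline's lastseen constructor argument, the "pdate" item field, and the plain SyncSpider class, which in a real project would subclass Scrapy's spider base class. The fallback CloseSpider class is a stand-in so the snippet runs without Scrapy installed.

```python
try:
    from scrapy.exceptions import CloseSpider
except ImportError:
    # Stand-in (assumption) so the sketch runs without Scrapy installed.
    class CloseSpider(Exception):
        def __init__(self, reason="cancelled"):
            super().__init__(reason)
            self.reason = reason


class SyncSpider:  # hypothetical: a real spider would subclass Scrapy's base class
    # Class-level default: the attribute exists before any pipeline sets it,
    # so reading self.close_down can no longer raise AttributeError.
    close_down = False

    def parse_item(self, response):
        if self.close_down:
            raise CloseSpider(reason="Already scraped")
        return {"url": response}


class StopSpiderPipeline:
    def __init__(self, lastseen):
        # Hypothetical: how lastseen is obtained is not shown in the question.
        self.lastseen = lastseen

    def process_item(self, item, spider):
        # "pdate" as an item field is an assumption for this sketch.
        if item["pdate"] < self.lastseen:
            spider.close_down = True
        return item
```

Setting self.close_down = False in __init__() would work as well, provided the base class's __init__() is still called. Defining it only inside parse_item does not help, because the check runs before the assignment.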

Xaqq
  • Thanks, it turns out I had not initialized the `close_down` attribute properly: I set it in the `parse_item` method, but it needs to be defined on the class. – eN_Joy Jul 16 '13 at 00:33