
I have a Scrapy bot that runs from a script. My problem is that after the spider has finished crawling, the program does not end, so it runs forever until I shut it down manually. This spider is part of a bigger program, so I cannot afford to kill it like that, because the other steps have not run yet. How do I shut it down safely? I have already searched Stack Overflow and other forums and found this and this. The first one is not usable at all (trust me, I have tried). The second one looked promising, but for some reason close_spider does not seem to close my spider when I get the spider_closed signal.

Here is the bot:

import json

import scrapy
from scrapy.crawler import CrawlerProcess


def pricebot(prod_name):
    class PriceBot(scrapy.Spider):
        name = 'pricebot'
        # Spaces in the product name become hyphens in the search URL
        query = prod_name.replace(' ', '-')
        start_urls = ['http://www.shopping.com/' + query + '/products?CLT=SCH']

        def parse(self, response):
            prices_container = response.css('div:nth-child(2) > span:nth-child(1) > a:nth-child(1)')
            t_cont = response.css('div:nth-child(2)>h2:nth-child(1)>a:nth-child(1)>span:nth-child(1)')

            title = t_cont.xpath('@title').extract()
            price = prices_container.xpath('text()').extract()
            # Sanitise price results
            prices = [p.strip('\n') for p in price]
            # Group prices with their products
            product_info = dict(zip(title, prices))
            with open('product_info.json', 'w') as f:
                f.write(json.dumps(product_info))

    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })

    process.crawl(PriceBot)
    process.start()  # blocks here until the crawl is finished

After it is done, I need to do other things: call three other functions, to be exact.
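One way to make sure control comes back to the main program is to run the crawl in a child interpreter and wait for it with a timeout. This is only a sketch, not a confirmed fix for the hang: `run_crawl_script`, the `script_path` argument (a standalone script that builds the spider and calls `process.start()` itself) and the 120 second default are all assumptions made for illustration.

```python
import subprocess
import sys


def run_crawl_script(script_path, prod_name, timeout=120):
    """Run a crawl script in a child interpreter and wait for it to exit.

    script_path is a hypothetical standalone script that starts the crawl;
    timeout (in seconds) is an arbitrary choice for this sketch.
    Returns the child's exit code, or None if it had to be killed.
    """
    try:
        completed = subprocess.run(
            [sys.executable, script_path, prod_name],
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        return None
    return completed.returncode
```

After `run_crawl_script(...)` returns, the main program can read product_info.json and call its remaining functions, whether or not the child exited on its own.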

silverhash
  • Show us your code – Umair Ayub Oct 31 '17 at 07:17
  • Why are you declaring that spider inside a function? Move that elsewhere and pass it an argument instead... Regarding the program not finishing I imagine if you're managing the spiders then you want to keep a reference to the main reactor and shut that down. You'll need to show the other relevant code of how you're managing spiders here... – Jon Clements Oct 31 '17 at 07:46
  • It's in a function because it doesn't run on its own; it should only run when it is required, specifically when a user clicks a button. Passing it an argument sounds great. Could you elaborate on how I could do that? – silverhash Oct 31 '17 at 07:51
  • That is all the code I have for the spider, mate – silverhash Oct 31 '17 at 07:52

0 Answers