I have a Scrapy bot that runs from a script. My problem is that after the spider has finished crawling, the program does not end; it just runs forever until I manually shut it down. This spider is part of a bigger program, so I cannot afford to kill it like that, because other processes haven't run yet. So how do I shut it down safely?
I have already searched Stack Overflow and other forums for this, and I found this and this. The first one is totally unusable, trust me, I have tried it. The second one looked promising, but for some reason close_spider doesn't seem to close my spider even when I receive the spider_closed signal.
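For reference, this is roughly how I hooked up the signal, following that second answer (a minimal sketch; the real spider is the PriceBot below). The handler does fire when crawling finishes, but the process still does not exit:

import scrapy
from scrapy import signals

class PriceBot(scrapy.Spider):
    name = 'pricebot'

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Connect a handler to the spider_closed signal
        spider = super(PriceBot, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # This runs when the crawl is done, yet the process keeps running afterwards
        self.logger.info('spider closed: %s', spider.name)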
Here is the bot:
import json

import scrapy
from scrapy.crawler import CrawlerProcess

def pricebot(prod_name):
    class PriceBot(scrapy.Spider):
        name = 'pricebot'
        # Build the search URL from the product name
        query = prod_name
        if query.find(' ') != -1:
            query = query.replace(' ', '-')
        start_urls = ['http://www.shopping.com/' + query + '/products?CLT=SCH']

        def parse(self, response):
            prices_container = response.css('div:nth-child(2) > span:nth-child(1) > a:nth-child(1)')
            t_cont = response.css('div:nth-child(2) > h2:nth-child(1) > a:nth-child(1) > span:nth-child(1)')
            title = t_cont.xpath('@title').extract()
            price = prices_container.xpath('text()').extract()
            # Sanitise price results
            prices = []
            for p in price:
                prices.append(p.strip('\n'))
            # Group prices with their actual products
            product_info = dict(zip(title, prices))
            with open('product_info.json', 'w') as f:
                f.write(json.dumps(product_info))

    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(PriceBot)
    process.start()
After it is done, I need to do other things: call three other functions, to be exact.
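Concretely, the flow I am after looks like this (the three follow-up function names here are just placeholders for my real post-processing steps):

# Placeholders standing in for my three real follow-up functions
def process_results(): ...
def update_records(): ...
def notify_user(): ...

def run_pipeline(prod_name):
    pricebot(prod_name)    # should return once the crawl has fully finished
    process_results()      # must not run until the spider is done
    update_records()
    notify_user()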