I developed my spiders based on the suggestions in this SO post: Run a Scrapy spider in a Celery Task. The first time I run it in a new Python kernel, it works well. The second time, it seems to run twice and fails with the error below. The third time, it runs three times with the same error, and after that it keeps running three times.
I'm having a hard time figuring out what exactly is happening.
I'm not sure what is causing the error: Scrapy, billiard, or Twisted. The only related suggestion I found was this SO question, Why I am Getting KeyError in Scrapy?, and it didn't solve the issue either.
Any suggestion is greatly appreciated.
Spider.py:
```python
import scrapy


class EmptySpider(scrapy.Spider):
    name = 'empty'

    def start_requests(self):
        yield scrapy.Request("http://en.m.wikipedia.org/")

    def parse(self, response):
        print("")
        print("")
        print("")
        print("ran empty spider once")
```
Crawl.py:
```python
import scrapy
from scrapy.crawler import CrawlerProcess
from spider import EmptySpider
from billiard import Process


def run_spider_empty(log = False):
    crawler = CrawlerProcess(settings={
        'LOG_ENABLED': log,
    })
    crawler.crawl(EmptySpider)
    process = Process(target=crawler.start, stop_after_crawl=False)
    process.start()
```
Output from the terminal:
```
$ run_spider_empty()
$
ran empty once
$ run_spider_empty()
$
ran empty once
ran empty once
Unhandled Error
Traceback (most recent call last):
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 503, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 339, in addCallback
    return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kw)
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 330, in addCallbacks
    self._runCallbacks()
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 515, in _continueFiring
    callable(*args, **kwargs)
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 763, in disconnectAll
    selectables = self.removeAll()
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in removeAll
    [self._selectables[fd] for fd in self._reads],
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in <listcomp>
    [self._selectables[fd] for fd in self._reads],
builtins.KeyError: 9
$ run_spider_empty()
$
ran empty once
ran empty once
ran empty once
Unhandled Error
Traceback (most recent call last):
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 503, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 339, in addCallback
    return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kw)
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 330, in addCallbacks
    self._runCallbacks()
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
--- <exception caught here> ---
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 515, in _continueFiring
    callable(*args, **kwargs)
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 763, in disconnectAll
    selectables = self.removeAll()
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in removeAll
    [self._selectables[fd] for fd in self._reads],
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in <listcomp>
    [self._selectables[fd] for fd in self._reads],
builtins.KeyError: 9
$
```
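One hedged reading of the output is that call N prints N times because `crawler.crawl(EmptySpider)` runs in the parent, where (by assumption) it leaves scheduled work in the parent's reactor that never runs there; each later fork copies all of that accumulated state, and the child's reactor then executes every crawl queued so far. The sketch below does not use Scrapy or Twisted at all: a hypothetical module-level `PENDING` list stands in for the parent-side state, and `multiprocessing` with the `fork` start method (Unix only) stands in for billiard. The epoll `KeyError` may be a related symptom of reactor state copied across `fork`, but this sketch does not reproduce it.

```python
import multiprocessing as mp

# Hypothetical stand-in for work scheduled in the parent process that the
# parent itself never runs (assumed analogue of the reactor's pending calls).
PENDING = []


def inherited_jobs(results):
    # Child target: "runs" every job it inherited from the parent at fork time.
    for job in PENDING:
        results.put(job)
    results.put(None)  # sentinel: no more jobs


def fresh_jobs(results):
    # Child target: builds its own job list, so it inherits nothing.
    for job in ["crawl"]:
        results.put(job)
    results.put(None)


def _run_child(ctx, target):
    results = ctx.Queue()
    proc = ctx.Process(target=target, args=(results,))
    proc.start()
    ran = []
    while True:  # drain the queue before join() to avoid blocking the child
        item = results.get()
        if item is None:
            break
        ran.append(item)
    proc.join()
    return ran


def buggy_call(ctx):
    PENDING.append("crawl")  # scheduled in the parent, like crawler.crawl()
    return _run_child(ctx, inherited_jobs)


def fixed_call(ctx):
    return _run_child(ctx, fresh_jobs)  # scheduled inside the child instead


def demo():
    PENDING.clear()
    ctx = mp.get_context("fork")
    buggy = [len(buggy_call(ctx)) for _ in range(3)]
    fixed = [len(fixed_call(ctx)) for _ in range(3)]
    return buggy, fixed


if __name__ == "__main__":
    print(demo())  # ([1, 2, 3], [1, 1, 1])
```

The `[1, 2, 3]` growth mirrors the "ran empty once" counts in the output above, while scheduling inside the child keeps every run at one.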