I developed my spiders based on the suggestions in the SO post "Run a Scrapy spider in a Celery Task". The first time I run it in a new Python kernel, it works well. The second time, the spider appears to run twice and the error below is raised. The third time, it runs three times with the same error, and on later runs it still runs three times.

I'm having a tough time figuring out what exactly is happening.

I'm not sure what is causing the error: Scrapy, billiard, or Twisted. The only related suggestion I found was the SO question "Why I am Getting KeyError in Scrapy?", and that didn't solve the issue either.

Any suggestion is greatly appreciated.

Spider.py:

import scrapy

class EmptySpider(scrapy.Spider):
    name = 'empty'

    def start_requests(self):
        yield scrapy.Request("http://en.m.wikipedia.org/")

    def parse(self, response):
        print("")
        print("")
        print("")
        print("ran empty spider once")

Crawl.py

import scrapy
from scrapy.crawler import CrawlerProcess
from spider import EmptySpider
from billiard import Process

def run_spider_empty(log=False):
    crawler = CrawlerProcess(settings={
        'LOG_ENABLED': log,
    })
    crawler.crawl(EmptySpider)
    process = Process(target=crawler.start, stop_after_crawl=False)
    process.start()

Output from Terminal

$ run_spider_empty()

$ 


ran empty once

$ run_spider_empty()

$ 


ran empty once



ran empty once

Unhandled Error

Traceback (most recent call last):

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 503, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 339, in addCallback
    return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kw)

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 330, in addCallbacks
    self._runCallbacks()

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)

--- <exception caught here> ---

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 515, in _continueFiring
    callable(*args, **kwargs)

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 763, in disconnectAll
    selectables = self.removeAll()

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in removeAll
    [self._selectables[fd] for fd in self._reads],

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in <listcomp>
    [self._selectables[fd] for fd in self._reads],

builtins.KeyError: 9

$ run_spider_empty()

$ 


ran empty once



ran empty once



ran empty once

Unhandled Error

Traceback (most recent call last):
  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 503, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 339, in addCallback
    return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kw)

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 330, in addCallbacks
    self._runCallbacks()

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)

--- <exception caught here> ---

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 515, in _continueFiring
    callable(*args, **kwargs)

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/base.py", line 763, in disconnectAll
    selectables = self.removeAll()

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in removeAll
    [self._selectables[fd] for fd in self._reads],

  File "/home/xxxxxx/.local/share/virtualenvs/yyyyy-3i5Xwd2p/lib/python3.6/site-packages/twisted/internet/epollreactor.py", line 199, in <listcomp>
    [self._selectables[fd] for fd in self._reads],

builtins.KeyError: 9

$
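
The growing run count (once, then twice, then three times) would be consistent with work being queued in the parent process and inherited by each forked child. That is only an assumption about this setup, not a confirmed diagnosis. The stdlib-only sketch below reproduces the pattern without Scrapy: `pending` is a hypothetical stand-in for process-wide state (possibly Twisted's global reactor here), and all function names are made up for illustration.

```python
import multiprocessing as mp

# Hypothetical stand-in for process-wide state that the parent keeps adding to
# (an assumption: in the question, Twisted's global reactor may play this role).
pending = []


def _report_inherited(results):
    # Runs in the child: it sees whatever the parent had queued at fork time.
    results.put(len(pending))


def _report_local(results):
    # Runs in the child: the job list is created here, so nothing is inherited.
    local_jobs = ["crawl"]
    results.put(len(local_jobs))


def _run_in_child(target):
    ctx = mp.get_context("fork")  # fork copies the parent's memory (Unix only)
    results = ctx.Queue()
    proc = ctx.Process(target=target, args=(results,))
    proc.start()
    count = results.get()  # read before join so the child can flush its queue
    proc.join()
    return count


def schedule_in_parent_then_fork():
    pending.append("crawl")  # queued in the parent and never consumed there
    return _run_in_child(_report_inherited)


def schedule_inside_child():
    return _run_in_child(_report_local)
```

Calling `schedule_in_parent_then_fork()` three times returns 1, 2, 3, while `schedule_inside_child()` returns 1 each time. If the same mechanism applies here, creating the `CrawlerProcess` and calling `crawl()` inside the function passed to `Process` might avoid the accumulation.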
  • Maybe the previous spider is still running? For a test you should run it the normal way, not in terminal, i.e. `python script.py` or `scrapy runspider ...` – furas Apr 28 '21 at 09:59
  • Thx for the suggestion! If I enable logging, the spider says it's closed, but I guess something is not closed, is still queued, and gets executed the next time. I even tried `process.stop()`, but it threw an error. Running it from the CLI runs it only once, which masks the error; I only get the issue on the 2nd run. My test ran fine, and I only found this after it was not performing properly in the script. – PhoneRoutine Apr 29 '21 at 02:42
