
I want to combine APScheduler with Scrapy, but my code is wrong. How should I modify it?

settings = get_project_settings()
configure_logging(settings)
runner = CrawlerRunner(settings)

@defer.inlineCallbacks
def crawl():
    reactor.run()
    yield runner.crawl(Jobaispider)#this is my spider
    yield runner.crawl(Jobpythonspider)#this is my spider
    reactor.stop()

sched = BlockingScheduler()
sched.add_job(crawl, 'date', run_date=datetime(2018, 12, 4, 10, 45, 10))
sched.start()

Error: builtins.ValueError: signal only works in main thread

馮推宇

1 Answer


This question has been answered in good detail here: How to integrate Flask & Scrapy?, which covers a variety of use cases and ideas. I also found one of the links in that thread very useful: https://github.com/notoriousno/scrapy-flask

To answer your question more directly, try the following. It uses the approach from the two links above, in particular the crochet library, which runs the Twisted reactor in its own background thread so that you no longer call reactor.run() or reactor.stop() yourself.

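from datetime import datetime

import crochet
crochet.setup()  # starts the Twisted reactor in a background thread

from apscheduler.schedulers.blocking import BlockingScheduler
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
# Also import Jobaispider and Jobpythonspider from your own project.

settings = get_project_settings()
configure_logging(settings)
runner = CrawlerRunner(settings)

# Note: Removing defer here for the example
#@defer.inlineCallbacks

@crochet.run_in_reactor
def crawl():
    # Without inlineCallbacks/yield, both crawls are started without
    # waiting for each other.
    runner.crawl(Jobaispider)  # this is my spider
    runner.crawl(Jobpythonspider)  # this is my spider

sched = BlockingScheduler()
sched.add_job(crawl, 'date', run_date=datetime(2018, 12, 4, 10, 45, 10))
sched.start()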
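
As for why the original code fails: by default APScheduler runs the job in a worker thread, and reactor.run() tries to install signal handlers, which Python only allows in the main thread. Here is a minimal standard-library sketch of that restriction. It does not use Scrapy or APScheduler at all, and the function names (install_handler, run_in_worker_thread, run_in_main_thread) are made up just for this illustration.

```python
import signal
import threading


def install_handler(results):
    """Try to register a SIGINT handler and record what happened."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
        results.append("ok")
    except ValueError as exc:
        # Python refuses to install signal handlers outside the main thread.
        results.append(f"ValueError: {exc}")


def run_in_worker_thread():
    """Roughly what happens when a scheduler job calls reactor.run()."""
    results = []
    worker = threading.Thread(target=install_handler, args=(results,))
    worker.start()
    worker.join()
    return results[0]


def run_in_main_thread():
    """The same call made directly from the calling (main) thread."""
    results = []
    install_handler(results)
    return results[0]


if __name__ == "__main__":
    print("main thread:  ", run_in_main_thread())
    print("worker thread:", run_in_worker_thread())
```

The worker-thread call should report a ValueError mentioning the main thread (the exact wording differs between Python versions), which is the same kind of failure as in the question. With crochet the reactor is started for you in its own thread, so the scheduled job never has to start it.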
Raghuveer