14

I have to call the crawler from another Python file, for which I use the following code:

from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings

def crawl_koovs():
    spider = SomeSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()

On running this, I get the following error:

exceptions.ValueError: signal only works in main thread

The only workaround I could find is to use

reactor.run(installSignalHandlers=False)

which I don't want to use, since I want to call this method multiple times and need the reactor to be stopped before the next call. What can I do to make this work (maybe force the crawler to start in the same 'main' thread)?

Pravesh Jain
  • Here is a [working sample code](http://stackoverflow.com/questions/18838494/scrapy-very-basic-example/27744766#27744766) I've used to run Scrapy from script before. Hope it helps. – alecxe May 18 '15 at 09:55
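
For reference, a minimal sketch of running Scrapy from a script with the newer CrawlerProcess API (this is only an assumption about the general approach the linked answer takes, not a copy of it; SomeSpider is the spider class from the question):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl(SomeSpider)
process.start()  # the script blocks here until the crawl finishes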

2 Answers

6

The first thing I would point out is that when you're executing Scrapy from an external file, the log level is set to INFO. You should change it to DEBUG to see what's happening if your code doesn't work.

Change the line:

log.start()

to:

log.start(loglevel=log.DEBUG)

To store everything in the log and generate a text file (for debugging purposes) you can do:

log.start(logfile="file.log", loglevel=log.DEBUG, crawler=crawler, logstdout=False)
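
Applied to the crawl_koovs() function from the question, only the log.start() call changes (a sketch assuming the same old-style Crawler API and imports as in the question):

def crawl_koovs():
    spider = SomeSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    # Log at DEBUG level and write everything to a file for later inspection.
    log.start(logfile="file.log", loglevel=log.DEBUG, crawler=crawler, logstdout=False)
    reactor.run()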

About the signals issue: with the log level changed to DEBUG, you may see some output that helps you fix it. You can also try putting your script inside the Scrapy project folder to see if it still crashes.

If you change the line:

crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

to:

dispatcher.connect(reactor.stop, signals.spider_closed)

what does it say?

Depending on your Scrapy version, this approach may be deprecated.
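
A minimal sketch of that variant, assuming the old-style API from the question and the dispatcher that older Scrapy versions ship in scrapy.xlib.pydispatch:

from scrapy.xlib.pydispatch import dispatcher

def crawl_koovs():
    spider = SomeSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    # Connect reactor.stop through the global dispatcher instead of crawler.signals.
    dispatcher.connect(reactor.stop, signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start(loglevel=log.DEBUG)
    reactor.run()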

AlvaroAV
0

For looping, and for use in Azure Functions with a timer trigger, you can use this task:

from twisted.internet import task
from twisted.internet import reactor

loopTimes = 3
failInTheEnd = False
_loopCounter = 0

def runEverySecond():
    """
    Called at every loop interval.
    """
    global _loopCounter

    if _loopCounter < loopTimes:
        _loopCounter += 1
        print('A new second has passed.')
        return

    if failInTheEnd:
        raise Exception('Failure during loop execution.')

    # We looped enough times.
    loop.stop()
    return


def cbLoopDone(result):
    """
    Called when loop was stopped with success.
    """
    print("Loop done.")
    reactor.stop()


def ebLoopFailed(failure):
    """
    Called when loop execution failed.
    """
    print(failure.getBriefTraceback())
    reactor.stop()


loop = task.LoopingCall(runEverySecond)

# Start looping every 1 second.
loopDeferred = loop.start(1.0)

# Add callbacks for stop and failure.
loopDeferred.addCallback(cbLoopDone)
loopDeferred.addErrback(ebLoopFailed)

reactor.run()

If we want a task to run every X seconds repeatedly, we can use twisted.internet.task.LoopingCall (from https://docs.twisted.org/en/stable/core/howto/time.html).