
I'm trying to override some settings for a crawler being called from a script, but these settings seem not to take effect:

from scrapy import log
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from someproject.spiders import SomeSpider

spider = SomeSpider()
overrides = {
    'LOG_ENABLED': True,
    'LOG_STDOUT': True,
}
settings = get_project_settings()
settings.overrides.update(overrides)
log.start()
crawler = CrawlerProcess(settings)
crawler.install()
crawler.configure()
crawler.crawl(spider)
crawler.start()

And in the spider:

from scrapy.spider import BaseSpider

class SomeSpider(BaseSpider):

    def __init__(self):
        self.start_urls = [ 'http://somedomain.com' ]

    def parse(self, response):
        print 'some test' # won't print anything
        exit(0) # will normally exit failing the crawler

By setting LOG_ENABLED and LOG_STDOUT, I expect to see the "some test" string printed in the log. Also, among the other settings I've tried, I can't seem to redirect the log to a LOG_FILE.

I must be doing something wrong... Please help. =D

vmassuchetto

2 Answers


Use log.msg('some test') instead of print to write to the Scrapy log.
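For background, the reason a plain print can end up in the log at all is that LOG_STDOUT works by replacing sys.stdout with a file-like object that forwards writes to the logging system. This is a rough plain-Python sketch of that mechanism (the StdoutLogger class here is illustrative, not Scrapy's actual implementation):

```python
import logging
import sys
from io import StringIO

# Illustrative stand-in for what LOG_STDOUT does: a file-like object
# that forwards each write() to a logger instead of the console.
class StdoutLogger:
    def __init__(self, logger, level=logging.INFO):
        self.logger = logger
        self.level = level

    def write(self, text):
        text = text.strip()
        if text:  # skip the bare newline that print() writes separately
            self.logger.log(self.level, text)

    def flush(self):
        pass

# Collect log output in a buffer so the redirect can be observed.
buf = StringIO()
logger = logging.getLogger("stdout_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(buf))

old_stdout = sys.stdout
sys.stdout = StdoutLogger(logger)
print("some test")  # forwarded to the logger, not the console
sys.stdout = old_stdout
```

The point is that if logging itself never started (or the setting was applied too late), the redirected print output silently goes nowhere, which matches the behavior described in the question.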

whale_steward

You may need to start Twisted's reactor after starting the crawler:

from twisted.internet import reactor
#...other imports...

#...setup crawler...
crawler.start()
reactor.run()

Related question/more code: Scrapy crawl from script always blocks script execution after scraping

PlasmaSauna