I'm looking to use different settings for each run of a spider. Variants of this have been asked before, but the answers don't cover this case and/or are for old versions of Scrapy:
- How to set different scrapy-settings for different spiders?
- How to setup and launch a Scrapy spider programmatically (urls and settings)
- Creating a generic scrapy spider
The Scrapy documentation makes it clear that settings are loaded in an order of precedence, with per-spider settings (custom_settings) taking precedence over per-project settings (settings.py).
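For reference, the documented pattern looks roughly like the sketch below, with custom_settings hard-coded as a class attribute (the spider name here is a placeholder; MY_SETTING and DEPTH_LIMIT are the settings used later in this question):

import scrapy

class HardcodedSpider(scrapy.Spider):
    name = 'hardcoded'
    # Class-level overrides: these replace the values from settings.py
    # for this spider only, but they have to be known up front.
    custom_settings = {
        'DEPTH_LIMIT': 2,
        'MY_SETTING': 5,
    }

    def parse(self, response):
        pass

That works, but only when the values are known at class-definition time, which is exactly the limitation I'm running into.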
I would like to keep my generic global settings in settings.py and then override them using custom_settings. The problem is that custom_settings cannot be set in the __init__ method; it has to be declared on the class itself.
I would like to be able to do something like this:
import scrapy

class MySpider(scrapy.Spider):
    ...
    # custom_settings needs to be set and populated here,
    # i.e. custom_settings = {...}

    def __init__(self, spidername=None, **kwargs):
        super().__init__(**kwargs)
        ...
        # But I need to populate custom_settings here, so that I know
        # what the spidername is. The problem is, Scrapy ignores this:
        custom_settings = open('//filepath/' + spidername).read()  # (parsed into a dict in practice)
        ...

    def parse(self, response):
        # Use the value of MY_SETTING (a custom setting)
        print(self.settings.getint('MY_SETTING'))
        # Use the value of DEPTH_LIMIT (a built-in setting)
        print(self.settings.getint('DEPTH_LIMIT'))
        ...
The spidername argument is sent to the spider as a command-line argument and is used to indicate which configuration file to load.
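For example (the spider and configuration names here are just placeholders), I start a run with something like:

scrapy crawl myspider -a spidername=site_a

and the spider then reads the matching configuration file in __init__.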
So how do I change the settings on a per-spider basis based on a command-line input argument?
Note that I don't want to use the process.crawl approach highlighted in some other answers, as I want to use scrapyd in the future and I don't believe the two are compatible.
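For completeness, this is the kind of approach I mean (a rough sketch only, not code I'm actually using), where the per-run settings are built in a standalone script and handed to CrawlerProcess:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Hypothetical standalone runner: start from the project settings
# and apply per-run overrides before the crawl is started.
settings = get_project_settings()
settings.set('DEPTH_LIMIT', 2)
settings.set('MY_SETTING', 5)

process = CrawlerProcess(settings)
process.crawl(MySpider, spidername='site_a')  # MySpider as defined above
process.start()

That does give per-run settings, but it runs outside the normal scrapy crawl / scrapyd workflow, which is why I'd rather avoid it.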