I'm looking to use different settings per run on a spider. Variants of this question have been asked before, but the answers either don't cover this case or are for old versions of Scrapy.

The Scrapy documentation makes it clear that settings are loaded in order of precedence, with per-spider settings (custom_settings) taking precedence over per-project settings (settings.py). I would like my generic global settings in settings.py, and to override them using custom_settings. The problem is that custom_settings cannot be set in the __init__ method; it must be declared on the class itself.
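For reference, the supported pattern looks like this (the setting values are just illustrative); it works, but it is evaluated at class-definition time and so cannot depend on runtime arguments:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    # Applied by Scrapy at spider priority, but fixed when the
    # class is defined, before any arguments are available.
    custom_settings = {
        'DEPTH_LIMIT': 2,
        'MY_SETTING': 42,
    }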

I would like to be able to do something like this:

import json
import scrapy

class MySpider(scrapy.Spider):
    ...
    # custom_settings needs to be set and populated here:
    # custom_settings = {}

    def __init__(self, spidername=None, **kwargs):
        super().__init__(**kwargs)
        ...
        # But I only know spidername here, so this is where I'd have
        # to populate custom_settings (assuming the file holds JSON).
        # The problem is that Scrapy ignores it at this point.
        self.custom_settings = json.load(open('//filepath/' + spidername))
        ...

    def parse(self, response):
        # Use the value of MY_SETTING (a custom setting)
        print(self.settings.getint('MY_SETTING'))

        # Use the value of DEPTH_LIMIT (a built-in setting)
        print(self.settings.getint('DEPTH_LIMIT'))
        ...

The spidername argument is sent to the spider as a command-line argument and is used to indicate which configuration file to load.
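Concretely, the spider is launched along these lines (the argument value is just an example):

scrapy crawl myspider -a spidername=site_a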

So how do I change the settings on a per-spider basis based on a command-line input argument?

Note that I don't want to use the process.crawl method highlighted in some other answers, as I want to use scrapyd in the future and I don't believe the two are compatible.
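For clarity, this is roughly the approach I'm trying to avoid; it is only a sketch, and the setting name and argument value are illustrative:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Per-run overrides applied before the crawl starts. scrapyd
# schedules and runs spiders itself, so this doesn't fit my setup.
settings = get_project_settings()
settings.set('MY_SETTING', 42)
process = CrawlerProcess(settings)
process.crawl(MySpider, spidername='site_a')
process.start()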

  • Why not create new projects for each spider with their own `settings.py` file? It looks like you have a specific `settings.py` file for each spider anyway. Also, having different projects will allow you to deploy to `scrapyd` more easily. As far as getting settings in the `parse` method is concerned, you can use `self.settings['MY_SETTING']` to fetch any setting. – kiran.koduru Nov 09 '16 at 18:40
  • @kiran.koduru - I was hoping to have a number of global values that could be shared between scripts. Duplicating them is inefficient, because if I needed to add something I'd have to add it to every `settings.py` file. – Pipupnipup Nov 09 '16 at 22:03
  • Ok. (Shameless plug) You can check out my pip package [Arachne](http://arachne.readthedocs.io/en/latest/), which allows you to write minimal Scrapy code to manage spiders. It wraps a Flask API around your Scrapy spiders. I haven't updated the package for Scrapy v1.0+, but it works well with Scrapy v0.24. It eliminates the need for managing your scrapers through scrapyd. Take a look and let me know if it's something you might want to consider. – kiran.koduru Nov 10 '16 at 15:56

0 Answers