20

I want to enable an HTTP proxy for some spiders and disable it for others.

Can I do something like this?

# settings.py
proxy_spiders = ['a1', 'b2']

if spider in proxy_spiders:  # how to get the spider name???
    HTTP_PROXY = 'http://127.0.0.1:8123'
    DOWNLOADER_MIDDLEWARES = {
         'myproject.middlewares.RandomUserAgentMiddleware': 400,
         'myproject.middlewares.ProxyMiddleware': 410,
         'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
    }
else:
    DOWNLOADER_MIDDLEWARES = {
         'myproject.middlewares.RandomUserAgentMiddleware': 400,
         'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
    }

If the code above doesn't work, is there any other suggestion?

JavaNoScript
Michael Nguyen

5 Answers

37

A bit late, but since release 1.0.0 there is a new feature in Scrapy that lets you override settings per spider, like this:

class MySpider(scrapy.Spider):
    name = "my_spider"
    custom_settings = {
        "HTTP_PROXY": 'http://127.0.0.1:8123',
        "DOWNLOADER_MIDDLEWARES": {
            'myproject.middlewares.RandomUserAgentMiddleware': 400,
            'myproject.middlewares.ProxyMiddleware': 410,
            'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        },
    }


class MySpider2(scrapy.Spider):
    name = "my_spider2"
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            'myproject.middlewares.RandomUserAgentMiddleware': 400,
            'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
        },
    }
user4055746
15

There is a new and easier way to do this.

class MySpider(scrapy.Spider):
    name = 'myspider'

    custom_settings = {
        'SOME_SETTING': 'some value',
    }

I'm using Scrapy 1.3.1.

Aminah Nuraini
8

You can add settings.overrides within the spider.py file. An example that works:

from scrapy.conf import settings

settings.overrides['DOWNLOAD_TIMEOUT'] = 300 

For you, something like this should also work:

from scrapy.conf import settings

settings.overrides['DOWNLOADER_MIDDLEWARES'] = {
     'myproject.middlewares.RandomUserAgentMiddleware': 400,
     'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
}
Aminah Nuraini
Ricky Sahu
    Don't forget to do `from scrapy.conf import settings` before that. – Aminah Nuraini Oct 20 '15 at 10:38
  • settings.overrides has been deprecated in Scrapy versions greater than 1. Using the custom_settings dictionary in your spider declaration works. – v01d Jan 23 '16 at 08:33
4

You can define your own proxy middleware, something straightforward like this:

from scrapy.contrib.downloadermiddleware.httpproxy import HttpProxyMiddleware

class ConditionalProxyMiddleware(HttpProxyMiddleware):
    def process_request(self, request, spider):
        # Only apply the proxy for spiders that opt in with use_proxy = True.
        if getattr(spider, 'use_proxy', None):
            return super(ConditionalProxyMiddleware, self).process_request(request, spider)

Then define the attribute use_proxy = True in the spiders that you want the proxy enabled for. Don't forget to disable the default proxy middleware and enable your modified one.
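A minimal sketch of that wiring, assuming the middleware lives in myproject.middlewares (the module path is an assumption; 750 matches the default priority of the built-in HttpProxyMiddleware):

```python
# settings.py -- swap the stock proxy middleware for the conditional one.
# 'myproject.middlewares.ConditionalProxyMiddleware' is an assumed path.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': None,
    'myproject.middlewares.ConditionalProxyMiddleware': 750,
}

# In each spider that should use the proxy, opt in with a class attribute:
#     class MySpider(Spider):
#         use_proxy = True
```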

R. Max
-2

Why not use two projects rather than one?

Let's name these two projects with proj1 and proj2. In proj1's settings.py, put these settings:

HTTP_PROXY = 'http://127.0.0.1:8123'
DOWNLOADER_MIDDLEWARES = {
     'myproject.middlewares.RandomUserAgentMiddleware': 400,
     'myproject.middlewares.ProxyMiddleware': 410,
     'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
}

In proj2's settings.py, put these settings:

DOWNLOADER_MIDDLEWARES = {
     'myproject.middlewares.RandomUserAgentMiddleware': 400,
     'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None
}
JavaNoScript
  • That's not what the user wants to do; there are certain cases where you want multiple spiders in the same project. – rajat Sep 03 '15 at 17:50