
I use Crawlera as an IP-rotating service to crawl a specific website that bans my IP quickly, but I have this problem with only one website out of a dozen.

Since it is possible to register multiple middlewares for a Scrapy project, I wanted to know whether it is possible to choose the downloader middleware to use PER REQUEST.

That way I could spend my Crawlera quota only on the problematic website rather than on all my requests.

Max atton
  • The answer is yes, you can do this. For a clearer answer you need to describe your problem more precisely: are you crawling multiple domains in one spider, or is Crawlera enabled at the project level with a different spider for each domain? – akhter wahab May 28 '20 at 09:36
  • If you are using `scrapy-crawlera`, you can use `dont_proxy` on requests that do not need Crawlera (see the sketch after these comments): https://scrapy-crawlera.readthedocs.io/en/v1.6.0/#how-to-use-it – Gallaecio Jun 01 '20 at 10:43
  • Thanks @Gallaecio, this is the best way! – Max atton Jun 01 '20 at 12:55
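
A minimal sketch of the `dont_proxy` approach from the comment above, assuming `scrapy-crawlera` is enabled at the project level; the spider name and URLs here are placeholders:

import scrapy

class MixedSpider(scrapy.Spider):
    name = 'mixed'

    def start_requests(self):
        # This request goes through Crawlera via the project-level
        # CrawleraMiddleware and consumes quota.
        yield scrapy.Request('https://problematic-site.example/')
        # dont_proxy tells scrapy-crawlera to skip this request,
        # so it goes out directly and consumes no quota.
        yield scrapy.Request(
            'https://normal-site.example/',
            meta={'dont_proxy': True},
        )

    def parse(self, response):
        self.logger.info('Got %s', response.url)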

1 Answer


One possible solution is to use the custom_settings spider attribute and remove CrawleraMiddleware from the project settings (assuming you have one spider per website and CrawleraMiddleware is currently enabled in the project settings):

import scrapy


class ProblemSpider(scrapy.Spider):
    # Enable Crawlera for this spider only; other spiders in the
    # project are unaffected.
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_crawlera.CrawleraMiddleware': 610,
        },
        'CRAWLERA_ENABLED': True,
        'CRAWLERA_APIKEY': '<API key>',
    }

    def parse(self, response):
        ...

In this case, CrawleraMiddleware will be used only in the spiders that define it in their custom_settings attribute.
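
For comparison, a hypothetical spider in the same project without those settings would fall back to the project-level defaults and never touch the Crawlera quota:

import scrapy


class RegularSpider(scrapy.Spider):
    # No Crawlera settings in custom_settings, so requests from
    # this spider are sent directly rather than through Crawlera.
    name = 'regular'
    start_urls = ['https://example.com']

    def parse(self, response):
        ...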

Georgiy