
I use Crawlera as an IP-rotating service to crawl a specific website that bans my IP quickly, but I have this problem with only one website out of a dozen.

Since it is possible to register multiple middlewares for a Scrapy project, I wanted to know whether it is possible to choose the downloader middleware to use PER REQUEST.

That way I could spend my Crawlera quota only on the problematic website rather than on all my requests.

Max atton
  • The answer is yes, you can do this. For a clearer answer you need to describe your problem more precisely: are you crawling multiple domains in one spider, or is Crawlera enabled at the project level with a different spider for each domain? – akhter wahab May 28 '20 at 09:36
  • If you are using `scrapy-crawlera`, you can use `dont_proxy` on requests that do not need Crawlera (see the sketch after these comments): https://scrapy-crawlera.readthedocs.io/en/v1.6.0/#how-to-use-it – Gallaecio Jun 01 '20 at 10:43
  • Thanks @Gallaecio, this is the best way! – Max atton Jun 01 '20 at 12:55
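
A minimal sketch of the `dont_proxy` approach from the comment above, assuming `scrapy-crawlera` is enabled at the project level; the spider name and URLs here are placeholders:

import scrapy

class MixedSpider(scrapy.Spider):
    name = 'mixed'

    def start_requests(self):
        # This request goes through Crawlera via the project-level
        # CrawleraMiddleware and consumes quota.
        yield scrapy.Request('https://problematic-site.example/')
        # dont_proxy tells scrapy-crawlera to skip this request,
        # so it goes out directly and consumes no quota.
        yield scrapy.Request(
            'https://normal-site.example/',
            meta={'dont_proxy': True},
        )

    def parse(self, response):
        self.logger.info('Got %s', response.url)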

1 Answer


One possible solution is to use the custom_settings spider attribute and remove CrawleraMiddleware from the project settings (assuming you have one spider per website and CrawleraMiddleware is currently enabled in the project settings):

import scrapy


class ProblemSpider(scrapy.Spider):
    # Enable Crawlera for this spider only; other spiders in the
    # project are unaffected.
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy_crawlera.CrawleraMiddleware': 610,
        },
        'CRAWLERA_ENABLED': True,
        'CRAWLERA_APIKEY': '<API key>',
    }

    def parse(self, response):
        ...

In this case, CrawleraMiddleware will be used only in the spiders that define it in their custom_settings attribute.
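
For comparison, a hypothetical spider in the same project without those settings would fall back to the project-level defaults and never touch the Crawlera quota:

import scrapy


class RegularSpider(scrapy.Spider):
    # No Crawlera settings in custom_settings, so requests from
    # this spider are sent directly rather than through Crawlera.
    name = 'regular'
    start_urls = ['https://example.com']

    def parse(self, response):
        ...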

Georgiy