I am using Scrapy to crawl data. The target website blocks an IP address after it has sent about 1000 requests.
To deal with this, I wrote a proxy middleware, and because the amount of data is relatively large, I also wrote a cache extension. When I enable both of them, I get banned more often than with the proxy middleware alone. Everything works well when only the proxy middleware is enabled.
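For context, a proxy middleware of this kind might look roughly like the sketch below. It is a simplified illustration, not my exact code: the class name `RotatingProxyMiddleware`, the proxy addresses, and the round-robin policy are all assumptions. The only Scrapy-specific part is setting `request.meta["proxy"]`, which is the key Scrapy uses to route a request through a proxy. A `SimpleNamespace` stands in for `scrapy.Request` so the sketch runs without Scrapy installed.

```python
import itertools
from types import SimpleNamespace


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware that assigns a proxy to each request."""

    def __init__(self, proxies):
        # Cycle through the proxy list indefinitely (round-robin).
        self._proxies = itertools.cycle(proxies)

    def process_request(self, request, spider):
        # Scrapy routes the request through the proxy named in request.meta["proxy"].
        request.meta["proxy"] = next(self._proxies)
        # Returning None lets the request continue through the middleware chain.
        return None


# Stand-in for scrapy.Request objects; the proxy addresses are placeholders.
requests = [SimpleNamespace(meta={}) for _ in range(3)]
mw = RotatingProxyMiddleware(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
for r in requests:
    mw.process_request(r, spider=None)

assigned = [r.meta["proxy"] for r in requests]
```

In a real project, a class like this would be enabled through the `DOWNLOADER_MIDDLEWARES` setting in `settings.py`.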
I know that when the Scrapy engine starts, extensions are started before middlewares. Could this be the reason? If not, what else should I consider?
Any suggestions will be appreciated!