
I would like to know if it is possible to crawl https pages using Scrapy + Crawlera. So far I have been using Python requests with the following settings:

import requests

proxy_host = 'proxy.crawlera.com'
proxy_port = '8010'
proxy_auth = 'MY_KEY'
proxies    = {
    "https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
    "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port)
}
ca_cert    = 'crawlera-ca.crt'

res = requests.get(url='https://www.google.com/',
    proxies=proxies,
    verify=ca_cert
)

I want to move to asynchronous execution via Scrapy. I know there is the scrapy-crawlera plugin, but I do not know how to configure it when I have the certificate. Also, one thing bothers me: Crawlera comes with different pricing plans. The basic one, C10, allows 10 concurrent requests. What does that mean in practice? Do I need to set CONCURRENT_REQUESTS = 10 in settings.py? My best guess so far is sketched below.
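From the Configuration section of the scrapy-crawlera docs, my best guess at a minimal settings.py is the following (the 610 middleware priority is the value the docs use; pinning CONCURRENT_REQUESTS to 10 to match the C10 limit is my own assumption, not something the docs state):

# settings.py -- minimal scrapy-crawlera setup (sketch)
DOWNLOADER_MIDDLEWARES = {
    # 610 is the priority used in the scrapy-crawlera documentation
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'MY_KEY'

# Assumption: keep Scrapy's concurrency at the C10 plan limit so requests
# are not throttled or rejected on Crawlera's side.
CONCURRENT_REQUESTS = 10

As far as I can tell, there is no CRAWLERA_* setting that accepts the crawlera-ca.crt file, which is part of my confusion about where the certificate fits in.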

Bociek
  • There is a [Configuration](https://scrapy-crawlera.readthedocs.io/en/latest/#configuration) section in the [scrapy-crawlera documentation](https://scrapy-crawlera.readthedocs.io/en/latest/) which should solve most of your doubts. – Gallaecio Jan 11 '19 at 15:34

0 Answers