I would like to know if it is possible to crawl HTTPS pages using Scrapy + Crawlera. So far I have been using Python requests with the following settings:
import requests

proxy_host = 'proxy.crawlera.com'
proxy_port = '8010'
proxy_auth = 'MY_KEY'  # Crawlera API key, used as the proxy username
proxies = {
    "https": "https://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
    "http": "http://{}@{}:{}/".format(proxy_auth, proxy_host, proxy_port),
}
ca_cert = 'crawlera-ca.crt'  # Crawlera's CA certificate for TLS verification
res = requests.get(url='https://www.google.com/',
                   proxies=proxies,
                   verify=ca_cert)
I want to move to asynchronous execution via Scrapy. I know there is the scrapy-crawlera plugin, but I do not know how to configure it when I have the certificate. One more thing bothers me: Crawlera comes with different pricing plans. The basic one is C10, which allows 10 concurrent requests. What does that mean in practice? Do I need to set CONCURRENT_REQUESTS=10 in settings.py?
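For reference, this is a sketch of what I think my settings.py should look like, based on the scrapy-crawlera documentation (CRAWLERA_ENABLED, CRAWLERA_APIKEY and the downloader middleware entry are the documented settings; the concurrency value is my guess, and I have no idea where the CA certificate fits in):

```python
# settings.py -- sketch assuming scrapy-crawlera's documented settings.
# Open question: how (or whether) crawlera-ca.crt must be configured here.

DOWNLOADER_MIDDLEWARES = {
    'scrapy_crawlera.CrawleraMiddleware': 610,
}
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = 'MY_KEY'

# My guess: match Scrapy's concurrency to the C10 plan's 10 concurrent requests
CONCURRENT_REQUESTS = 10
```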