I'm getting the following error, after retries, when trying to crawl a website:

[<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'ssl3_check_cert_and_algorithm', 'dh key too small')]>]

I tried every SSL method available in Scrapy, with the same result. When I open the site in Chrome the page is flagged as insecure (broken HTTPS), but I can still bypass the error. Same behaviour with Python requests: I can get the site content by setting `verify=False`.

Is there any workaround? Can't I just turn off SSL validation the way I do in Python requests?

P.S. Sharing the site URL makes no sense, since it only allows requests from whitelisted IPs.

Facundo Fabre
  • Possible duplicate of [OpenSSL DH Key Too Small Error](http://stackoverflow.com/questions/36417224/openssl-dh-key-too-small-error) – ivan_pozdeev Jun 28 '16 at 01:51
  • No, I know the problem is at the server side and the possible solutions (like disabling validation). I'm asking if there is a workaround for scrapy specifically. – Facundo Fabre Jun 28 '16 at 01:59
  • It also lists relevant OpenSSL settings. All that leaves is find their Python counterparts and the way (if any) to set them from Scrapy. – ivan_pozdeev Jun 28 '16 at 05:39
  • What version of scrapy are you using? The output of `scrapy version -v` provides valuable information. Even if you cannot share the website, finding another website showing the same behavior would be useful. I tried a couple from [SSL Labs Recent Worst](https://www.ssllabs.com/ssltest/) with "This server supports weak Diffie-Hellman (DH) key exchange parameters." and was able to connect, so it's hard to help you without a test website. – paul trmbrth Jun 28 '16 at 08:41
  • Scrapy: 1.1.0; lxml: 3.6.0.0; libxml2: 2.9.2; Twisted: 16.2.0; Python: 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)]; pyOpenSSL: 16.0.0 (OpenSSL 1.0.2h 3 May 2016); Platform: Darwin-15.4.0-x86_64-i386-64bit. I still can't find another site with the same behaviour. – Facundo Fabre Jun 28 '16 at 13:36

1 Answer


Disabling validation will not help, since this is not a certificate-validation problem. What could help is changing the set of ciphers offered, i.e. disabling DH cipher suites so that the code path affected by weak DH keys (the Logjam attack) is not used. Using an older version of OpenSSL, which does not yet protect against the Logjam attack, would also work.

Unfortunately there seems to be no obvious way to specify the cipher set to use in Scrapy itself. One would probably have to hook into the underlying Twisted or pyOpenSSL layers to set it.

Steffen Ullrich