I'm using Mechanize to scrape a site and am getting errors related to a hostname mismatch. I've discovered that the root of the issue is SNI being used on the site I'm scraping and I'd like to specify the hostname to ensure the correct certificate is being used.

Here's my current setup:

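require 'mechanize'

agent = Mechanize.new
agent.user_agent = custom_user_agent  # custom_user_agent is defined elsewhere
agent.verify_mode = OpenSSL::SSL::VERIFY_PEER

# Mechanize needs an absolute URL, including the scheme
page = agent.get "https://website.com"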

And here's what I think I need to add (or something along these lines) to ensure the correct cert is used:

OpenSSL::SSL::SSLSocket.hostname = "website.com"

Is this possible to do in Mechanize, or do I need to figure out how to manually specify the cert to use?

For context, I'm aware of the VERIFY_NONE solution but would prefer to avoid it given the vulnerabilities it introduces.

Luke Keller
  • OpenSSL 1.0.2 and below does ***not*** perform hostname matching. Applications, like cURL and Mechanize, must perform the matching. [OpenSSL 1.1.0 is scheduled to implement it](http://wiki.openssl.org/index.php/Hostname_validation). If you are having hostname matching problems, then it's surely coming from Mechanize at this point in time. SNI is a TLS feature, so be sure you are using TLS 1.0 or above. I'm guessing your problem is that Mechanize (or Ruby) is *not* using SNI. – jww Jul 18 '16 at 03:12
  • @jww There doesn't seem to be a way to set it in Mechanize. Are you aware of a way to do so or alternates to Mechanize that support SNI? – Luke Keller Jul 18 '16 at 03:19
  • Maybe related (I'm not a Ruby or Mechanize developer): [How to set TLS context options in Ruby (like OpenSSL::SSL::SSL_OP_NO_SSLv2)](http://stackoverflow.com/q/22550213) and [OpenSSL::SSL::SSLError: hostname does not match the server certificate](http://stackoverflow.com/q/23190868). The OpenSSL function that needs to be called is [`SSL_set_tlsext_host_name`](http://wiki.openssl.org/index.php/SSL/TLS_Client), but it's not clear to me if/when Ruby calls it. – jww Jul 18 '16 at 03:25
  • @jww It looks like the hostname needs to be set in Net::HTTP (part of Ruby's standard library, which Mechanize uses under the hood), but there doesn't seem to be a way to pass it to Mechanize as an argument. – Luke Keller Jul 18 '16 at 04:01

1 Answer

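You don't need to set or check the hostname yourself when using Mechanize.

Ruby's Net::HTTP handles both for you. When it opens an SSL connection it sets the SNI hostname on the socket and, unless verification is disabled, checks the server certificate against the host you requested: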

https://github.com/ruby/ruby/blob/trunk/lib/net/http.rb#L928

An OpenSSL::SSL::SSLError exception will be raised if there's a hostname mismatch.
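
To see the hostname check in isolation, without touching the network, here is a minimal sketch. It assumes `OpenSSL::SSL.verify_certificate_identity` from Ruby's openssl library is the matching routine behind that error. The `self_signed_cert` helper and the hostnames are made up for illustration and are not part of Mechanize or Net::HTTP.

```ruby
require 'openssl'

# Hypothetical helper: build a throwaway self-signed certificate
# whose subject common name is the given hostname.
def self_signed_cert(common_name)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2
  cert.serial = 1
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.issuer = cert.subject # self-signed: issuer equals subject
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new('SHA256'))
  cert
end

cert = self_signed_cert('website.com')

# The certificate matches the name it was issued for...
matches = OpenSSL::SSL.verify_certificate_identity(cert, 'website.com')

# ...and does not match an unrelated name. On a live connection this
# is the situation that surfaces as a hostname mismatch error.
mismatch = OpenSSL::SSL.verify_certificate_identity(cert, 'other.example')

puts "website.com:   #{matches}"
puts "other.example: #{mismatch}"
```

If a server without working SNI hands back its default certificate, the second case is what the client ends up evaluating.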

Tim Craft
  • Interesting. So the hostname mismatch issue happens randomly and I heard that SNI issues can cause that. If hostname specification isn't the issue, do you know how I might go about debugging this problem? – Luke Keller Jul 21 '16 at 18:58
  • If you want to eliminate Ruby you can use openssl to debug it. A correct SNI setup should fail with `openssl s_client -connect example.com:443` and then succeed if you specify `-servername example.com`. If it fails consistently or intermittently then that's an issue with the server. Assuming you don't control the server you could get in touch with the owner/admin and ask them to fix their SSL setup. – Tim Craft Jul 25 '16 at 10:12
  • It seems to only fail intermittently in Mechanize. I did notice an error when using openssl though: `verify error:num=19:self signed certificate in certificate chain`. Would this cause intermittent failures in Mechanize? – Luke Keller Jul 26 '16 at 20:41
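
The `s_client` comparison suggested in the comments can also be approximated from Ruby. The following is only a sketch: `peer_certificate` and `sni_changes_certificate?` are hypothetical helper names, and verification is deliberately turned off because the goal is to inspect which certificate the server presents, not to trust it.

```ruby
require 'socket'
require 'openssl'

# Hypothetical helper: return the leaf certificate a server presents,
# optionally sending the given name via SNI.
def peer_certificate(host, port = 443, sni: nil)
  tcp = TCPSocket.new(host, port)
  ctx = OpenSSL::SSL::SSLContext.new
  # Inspection only: never disable verification for real traffic.
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = sni if sni # SNI is set on the socket instance
  ssl.sync_close = true     # closing the SSL socket also closes the TCP socket
  ssl.connect
  ssl.peer_cert
ensure
  ssl ? ssl.close : tcp&.close
end

# Hypothetical helper: true when the server presents a different
# certificate depending on whether SNI is sent.
def sni_changes_certificate?(host, port = 443)
  without_sni = peer_certificate(host, port)
  with_sni = peer_certificate(host, port, sni: host)
  without_sni.to_der != with_sni.to_der
end

# Example usage (requires network access):
#   puts peer_certificate('website.com', sni: 'website.com').subject
#   puts sni_changes_certificate?('website.com')
```

Running the check several times in a row may help with the intermittent case: if the certificate returned with SNI varies between runs, that would point at the server side rather than at Mechanize.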