
I have some code that loads a web document using Nokogiri:

require 'nokogiri'
require 'open-uri'
require 'openssl'
require 'net/https'

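# Fetch +url+ with open-uri and parse the response as HTML.
def loadWebDoc(url)
  # open-uri's Kernel#open (URI.open on Ruby 2.5+); the block form closes
  # the response IO after Nokogiri has parsed it. OpenURI::HTTPError and
  # SSL errors propagate to the caller unchanged.
  open(url) { |file| Nokogiri::HTML(file) }
end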

#process some urls with threads...

It always worked well until I started using it in threads. My script calls loadWebDoc many times successfully, but after about 30 seconds of processing documents I get an error like this:

/System/Library/Frameworks/Ruby.framework/Versions/2.3/usr/lib/ruby/2.3.0/net/protocol.rb:44:in `connect_nonblock': SSL_connect SYSCALL returned=5 errno=0 state=SSLv3 read server session ticket A (OpenSSL::SSL::SSLError)

There is a similar question on Stack Overflow that suggests forcing TLSv1, but it uses Net::HTTP directly rather than Nokogiri.

I've tried several variations of something like:
file = open(url, :ssl_version => OpenSSL::SSL::SSLContext::TLSv1)

but this just gives me errors like:
uninitialized constant OpenSSL::SSL::SSLContext::TLSv1 (NameError)

How can I force Nokogiri to do the same thing? It looks like I need to configure the SSL version and cipher(s), but I'm not sure how to do that with Nokogiri, and I'm probably using the wrong constant.
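Note that Nokogiri itself never opens a connection: Nokogiri::HTML only parses the IO or string it is given, and the HTTPS request is made by open-uri, which has no SSL-version option. One way to pin the protocol is to make the request with Net::HTTP and hand the body to Nokogiri. In Ruby the protocol is named with a Symbol passed to Net::HTTP#ssl_version=, not with a constant like OpenSSL::SSL::SSLContext::TLSv1 (which does not exist, hence the NameError). This is a minimal sketch assuming the server needs TLS 1.2 (:TLSv1_2); the helper names build_https and fetch_body and the timeout values are hypothetical.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: build (but do not start) a connection pinned to one
# protocol version. The TLS handshake happens here, not in Nokogiri.
def build_https(uri, ssl_version: :TLSv1_2)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.ssl_version = ssl_version # a Symbol such as :TLSv1_2, not a constant
  http.open_timeout = 10         # seconds allowed for the connection to open
  http.read_timeout = 30         # seconds allowed for each read
  http
end

# Hypothetical helper: fetch the body of +url+ (no redirect handling).
def fetch_body(url, ssl_version: :TLSv1_2)
  uri = URI(url)
  build_https(uri, ssl_version: ssl_version).start do |conn|
    response = conn.get(uri.request_uri)
    response.value # raises a Net::HTTP error for non-2xx responses
    response.body
  end
end
```

Usage, assuming the server accepts TLS 1.2: doc = Nokogiri::HTML(fetch_body('https://example.com/')).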

kraftydevil

1 Answer


It looks like the SSL error raised in 'connect_nonblock' happens because the server can't handle that many connections, especially when they are opened from several threads at once. Try adding a delay between attempts, and give the connection more time to open:

open(url, open_timeout: 100)

https://ruby-doc.org/stdlib-2.4.0/libdoc/socket/rdoc/Socket.html#method-i-connect_nonblock
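A sketch of that suggestion, assuming the failures are transient and caused by too many simultaneous connections: cap the number of worker threads and retry an SSL failure after a short, growing delay. process_urls, the default worker count, attempt count, delay and the choice of retryable errors are all hypothetical, not values known to suit this server.

```ruby
require 'openssl'
require 'thread' # Queue and Mutex (already core in current Rubies)

# Hypothetical helper: run +work+ for each URL on at most +workers+ threads.
# Errors listed in +retry_on+ are retried after a growing delay; after the
# last attempt the exception itself is stored as that URL's result.
def process_urls(urls, workers: 4, attempts: 3, base_delay: 1.0,
                 retry_on: [OpenSSL::SSL::SSLError, Errno::ECONNRESET], &work)
  queue = Queue.new
  urls.each { |url| queue << url }
  results = {}
  lock = Mutex.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          url = queue.pop(true) # non-blocking pop; raises ThreadError once empty
        rescue ThreadError
          break
        end

        tries = 0
        begin
          tries += 1
          value = work.call(url)
        rescue *retry_on => e
          if tries < attempts
            sleep(base_delay * tries) # wait 1x, 2x, ... base_delay seconds
            retry
          end
          value = e
        end
        lock.synchronize { results[url] = value }
      end
    end
  end

  threads.each(&:join) # re-raises any error not covered by +retry_on+
  results
end
```

For example, process_urls(urls, workers: 4) { |url| loadWebDoc(url) } returns a Hash from each URL to its parsed document, or to the last exception if every attempt failed. Keeping the worker count small limits how many connections hit the server at once, which is the point of the suggestion above.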

Alex Strizhak