3

I’m using Rails 4.2.3 and Nokogiri to get data from a web site. I want to perform an action when I don’t get any response from the server, so I have:

require 'open-uri'
require 'json'
require 'nokogiri'

attempts     = 0  # retry counter used in the rescue below
max_attempts = 3

begin
  content = open(url).read
  if content.lstrip[0] == '<'
    doc = Nokogiri::HTML(content)
  else
    begin
      json = JSON.parse(content)
    rescue JSON::ParserError => e
      content
    end
  end
rescue Net::OpenTimeout => e
  attempts = attempts + 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end

Note that this is different from getting a 500 from the server. I only want to retry when I get no response at all, either because I get no TCP connection or because the server fails to respond (or for some other reason that causes me not to get any response). Is there a more generic way to account for this situation than what I have? I feel like there are a lot of other exception types I’m not thinking of.

Dave
  • You're not using Nokogiri to get data, you're using it to _parse_ data. OpenURI "gets" the data. That's an important distinction that removes Nokogiri from the question. Also, the title is misleading; Rails is a framework written in Ruby. You don't write things in Rails, you write them in Ruby, and sometimes use Rails' methods. I'd suggest rewording the question based on that knowledge. There are many defined HTTP errors you can handle, and there _can_ be custom ones defined by the administrators of a site so you'll have to be aware of those. – the Tin Man Jul 14 '16 at 23:39
  • I don't care about any custom messages defined by a site -- that would imply a response is being sent back. I'm trying to account for the situation (and only the situation) where I get no response back at all. Is it clear what I'm asking -- getting no response vs getting responses indicating other conditions? – Dave Jul 15 '16 at 14:07
  • You're concerned about not getting a TCP connection, or getting a TCP connection but the server doesn't respond. – the Tin Man Jul 15 '16 at 18:42
  • Ah good distinction. Yes, I'm concerned about both cases -- not getting a TCP connection or getting a TCP connection and then not hearing back from the server. I will update my question to reflect this. – Dave Jul 19 '16 at 15:43
  • do you know the [ultimate guide to ruby timeouts](https://github.com/ankane/the-ultimate-guide-to-ruby-timeouts)? specifically the http section. Btw opentimeout and readtimeout seem to be the ones to catch. – Axe Jul 23 '16 at 09:44
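
To make the distinction in these comments concrete -- failing to get a TCP connection at all versus connecting and never hearing back -- here is a minimal sketch of the exception classes involved (the URL and timeout values are placeholders, and the rescued error list is illustrative rather than exhaustive):

require 'open-uri'

begin
  # :open_timeout bounds how long we wait to establish the TCP connection,
  # :read_timeout bounds how long we wait for the server to start responding
  content = open("http://example.com", :open_timeout => 5, :read_timeout => 5).read
rescue Errno::ECONNREFUSED, SocketError, Net::OpenTimeout => e
  # no TCP connection at all (refused, DNS failure, or connect timeout)
  puts "no connection: #{e.class}"
rescue Net::ReadTimeout => e
  # connected, but the server never sent a response
  puts "no response: #{e.class}"
end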

3 Answers

5

This is a generic sample of how you can define timeout durations for the HTTP connection and perform several retries in case of an error while fetching the content (edited):

require 'open-uri'
require 'nokogiri'

url = "http://localhost:3000/r503"

openuri_params = {
  # set timeout durations for the HTTP connection
  # (the default for both open_timeout and read_timeout is 60 seconds)
  :open_timeout => 1,
  :read_timeout => 1,
}

attempt_count = 0
max_attempts  = 3
begin
  attempt_count += 1
  puts "attempt ##{attempt_count}"
  content = open(url, openuri_params).read
rescue OpenURI::HTTPError => e
  # it's 404, etc. (do nothing)
rescue SocketError, Net::ReadTimeout => e
  # server can't be reached or doesn't send any response
  puts "error: #{e}"
  sleep 3
  retry if attempt_count < max_attempts
else
  # connection was successful,
  # content is fetched,
  # so here we can parse content with Nokogiri,
  # or call a helper method, etc.
  doc = Nokogiri::HTML(content)
  p doc
end
Zoran Majstorovic
  • This doesn't answer my question exactly. You are catching an exception for any type of exception thrown, even 404s or 503s, which are responses from the server. I want to account for the cases (and only the cases) where the server can't be reached or doesn't send any response at all. – Dave Jul 20 '16 at 21:15
  • @Dave Your question is a little ambiguous on how to handle everything else, but this is close to the right answer. You want to be rescuing `SocketError`s, and instead of `Net::OpenTimeout` you want to catch `Net::ReadTimeout`. `Net::OpenTimeout` is only raised if we fail to open the connection, not if we fail to understand/read a response. Just don't rescue `OpenURI::HTTPError` if you don't care about the other errors. – Azolo Jul 20 '16 at 23:44
  • Dave I've updated the code after your first comment (to show more granular exception handling). As @Azolo said, you can customize it according to your actual needs. – Zoran Majstorovic Jul 21 '16 at 08:41
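
Putting the suggestions from this comment thread together -- rescue SocketError plus both timeout errors, and let OpenURI::HTTPError propagate, since a 404 or 503 means the server did respond -- a trimmed sketch might look like this (the URL, timeout, and retry values are placeholders):

require 'open-uri'
require 'nokogiri'

url = "http://localhost:3000/r503"

attempt_count = 0
max_attempts  = 3

begin
  attempt_count += 1
  content = open(url, :open_timeout => 1, :read_timeout => 1).read
rescue SocketError, Net::OpenTimeout, Net::ReadTimeout => e
  # server can't be reached or never responds -- the only cases we retry
  sleep 3
  retry if attempt_count < max_attempts
  raise
else
  doc = Nokogiri::HTML(content)
end
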
5

When it comes to rescuing exceptions, you should aim to have a clear understanding of:

  • Which lines in your system can raise exceptions
  • What is going on under the hood when those lines of code run
  • What specific exceptions could be raised by the underlying code

In your code, the line that's fetching the content is also the one that could see network errors:

content = open(url).read

If you go to the documentation for the OpenURI module you'll see that it uses Net::HTTP & friends to get the content of arbitrary URIs.

Figuring out what Net::HTTP can raise is actually very complicated but, thankfully, others have already done this work for you. Thoughtbot's suspenders project has lists of common network errors that you can use. Notice that some of those errors have to do with different network conditions than what you had in mind, like the connection being reset. I think it's worth rescuing those as well, but feel free to trim the list down to your specific needs.

So here's what your code should look like (skipping the Nokogiri and JSON parts to simplify things a bit):

require 'net/http'
require 'open-uri'

HTTP_ERRORS = [
  EOFError,
  Errno::ECONNRESET,
  Errno::EINVAL,
  Net::HTTPBadResponse,
  Net::HTTPHeaderSyntaxError,
  Net::ProtocolError,
  Timeout::Error,
]
MAX_RETRIES = 3

attempts = 0

begin
  content = open(url).read
rescue *HTTP_ERRORS => e
  if attempts < MAX_RETRIES
    attempts += 1
    sleep(2)
    retry
  else
    raise e
  end
end
Eugen Minciu
1

I would think about using a Timeout that raises an exception after a short period:

require 'open-uri'
require 'timeout'

MAX_RESPONSE_TIME = 2 # seconds
attempts     = 0  # retry counter used in the rescue below
max_attempts = 3

begin
  content = nil # needs to be defined before the following block
  Timeout.timeout(MAX_RESPONSE_TIME) do  
    content = open(url).read
  end

  # parsing `content`
rescue Timeout::Error => e
  attempts += 1
  if attempts <= max_attempts
    sleep(3)
    retry
  end
end
spickermann
  • Thanks. What about if the DNS doesn't resolve for the host in question -- will the above account for that? – Dave Jul 20 '16 at 21:16
  • `Timeout.timeout` raises an exception when the code within the block takes longer than `MAX_RESPONSE_TIME` to run - no matter why it took longer. If the DNS lookup takes too long, then my example would cover that case. If the DNS lookup fails with another exception, then you will need to rescue from that exception too (sorry, I am not sure what exception is raised in that case). – spickermann Jul 20 '16 at 21:23
  • This isn't a bad solution; the `Net::XTimeout` errors actually result from the internal use of `Timeout` in the `Net` module. However, what this doesn't take into account is that if you have a large page, `open-uri` reads it and loads it all into memory, which I've seen take a long time. – Azolo Jul 20 '16 at 23:25
  • Be wary of Timeout since it does not always work as expected. Especially with concurrent code. – Axe Jul 23 '16 at 09:52
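
For the DNS question in this thread: in my experience an unresolvable hostname surfaces as a SocketError from the underlying getaddrinfo call rather than a Timeout::Error, so it needs its own rescue. A rough sketch along the lines of this answer (the hostname and retry values are placeholders):

require 'open-uri'
require 'timeout'

MAX_RESPONSE_TIME = 2 # seconds
max_attempts = 3
attempts     = 0

begin
  content = nil
  Timeout.timeout(MAX_RESPONSE_TIME) do
    content = open("http://no-such-host.example").read
  end
rescue Timeout::Error => e
  # slow DNS, slow connect, or slow response -- anything over the time budget
  attempts += 1
  sleep(3)
  retry if attempts <= max_attempts
rescue SocketError => e
  # the hostname didn't resolve at all
  puts "DNS lookup failed: #{e}"
end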