I am using Nokogiri to scrape web pages. A few URLs have to be guessed, and they return a 404 Not Found error when the page doesn't exist. Is there a way to capture this exception?

http://yoursite/page/38475 #=> page number 38475 doesn't exist

I tried the following, which didn't work:

url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
  begin
    rescue Exception => e
      puts "Try again later"
  end
end
Bala

1 Answer

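It doesn't work because your begin/rescue does not wrap the code that raises the error: the open(url) call, which raises OpenURI::HTTPError when the server responds with a 404 status. (On Ruby 3.0 and later, open-uri no longer patches Kernel#open, so call URI.open instead.) The following code should work: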

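require 'nokogiri'
require 'open-uri'

url = 'http://yoursite/page/38475'
begin
  # URI.open (from open-uri) raises OpenURI::HTTPError on a 4xx/5xx response
  file = URI.open(url)
  doc = Nokogiri::HTML(file)
  # handle doc
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, reason] pair such as ["404", "Not Found"]
  if e.io.status.first == '404'
    # handle 404 error
  else
    raise
  end
end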
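
If many guessed URLs have to be tried, the same rescue can be wrapped in a small helper. This is only a sketch: `fetch_page` is a hypothetical name, not part of open-uri or Nokogiri, and it assumes that returning nil for a missing page is acceptable for your scraper.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String,
# or nil when the server answers 404. Any other HTTP error is re-raised.
def fetch_page(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, reason] pair such as ["404", "Not Found"]
  raise unless e.io.status.first == '404'
  nil
end
```

The caller can then skip missing pages, for example `html = fetch_page(url)` followed by `doc = Nokogiri::HTML(html) if html`.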

BTW, about rescuing Exception: Why is it a bad style to `rescue Exception => e` in Ruby?
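
In short, `rescue Exception` also swallows things like Ctrl-C and `exit`, while a bare `rescue` only catches StandardError and its subclasses. A quick check of the class hierarchy illustrates this; `caught_by_bare_rescue?` is a hypothetical helper written just for this sketch.

```ruby
require 'open-uri'

# Hypothetical helper: true when a bare `rescue` (i.e. `rescue StandardError`)
# would catch an instance of the given exception class.
def caught_by_bare_rescue?(error_class)
  error_class.ancestors.include?(StandardError)
end

puts caught_by_bare_rescue?(OpenURI::HTTPError) # true
puts caught_by_bare_rescue?(Interrupt)          # false (raised on Ctrl-C)
puts caught_by_bare_rescue?(SystemExit)         # false (raised by `exit`)
```

So rescuing OpenURI::HTTPError, or StandardError at most, is enough for the 404 case and leaves interrupts and exits alone.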

Marek Lipka