I have a simple script that checks for bad URLs:

def self.check_prod_links
  require 'net/http'
  results = []
  Product.find_each(:conditions =>{:published => 1}) do |product|
    url = product.url 
    id = product.id
    uri = URI(url)
    begin
      response = Net::HTTP.get_response(uri)
    rescue
      begin
        http = Net::HTTP.new(uri.host, uri.port)
        http.use_ssl = true
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
        request = Net::HTTP::Get.new(uri.request_uri)
        response = http.request(request)
      rescue
        begin
          response = Net::HTTP.get_response("http://" + uri)  
        rescue => e
          p "Problem getting url: #{url} Error Message: #{e.message}"
        end
      end
    end
    p "Checking URL = #{url}. ID = #{id}. Response Code = #{response.code}" 
    unless response.code.to_i == 200
      product.update_attribute(:published, 0) 
      results << product
    end
  end
  return results
end

How can I keep incorrectly formatted URLs, e.g. hkbfksrhf.google.com, from crashing the script with the following error:

getaddrinfo: nodename nor servname provided, or not known

I just want the task to run to the end and print every error, i.e. any HTTP response that is not a 200 or a 301.

Thanks!

georgebrock
Yogzzz

1 Answer

Is open-uri an option? It raises an exception when it encounters a 404 or a 500 (or another HTTP error), in addition to SocketErrors, which lets you clean up your code a bit:

def self.check_prod_links
  require 'open-uri'
  results = []

  Product.where(:published => 1).each do |product|
    url = product.url
    id = product.id
    failed = true

    begin
      # URI#open comes from open-uri: it follows redirects and raises
      # OpenURI::HTTPError on 4xx/5xx responses. (Kernel#open no longer
      # accepts URIs on Ruby 3.0+, so call open on the URI object.)
      URI(url).open
      failed = false
    rescue OpenURI::HTTPError => e
      error_message = e.message
      response_message = "Response Code = #{e.io.status[0]}"
    rescue SocketError => e
      # Raised when the host cannot be resolved (the "getaddrinfo" error).
      error_message = e.message
      response_message = "Host unreachable"
    rescue => e
      # Anything else, e.g. a malformed URL or one without a scheme.
      error_message = e.message
      response_message = "Unknown error"
    end

    if failed
      Rails.logger.error "Problem getting url: #{url} Error Message: #{error_message}"
      Rails.logger.error "Checking URL = #{url}. ID = #{id}. #{response_message}"

      product.update_attribute(:published, 0)
      results << product
    end
  end

  results
end
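
To see the rescue ordering in isolation, here is a minimal sketch that separates the classification from the actual fetch, so it can be tried without touching the network or the database. The helper name `classify_fetch` and its return shape are hypothetical and not part of open-uri or Rails; only the exception classes are real.

```ruby
require 'open-uri'
require 'socket'

# Hypothetical helper: runs the block (the real fetch would go there) and
# returns [ok, message] instead of letting a StandardError escape.
def classify_fetch
  yield
  [true, "OK"]
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, reason] pair such as ["404", "Not Found"].
  [false, "Response Code = #{e.io.status[0]}"]
rescue SocketError => e
  # Unresolvable host names end up here ("getaddrinfo: ...").
  [false, "Host unreachable: #{e.message}"]
rescue StandardError => e
  # Malformed URLs, missing schemes, timeouts and so on.
  [false, "Unknown error: #{e.class}: #{e.message}"]
end
```

Inside the loop this could be called as `ok, message = classify_fetch { URI(url).open }`, logging `message` and unpublishing the product when `ok` is false. Because every StandardError is turned into a return value, one bad URL should not stop the iteration.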
Angelo
  • I'm running this method through a rake task. Do you know of a way I can continue running the task even if a URL is invalid and just log the bad URL and continue? I get the following error when a bad URL is being checked: "Problem getting url: http://bkjfbjkbfjkwbfjrwkbf.gifts.redenvelope.com/orchids/Purple-Dendrobium-Orchids-3755?ref=HomeNoRef&viewpos=19&trackingpgroup=rbdbs Error Message: getaddrinfo: nodename nor servname provided, or not known" rake aborted! – Yogzzz Jul 20 '12 at 05:01
  • Did you try the updated method I provided? It should catch the SocketError that outputs your "getaddrinfo" message and continue. If you want to log to a file, you can use a logger (e.g. `Rails.logger.error "#{response_message}"`) instead of outputting to stdout using `p`. – Angelo Jul 20 '12 at 05:16
  • Yup, I'm still getting errors for invalid URLs even with the method you provided. Thank you very much for your help. – Yogzzz Jul 20 '12 at 05:24
  • That is odd considering the last generic rescue should catch anything unexpected. I suspect there may be a callback in your Product model that may be trying to do something and throwing an exception. Can you try commenting `# product.update_attribute(:published, 0)` to see if the rake task is still throwing an exception? – Angelo Jul 20 '12 at 05:33
  • Still getting the errors. I removed the http:// from the first record in my db and ran the method, and this was the exception I received: "Problem getting url: www.gifts.redenvelope.com/flowers/Chocolate-Vanilla--Red-Velvet-Cupcakes--12-Cou-and-other-chocolates--gifts--Sharis-Berries-30006042?ref=HomeNoRef&viewpos=8&trackingpgroup=rbdbs Error Message: can't convert URI::Generic into String" rake aborted! It also still doesn't work with invalid URLs. – Yogzzz Jul 20 '12 at 05:36
  • Do you know of any way to catch these errors in the rake task itself and save them there, while allowing the method to continue without aborting? I suspect that just a handful of URLs are formatted incorrectly in my db. – Yogzzz Jul 20 '12 at 05:41
  • That appears to be what we're outputting to standard out using the `p` command, i.e. the exception we caught. It shouldn't prevent the task from completing. I've changed the script to log to the rails log file instead of outputting to stdout. – Angelo Jul 20 '12 at 05:54
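
The "can't convert URI::Generic into String" message in the comments above shows up when a stored URL has no scheme, so `URI(url)` yields a `URI::Generic` that open-uri cannot fetch. One possible way to handle such records before fetching is sketched below. `normalize_url` is a hypothetical helper, and defaulting to "http://" is an assumption about the data, not something the thread confirms.

```ruby
require 'uri'

# Hypothetical helper: returns an http(s) URI for the raw string, prepending
# "http://" when no scheme is present, or nil when the value cannot be used.
def normalize_url(raw)
  candidate = raw.to_s.strip
  return nil if candidate.empty?

  # Assumption: scheme-less values are meant to be plain HTTP URLs.
  candidate = "http://#{candidate}" unless candidate =~ %r{\A[a-z][a-z0-9+.\-]*://}i

  uri = URI.parse(candidate)
  # URI::HTTPS is a subclass of URI::HTTP, so both pass this check.
  return nil unless uri.is_a?(URI::HTTP)
  return nil if uri.host.to_s.empty?

  uri
rescue URI::InvalidURIError
  nil
end
```

A `nil` result can be logged as a malformed URL and skipped. Note that a well-formed URL with a nonexistent host still parses fine here, so the SocketError has to be rescued at fetch time regardless.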