1

I'm trying to be able to have a global exception capture where I can add extra information when an error happens. I have two classes, "crawler" and "amazon". What I want to do is be able to call "crawl", execute a function in amazon, and use the exception handling in the crawl function.

Here are the two classes I have:

require 'mechanize'

class Crawler
  Mechanize.html_parser = Nokogiri::HTML

  def initialize
    @agent = Mechanize.new
  end

  def crawl
    puts "crawling"

    begin
      #execute code in Amazon class here?
    rescue Exception => e
      puts "Exception: #{e.message}"
      puts "On url: #{@current_url}"
      puts e.backtrace
    end
  end

  def get(url)
    @current_url = url
    @agent.get(url)
  end
end

class Amazon < Crawler
  #some code with errors
  def stuff
    page = get("http://www.amazon.com")
    puts page.parser.xpath("//asldkfjasdlkj").first['href']
  end
end

a = Amazon.new
a.crawl

Is there a way I can call "stuff" inside of "crawl" so I can use that exception handling over the entire stuff function? Is there a better way to accomplish this?

EDIT: This is when I ended up doing

require 'mechanize'

class Crawler
  Mechanize.html_parser = Nokogiri::HTML

  def initialize
    @agent = Mechanize.new
  end

  def crawl
    yield
  rescue Exception => e
    puts "Exception: #{e.message}"
    puts "On url: #{@current_url}"
    puts e.backtrace
  end

  def get(url)
    @current_url = url
    @agent.get(url)
  end
end

c = Crawler.new

c.crawl do
  page = c.get("http://www.amazon.com")
  puts page.parser.xpath("//asldkfjasdlkj").first['href']
end
AdamB
  • 3,101
  • 4
  • 34
  • 44

3 Answers3

0

I managed to get the desired functionality with "super" and a placeholder function. Is there still a better way to do this?

require 'mechanize'

class Crawler
  Mechanize.html_parser = Nokogiri::HTML

  def initialize
    @agent = Mechanize.new
  end

  def stuff
  end

  def crawl
    stuff
  rescue Exception => e
    puts "Exception: #{e.message}"
    puts "On url: #{@current_url}"
    puts e.backtrace
  end

  def get(url)
    @current_url = url
    @agent.get(url)
  end
end

class Amazon < Crawler
  #some code with errors
  def stuff
    super
    page = get("http://www.amazon.com")
    puts page.parser.xpath("//asldkfjasdlkj").first['href']
  end
end

a = Amazon.new
a.crawl
AdamB
  • 3,101
  • 4
  • 34
  • 44
0

You could have crawl accept a code block:

def crawl
  begin
    yield
  rescue Exception => e
    # handle exceptoin
  end
end

def stuff
  crawl do
    # implementation of stuff
  end
end

I'm not crazy about having a method with no body. A code block might make more sense here. May also eliminate the need for subclassing depending on what you want to do.

Alex Rockwell
  • 1,454
  • 9
  • 13
0

If you want it another way, take a look at "strategy" design patten:

# test_mach.rb
require 'rubygems'
require 'mechanize'

# this is the context class,which calls the different strategy implementation
class Crawler
  def initialize(some_website_strategy)
    @strategy = some_website_strategy
  end

  def crawl
    begin
      @strategy.crawl
      #execute code in Amazon class here?
    rescue Exception => e
      puts "==== starts this exception comes from Parent Class"
      puts e.backtrace
      puts "==== ends  this exception comes from Parent Class"
    end
  end
end

# strategy class for Amazon
class Amazon
  def crawl
    puts "now crawling amazon"
    raise "oh ... some errors when crawling amazon"
  end
end

# strategy class for taobao.com 
class Taobao
  def crawl
    puts "now crawling taobao"
    raise "oh ... some errors when crawling taobao"
  end
end

then run this code:

amazon = Crawler.new(Amazon.new)
amazon.crawl
taobao = Crawler.new(Taobao.new)
taobao.crawl

result:

now crawling amazon
==== starts this exception comes from Parent Class
test_mach.rb:27:in `crawl'
test_mach.rb:13:in `crawl'
test_mach.rb:38
==== ends  this exception comes from Parent Class
now crawling taobao
==== starts this exception comes from Parent Class
test_mach.rb:34:in `crawl'
test_mach.rb:13:in `crawl'
test_mach.rb:40
==== ends  this exception comes from Parent Class

BTW. for your implementation, basically I did the same way with you. except

# my implementation
class Crawler
  def stuff
    raise "abstract method called"
  end
end 

If you want it another way, take a look at "around alias" (<< metaprogramming ruby>>, page 155 ) . However I think "around alias" is the revert case from strategy.

( I mean,

  • in strategy pattern, you at first implement the "context" class, then you need to implement the "strategy" class.
  • but in "around alias" spell, you need to first of all get an "strategy" implemented, then you write a new class that "around alias" the "strategy"...

err... hope I didn't make you confused ^_^ )

Siwei
  • 19,858
  • 7
  • 75
  • 95