
I am running a web scraper with Anemone in Ruby, and it causes problems for my server when it visits pages that require a login.

These pages all have a phrase, say "account", in the URL, and I want the crawler to ignore them completely and never follow any link whose destination contains this string.

How can I do this?

mu is too short
Benjamin

1 Answer


Anemone has a skip_links_like method:

skip_links_like(*patterns)
Add one or more Regex patterns for URLs which should not be followed

So adding something like

skip_links_like /\/account\//

should take care of it:

require 'anemone'

Anemone.crawl("http://www.somesite.co.uk", :depth_limit => 1) do |anemone|
    # Do not follow any link that matches "/account/"
    anemone.skip_links_like(/\/account\//)
    #...
end
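
Note that the pattern above only matches URLs that contain "/account/" with both slashes, whereas the question asks about any URL containing the string "account". The following is a rough sketch for checking which URLs a given pattern would skip before starting a crawl. It does not use Anemone at all: `skipped?` is a hypothetical helper, the URLs are made up, and testing the pattern against the URL's path is an assumption about how Anemone applies the patterns, so treat it as an approximation.

```ruby
require "uri"

# Hypothetical helper (not part of Anemone): true when any pattern
# matches the path of the given URL. Matching on the path is an assumption.
def skipped?(url, patterns)
  path = URI(url).path
  patterns.any? { |pattern| path =~ pattern }
end

strict = [%r{/account/}]  # same regex as /\/account\//
loose  = [/account/]      # matches "account" anywhere in the path

# Made-up example URLs
urls = [
  "http://www.somesite.co.uk/account/login",
  "http://www.somesite.co.uk/account",
  "http://www.somesite.co.uk/my-account/orders",
  "http://www.somesite.co.uk/products",
]

urls.each do |url|
  puts "#{url}  strict=#{skipped?(url, strict)}  loose=#{skipped?(url, loose)}"
end
```

If the login pages can appear under paths such as "/account" (no trailing slash) or "/my-account/", the looser `/account/` pattern is probably closer to what the question asks for.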
mu is too short
  • so it would look like this?: Anemone.crawl("http://www.somesite.co.uk", :depth_limit => 1, skip_links_like /\/account\//) do |anemone| – Benjamin Sep 07 '11 at 10:48