Matching URL structures with Anemone

Question

Right now, I'm doing the following with Anemone:

Anemone.crawl("http://www.findbrowsenodes.com/", :delay => 3) do |anemone|
  anemone.on_every_page do |page|
  end
end

But I would like to do this instead:

Anemone.crawl("http://www.findbrowsenodes.com/", :delay => 3) do |anemone|
  anemone.on_pages_like() do |page|
  end
end

so that only URLs like this are crawled:

Any ideas how?

score 3 · Answer 1 · answered Sep 04 '13 at 10:58


You can use a regular expression like this:

/http:\/\/www\.findbrowsenodes\.com\/us\/.+\/[\d]*/
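A minimal sketch of how that pattern could be passed to Anemone's `on_pages_like`, which takes one or more regular expressions and runs the block only for pages whose URL matches. The constant `NODE_URL` and the helper `crawl_nodes` are names made up for this illustration, the dots are escaped so they match literally, and the `anemone` gem is assumed to be installed.

```ruby
# Pattern from the answer, with the dots escaped so they match literally.
NODE_URL = %r{http://www\.findbrowsenodes\.com/us/.+/[\d]*}

# Hypothetical helper: crawls the site and yields each page whose URL matches.
# The anemone gem is required lazily, so the pattern can be tried without it.
def crawl_nodes(start_url = "http://www.findbrowsenodes.com/")
  require "anemone"

  Anemone.crawl(start_url, :delay => 3) do |anemone|
    # The block runs only for pages whose URL matches NODE_URL.
    anemone.on_pages_like(NODE_URL) do |page|
      yield page
    end
  end
end
```

It could then be called as `crawl_nodes { |page| puts page.url }`. Note that `on_pages_like` only decides which pages reach the block; the crawler still follows the other links it finds.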


rhernando

Thanks, it worked! Just one thing: at the beginning it includes this URL: `http://www.findbrowsenodes.com/us/p/what-are-browse-nodes`. How can I modify the regex to avoid that? – alexchenco Sep 05 '13 at 02:11
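One possible way to do that, sketched under the assumption that the wanted URLs end in a numeric node ID (the sample URL in the question is not shown, so this is a guess based on the `[\d]*` in the answer): require at least one digit with `\d+` and anchor the pattern with `\A` and `\z`. `STRICT_NODE_URL` is a name made up here, and the `/us/Books/1000` URL is a hypothetical example.

```ruby
# Anchored pattern: the last path segment must consist of one or more digits.
STRICT_NODE_URL = %r{\Ahttp://www\.findbrowsenodes\.com/us/.+/\d+\z}

urls = [
  "http://www.findbrowsenodes.com/us/Books/1000",             # hypothetical node URL
  "http://www.findbrowsenodes.com/us/p/what-are-browse-nodes" # URL from the comment
]

# Keep only the URLs that the stricter pattern accepts.
matching = urls.select { |url| url =~ STRICT_NODE_URL }
puts matching
```

This prints only the first URL. The original pattern matched the second one because `[\d]*` also accepts zero digits. If the real node URLs carry a trailing slash or a query string, the `\z` anchor would need to be relaxed accordingly.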
