4

http://www.example.com/books?_pop=mheader

What regular expression would match this URL and any other URL that contains "books"? The site has a books category with various sub-categories under it. How do I crawl down through the site to find all the book URLs?

require 'anemone'
Pattern = %r[(\/books)*]
Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_pages_like(Pattern) do |page|
    puts page.url
  end
end
Aayush
  • If you're using `%r[...]` then you won't need to backslashify your slashes. Also note that constants like your pattern should be `ALL_CAPS` and classes should be `MixedCase`. – tadman Sep 07 '12 at 06:01

2 Answers

3

http://rubular.com/ is a useful tool to test regex for Ruby.

The regex can be simple: /http:\/\/.+(books)/. It also matches http:// to help ensure the string is a URL. For example, it matches http://www.example.com/reference-books-2300 when tested on Rubular.
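As a rough check of that regex outside the crawler, here is a minimal sketch in plain Ruby. The first two sample URLs come from this thread; the other two are made up for illustration, and the constant names are hypothetical.

```ruby
# Regex from the answer above: "http://", then one or more characters, then "books".
BOOKS_URL = /http:\/\/.+(books)/

# Sample URLs: the first two appear in this thread, the last two are invented.
SAMPLE_URLS = [
  "http://www.example.com/books?_pop=mheader",
  "http://www.example.com/reference-books-2300",
  "http://www.example.com/music/cds",
  "https://www.example.com/books"
]

# Keep only the URLs the regex matches.
MATCHING = SAMPLE_URLS.select { |url| url =~ BOOKS_URL }
puts MATCHING
```

Note that the https URL is not selected, because the regex spells out http:// literally. If the site serves both schemes, something like https? in place of http may be worth considering.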

Aayush
mguymon
1

The pattern to match /books in your URL should just be "/books".
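To see why this matters, here is a small sketch comparing the pattern from the question with the plain "/books" pattern. The SEGMENT variant and most of the sample URLs are assumptions added for illustration, not something from the thread.

```ruby
# Pattern from the question: the group is "zero or more", so it also matches
# the empty string and therefore succeeds on every URL.
ORIGINAL = %r[(\/books)*]

# Pattern suggested above: the literal text "/books" must appear somewhere.
LITERAL = %r{/books}

# Stricter variant (an assumption, not from the thread): "books" must be a
# whole path segment, so "/bookshelf" does not count.
SEGMENT = %r{/books(?:[/?#]|\z)}

URLS = [
  "http://www.example.com/books?_pop=mheader",
  "http://www.example.com/books/fiction",
  "http://www.example.com/bookshelf",
  "http://www.example.com/music"
]

URLS.each do |url|
  # =~ returns a match position or nil; !! turns that into true/false.
  puts "#{url} original=#{!!(url =~ ORIGINAL)} literal=#{!!(url =~ LITERAL)} segment=#{!!(url =~ SEGMENT)}"
end
```

The original pattern reports a match for every URL, which is presumably why the crawler printed every page. The literal pattern narrows that to URLs containing "/books", and the segment variant further excludes paths such as "/bookshelf".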

This is a good site for testing your regular expressions: http://regexpal.com. Use it to make sure you have at least that part of your code right.

Wolfsokta