3

I am using Anemone. How do I crawl subdomains as well? For example, if my website is www.abc.com, my crawler should also crawl support.abc.com and blah.abc.com. I am using Ruby 1.8.7 and Rails 3.

David J.
Bhushan Lodha

2 Answers

4

Here is a commit on GitHub that solves your problem:

https://github.com/runa/anemone/commit/91559bde052956cfc40ae62678ec2a61574cf928

Apply the changes shown in that commit to the files of your installed Anemone gem.

sunnyrjuneja
-2

According to the Anemone docs, you can pass multiple start URLs to the crawl command. Pass them as a single array, since the second argument is the options hash:

Anemone.crawl(["http://www.abc.com/", "http://support.abc.com/", "http://blah.abc.com/"])

Of course, your next problem will probably be ABC banning you for crawling their site, but that's a different question.

the Tin Man
  • What if I don't know the subdomains? – Bhushan Lodha Feb 16 '12 at 06:35
  • If you don't know the subdomains you will have to try to locate them by searching through the links retrieved from the first page, looking for other sites that are sub-domains, or that appear to be sibling-domains, of the starting one. Then spawn secondary crawls. – the Tin Man Feb 17 '12 at 18:57
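
The discovery step described in that last comment could be sketched roughly as follows. This is only an illustration using Ruby's standard `uri` library, not part of Anemone: the helper names `same_site_or_subdomain?` and `discover_hosts` are hypothetical, and the sketch assumes you already have the list of link URLs from the first page and know the base domain (here `abc.com`).

```ruby
require 'uri'

# Hypothetical helper (not part of Anemone): true when +url+ points at
# +base_domain+ itself or at any subdomain of it.
def same_site_or_subdomain?(url, base_domain)
  host = URI.parse(url).host
  return false if host.nil?            # e.g. mailto: links have no host
  host = host.downcase
  base = base_domain.downcase
  # Require a leading dot so "evilabc.com" does not match "abc.com".
  host == base || host.end_with?(".#{base}")
rescue URI::InvalidURIError
  false                                # skip links that cannot be parsed
end

# Hypothetical helper: the distinct hosts among +links+ that belong to
# +base_domain+, in the order they were first seen.
def discover_hosts(links, base_domain)
  matching = links.select { |link| same_site_or_subdomain?(link, base_domain) }
  hosts = matching.map { |link| URI.parse(link).host.downcase }
  hosts.uniq
end
```

Each host returned by `discover_hosts` could then be turned into a start URL for a secondary crawl. How you collect the links from the first page is left open here.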