I have been trying to get a simple spider to run with scrapy, but keep getting the error:

Could not find spider for domain:stackexchange.com

when I run it with the command scrapy-ctl.py crawl stackexchange.com. The spider is as follows:

from scrapy.spider import BaseSpider
from __future__ import absolute_import


class StackExchangeSpider(BaseSpider):
    domain_name = "stackexchange.com"
    start_urls = [
        "http://www.stackexchange.com/",
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

SPIDER = StackExchangeSpider()

Another person posted almost exactly the same problem months ago ("Scrapy spider is not working") but did not say how they fixed it. I have been following the tutorial at http://doc.scrapy.org/intro/tutorial.html exactly, and cannot figure out why it is not working.

When I run this code in Eclipse I get the error:

Traceback (most recent call last):
  File "D:\Python Documents\dmoz\stackexchange\stackexchange\spiders\stackexchange_spider.py", line 1, in <module>
    from scrapy.spider import BaseSpider
ImportError: No module named scrapy.spider

I cannot figure out why it is not finding the base Spider module. Does my spider have to be saved in the scripts directory?

Kristin
  • My spider does not have any rule statements in it so I do not think that applies. I could be wrong though. – Kristin May 22 '10 at 01:04
  • Do you get the same error when using the DmozSpider defined in the tutorial? – unutbu May 22 '10 at 01:36
  • I have not tried it with dmoz as the domain. All I really did was change the site it crawls. – Kristin May 22 '10 at 01:46
  • Where did you place the code posted above? I suspect it is not being found in the project/spiders directory, and it is not getting imported. If it was, you'd get an error saying that the `from __future__ import absolute_import` line has to come before the other import line. – unutbu May 22 '10 at 02:34

1 Answer

Try running python yourproject/spiders/domain.py to see if there are any syntax errors. I don't think you should enable absolute imports, as Scrapy relies on relative imports.
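
A syntax check like this can also be done without running the spider. The following is a minimal sketch, not part of the original answer: the two source strings and the helper name `check_syntax` are made up for illustration, and the strings only mirror the import order shown in the question. It relies on the built-in `compile()`, which parses the source without executing it, so Scrapy does not need to be installed.

```python
# Illustrative source text: same import order as the spider in the question.
SOURCE_AS_POSTED = (
    "from scrapy.spider import BaseSpider\n"
    "from __future__ import absolute_import\n"
)

# Same two lines with the __future__ import moved to the top.
SOURCE_REORDERED = (
    "from __future__ import absolute_import\n"
    "from scrapy.spider import BaseSpider\n"
)


def check_syntax(source, filename="<spider>"):
    """Return None if the source compiles, otherwise the SyntaxError message.

    compile() only parses the text; the imports are never executed.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    print("as posted:", check_syntax(SOURCE_AS_POSTED))
    print("reordered:", check_syntax(SOURCE_REORDERED))
```

If the first check reports an error and the second does not, the import order alone is enough to stop the module from loading, whether or not the `__future__` import is needed at all.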

R. Max
  • It says it cannot find the scrapy.spider module – Kristin May 23 '10 at 17:03
  • Yes. The first error, `Could not find spider for domain:stackexchange.com`, is a Scrapy message, so the scrapy module loads correctly in that case. The second error is related to Eclipse and its PYTHONPATH. – R. Max May 23 '10 at 21:23
  • Problem fixed. Reinstalled on another computer. Must have had files misplaced or it installed wrong. – Kristin May 25 '10 at 22:53
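
The Eclipse and PYTHONPATH point raised in the comments can be checked with a small script run from inside the IDE and again from the command line. This is only a sketch under assumptions: the helper name `diagnose` is made up for illustration, and it uses just the standard library (`importlib.util.find_spec` and `sys`). It reports which interpreter is running and whether that interpreter can locate a top-level package such as `scrapy`.

```python
import importlib.util
import sys


def diagnose(module_name):
    """Report whether the current interpreter can locate a top-level module.

    find_spec() searches sys.path without importing the module, so this is
    safe to run even when the package is missing or broken.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,            # interpreter actually in use
        "found": spec is not None,               # True if the module is on sys.path
        "origin": getattr(spec, "origin", None),  # file it would load from, if any
        "sys_path": list(sys.path),              # search path for comparison
    }


if __name__ == "__main__":
    report = diagnose("scrapy")
    print("interpreter:", report["executable"])
    print("scrapy found:", report["found"], report["origin"])
    for entry in report["sys_path"]:
        print("  path entry:", entry)
```

If the two runs print different interpreters or different path entries, the IDE is probably configured with a Python installation that does not have Scrapy installed.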