I am trying to write a custom Scrapy project command that runs all of my spiders. I found the "Register commands via setup.py entry points" section of the documentation and did the following:

  1. mkdir commands

  2. cd commands

  3. Created the command file crawlall.py:

    from scrapy.commands import ScrapyCommand
    from scrapy.utils.project import get_project_settings
    from scrapy.crawler import Crawler
    
    class Command(ScrapyCommand):
    
        requires_project = True
    
        def syntax(self):
            return '[options]'
    
        def short_desc(self):
            return 'Runs all of the spiders'
    
        def run(self, args, opts):
            settings = get_project_settings()
    
            for spider_name in self.crawler.spiders.list():
                crawler = Crawler(settings)
                crawler.configure()
                spider = crawler.spiders.create(spider_name)
                crawler.crawl(spider)
                crawler.start()
    
            self.crawler.start()
    
  4. Added COMMANDS_MODULE = 'myprojectname.commands' to settings.py.

  5. Created the setup.py:

    from setuptools import setup, find_packages
    
    setup(name='scrapy-mymodule',
      entry_points={
        'scrapy.commands': [
          'crawlall=cnblogs.commands:crawlall',
        ],
      },
     )
    
  6. Ran the project command with scrapy crawlall, which threw the following error:

    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 9, in <module>
        load_entry_point('Scrapy==1.0.0rc2', 'console_scripts', 'scrapy')()
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 122, in execute
        cmds = _get_commands_dict(settings, inproject)
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 50, in _get_commands_dict
        cmds.update(_get_commands_from_module(cmds_module, inproject))
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 29, in _get_commands_from_module
        for cmd in _iter_command_classes(module):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 20, in _iter_command_classes
        for module in walk_modules(module_name):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/utils/misc.py", line 63, in walk_modules
        mod = import_module(path)
      File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
    ImportError: No module named commands
    

What should I do? Where is my mistake?

J0e3gan
jack

3 Answers

To make the commands directory importable as a package, add an __init__.py file to it:

> pwd                # make sure that you are in commands directory
.../commands/

> touch __init__.py  # create __init__.py

See more info in another SO thread: What is __init__.py for?
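As a rough illustration of the point above, here is a minimal sketch that builds a throwaway package layout and imports a module from it. The temporary directory, the myprojectname name (taken from the COMMANDS_MODULE setting in the question) and the DESCRIPTION constant are assumptions for the demo only, not part of Scrapy. Note that Python 3.3+ can also import directories without __init__.py as namespace packages; the traceback in the question is from Python 2.7, where the file is required.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout for the demo: <tmp>/myprojectname/commands/crawlall.py
root = Path(tempfile.mkdtemp())
commands_dir = root / "myprojectname" / "commands"
commands_dir.mkdir(parents=True)

# An __init__.py in each directory marks it as a regular package.
(root / "myprojectname" / "__init__.py").touch()
(commands_dir / "__init__.py").touch()

# Stand-in for the real command file; it only defines a constant.
(commands_dir / "crawlall.py").write_text(
    "DESCRIPTION = 'Runs all of the spiders'\n"
)

# The directory that contains the top-level package must be on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

module = importlib.import_module("myprojectname.commands.crawlall")
print(module.DESCRIPTION)
```

In a real project you would only need the touch __init__.py step; the rest is scaffolding so the sketch can run anywhere.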

Jon

A directory with a Python script does not an importable module maketh: you need to add an __init__.py file to the commands directory, as the Python documentation on modules and packages explains:

The __init__.py files are required to make Python treat the directories as containing packages....

The __init__.py file can be empty.

Also, for Python to find it, the commands package must live in a directory that is on sys.path, as the aforementioned documentation further explains:

When importing [a] package, Python searches through the directories on sys.path looking for the package subdirectory.

The following Python snippet will display your sys.path:

import sys
print(sys.path)  # a bare sys.path only echoes in the interactive interpreter
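If you would rather test the import directly than read through sys.path by eye, something like the following sketch may help. The is_importable helper is a hypothetical name introduced here; it relies only on the standard importlib.util.find_spec, and myprojectname.commands is the value from the question's COMMANDS_MODULE setting.

```python
import importlib.util
import sys


def is_importable(dotted_name):
    """Return True if Python can locate the named module or package."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'myprojectname') cannot be found.
        return False


# Show the directories Python searches, one per line.
for entry in sys.path:
    print(entry)

# A standard-library module should always be found.
print(is_importable("json"))

# The package named in COMMANDS_MODULE; the result depends on your setup.
print(is_importable("myprojectname.commands"))
```

Run it from the directory where you invoke scrapy, since the current directory usually influences sys.path.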

Lastly, read a particularly relevant SO answer in the thread to which Jon referred you for more information.

J0e3gan

The official documentation says [1] that you have to write the full path to your command class:

setup(name='scrapy-mymodule',
  entry_points={
    'scrapy.commands': [
      'crawlall=cnblogs.commands.crawlall:Command',
    ],
  },
 )

where:

  • Command: your command class, i.e. class Command(ScrapyCommand).
  • crawlall: the *.py file (module) that contains the command class.
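To make the name=module:attribute shape of an entry point concrete, here is a small sketch that splits such a string by hand. split_entry_point is a hypothetical helper written for this answer, not a setuptools or Scrapy function; it only illustrates which part is read as the module path and which as the object inside it.

```python
def split_entry_point(spec):
    """Split 'name=package.module:attr' into (name, module_path, attr)."""
    name, _, target = spec.partition("=")
    module_path, _, attr = target.partition(":")
    return name.strip(), module_path.strip(), attr.strip()


# Corrected entry point: module cnblogs.commands.crawlall, class Command.
print(split_entry_point("crawlall=cnblogs.commands.crawlall:Command"))
# → ('crawlall', 'cnblogs.commands.crawlall', 'Command')

# Entry point from the question: the attribute looked up is 'crawlall',
# which is the module name rather than the command class.
print(split_entry_point("crawlall=cnblogs.commands:crawlall"))
# → ('crawlall', 'cnblogs.commands', 'crawlall')
```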

You can also add Scrapy commands from an external library by adding a scrapy.commands section in the entry points of the library setup.py file.

The following example adds the my_command command:

from setuptools import setup, find_packages

setup(name='scrapy-mymodule',
  entry_points={
    'scrapy.commands': [
      'my_command=my_scrapy_module.commands:MyCommand',
    ],
  },
 )
Salva Carrión