
The scrapyd docs include the following note:

scrapyd-deploy won’t deploy anything outside the project module...

Does that mean that I cannot import from site-packages in my spiders?

My spiders rely on external libraries such as MySQL-python and tldextract. Must I include these libraries within the project module and import from those included copies rather than from site-packages?

chishaku

2 Answers


I think the "Deploying your project" paragraph of the documentation should clarify things:

Finally, to deploy your project use:

scrapyd-deploy scrapyd -p project1

This will eggify your project and upload it to the target, printing the JSON response returned from the Scrapyd server. If you have a setup.py file in your project, that one will be used. Otherwise a setup.py file will be created automatically (based on a simple template) that you can edit later.

In other words, you would list MySQL-python, tldextract, and any other dependencies in setup.py so that they are installed automatically during deployment, as sketched below.
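For illustration, a minimal sketch of what such a setup.py could look like, based on the simple template that scrapyd-deploy generates (the project name and settings path are placeholders, not details from the question):

from setuptools import setup, find_packages

setup(
    name='project1',
    version='1.0',
    packages=find_packages(),
    # Tell Scrapyd where the project settings module lives.
    entry_points={'scrapy': ['settings = project1.settings']},
    # External dependencies the spiders import.
    install_requires=[
        'MySQL-python',
        'tldextract',
    ],
)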

alecxe

I found some evidence (i.e. GitHub posts here and here) suggesting that installing custom packages via the 'setup.py' file is not (and will not be) supported. Since I'm running scrapyd in a Docker container, my workaround is:

  1. Ensure the necessary external Python packages are installed in the scrapyd container by calling pip install <package> in the Dockerfile (see the sketch after this list).
  2. Create a bind mount in the container that links to any custom modules outside the Scrapy project directory. I added the lines below to the scrapyd service in my docker-compose file (note that the mount point must be created in the container's '/tmp' directory):
volumes:
  - ./custom_module:/tmp/custom_module  # a host path (./) makes this a bind mount rather than a named volume
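For illustration, a minimal Dockerfile sketch of step 1 (the base image and exact package list are assumptions, not details from the original setup):

FROM python:2.7
# Install scrapyd plus the external packages the spiders import.
RUN pip install scrapyd MySQL-python tldextract
EXPOSE 6800
CMD ["scrapyd"]

With that image built, the volumes entry above makes the contents of the host directory ./custom_module visible inside the container at /tmp/custom_module.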
joel