
So I'm trying to write functions that can be called from all my Scrapy spiders. Is there one place in my project where I can define these functions, or do I need to import them in each spider?

Thanks

Casper


You can't implicitly import code in Python (at least not without hacking around), and since explicit is better than implicit, it wouldn't be a good idea anyway.
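
The explicit route is simply a module of plain helper functions that each spider imports with one line. A minimal sketch, assuming a hypothetical `myproject/utils.py` containing a hypothetical `clean_text` helper; the import a spider would use is shown as a comment so the snippet runs on its own:

```python
# Hypothetical myproject/utils.py: plain helper functions shared by every spider.


def clean_text(value):
    """Collapse runs of whitespace and strip both ends of a scraped string."""
    return " ".join(value.split())


# Each spider module then imports the helper explicitly, for example:
#     from myproject.utils import clean_text
#     ...
#     yield {"title": clean_text(response.css("title::text").get(default=""))}

if __name__ == "__main__":
    print(clean_text("  Hello \n   world  "))  # prints "Hello world"
```

It is still one import per spider, but the functions themselves are defined in exactly one place.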

However, in Scrapy it's very common to have a base Spider class that holds the common functions and methods, and to have your spiders inherit from it.

Let's assume you have this tree:

├── myproject
│   ├── __init__.py
│   ├── spiders
│   │   ├── __init__.py
│   │   ├── spider1.py
│   │   ├── spider2.py
├── scrapy.cfg

We can create a base spider in `myproject/spiders/__init__.py`:

from scrapy import Spider

class BaseSpider(Spider):
    def common_parse(self, response):
        # do something shared by all spiders
        pass

And inherit from it in your spiders:

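from myproject.spiders import BaseSpider

class Spider1(BaseSpider):
    name = 'spider1'

    def parse(self, response):
        # use common methods! (response.text is a str; response.body is bytes)
        if 'indicator' in response.text:
            # pass along whatever the shared method produces
            return self.common_parse(response)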
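
To see the inheritance pattern work outside a Scrapy project, here is a self-contained sketch. `FakeSpider` and `FakeResponse` are hypothetical stand-ins for `scrapy.Spider` and a Scrapy response (only `url` and `text` are mimicked), and the body of `common_parse` is made up for illustration; in a real project you would subclass `scrapy.Spider` and receive real responses.

```python
class FakeSpider:
    """Hypothetical stand-in for scrapy.Spider, for illustration only."""
    name = None


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response; only .url and .text exist."""

    def __init__(self, url, text):
        self.url = url
        self.text = text


class BaseSpider(FakeSpider):
    """Shared helpers live here; every concrete spider inherits them."""

    def common_parse(self, response):
        # Made-up shared logic: record which spider handled which URL.
        yield {"spider": self.name, "url": response.url}


class Spider1(BaseSpider):
    name = "spider1"

    def parse(self, response):
        if "indicator" in response.text:
            yield from self.common_parse(response)


class Spider2(BaseSpider):
    name = "spider2"

    def parse(self, response):
        # A second spider reuses the same helper without redefining it.
        yield from self.common_parse(response)


if __name__ == "__main__":
    page = FakeResponse("http://example.com", "<p>indicator</p>")
    print(list(Spider1().parse(page)))
    print(list(Spider2().parse(page)))
```

`common_parse` is defined once on the base class, and `self.name` resolves to each subclass's own value, so the same helper reports the right spider.
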
Granitosaurus
  • Thanks @Granitosaurus but I'm getting an error while trying this: `ImportError: cannot import name BaseSpider` while using the name of the folder structure as you showed. Tried to play around with the replacement of `myproject` and when I'm using `from scrapy.spiders import BaseSpider` I'm able to run the spider but it doesn't find the function. Any suggestions on where I could've made a mistake? – Casper May 26 '17 at 10:59
  • @Casper are you sure your tree structure is correct? You might need to either install your own package or add the location of your module to your PYTHONPATH so Python can actually find the imports. Similar question: https://stackoverflow.com/questions/21352669/python-path-explained-import-from-a-subpackage – Granitosaurus May 27 '17 at 06:56
  • Turned out I was not working in the correct `__init__.py`. Thanks for the clear explanation! – Casper May 27 '17 at 08:12
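
If `from myproject.spiders import BaseSpider` raises `ImportError`, one quick check is whether Python can locate the package at all from where you run it. A small sketch using the standard library's `importlib.util.find_spec`; `can_import` is a hypothetical helper name, and `myproject.spiders` just follows the tree above, so substitute your own package names:

```python
import importlib.util
import sys


def can_import(module_name):
    """Return True if module_name can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # For dotted names find_spec imports the parent package first,
        # and raises this if the parent (e.g. "myproject") can't be found.
        return False


if __name__ == "__main__":
    print("sys.path:", sys.path)
    for name in ("myproject", "myproject.spiders"):
        print(name, "importable:", can_import(name))
```

If `myproject` reports `False`, the directory containing it is not on `sys.path`, which is the situation the PYTHONPATH comment above describes.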