
I'm using Scrapy to crawl a set of similar pages (webcomics). Because these pages are very similar, I wrote a class called ComicCrawler which contains all the spider logic and some class variables (start_url, next_selector, etc.). I then override these class variables in concrete classes for each spider.

Manually creating a class for each comic is cumbersome. I now want to specify the attributes in a JSON file and create the classes at runtime (i.e. apply the factory pattern?). What is the best way to go about that?

Alternatively, is there a way to run a spider without creating a class for it?

Edit: The core problem seems to be that Scrapy uses classes, not instances, for its spiders. Otherwise I'd just make the class variables instance variables and be done with it.
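A minimal sketch of the instance-variable idea, assuming Scrapy behaves as I understand it: CrawlerProcess.crawl(SpiderClass, **kwargs) forwards its keyword arguments to the spider's constructor, and the base Spider.__init__ copies them onto the instance. To keep the example runnable without Scrapy, FakeSpider is a hypothetical stand-in that only imitates that constructor behaviour.

```python
class FakeSpider:
    """Hypothetical stand-in imitating scrapy.Spider's constructor."""
    name = None

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        elif not getattr(self, "name", None):
            raise ValueError(f"{type(self).__name__} must have a name")
        # Extra keyword arguments become instance attributes,
        # shadowing the class-level defaults.
        self.__dict__.update(kwargs)


class ComicSpider(FakeSpider):
    start_url = None
    next_selector = None


# One generic class, configured per instance instead of per subclass
spider = ComicSpider(
    name="SupernormalStep",
    start_url="https://supernormalstep.com/archives/8",
    next_selector="a.cc-next",
)
print(spider.name, spider.next_selector)  # SupernormalStep a.cc-next
print(ComicSpider.start_url)              # None (class default untouched)
```

With Scrapy, the equivalent would presumably be process.crawl(ComicSpider, name="SupernormalStep", start_url=..., next_selector=...) using the single generic class. This only covers spiders started from a script; as far as I know, scrapy crawl <name> discovers spiders through their class-level name, which a generic class with name = None does not have.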


Example:

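from scrapy import Request, Spider

class ComicSpider(Spider):
  name = None
  start_url = None
  next_selector = None
  # ...

  # this class contains much more logic than shown here

  def start_requests(self):
    # something including / along the lines of...
    yield Request(self.start_url, callback=self.parse)

  def parse(self, response):
    # something including / along the lines of...
    # follow the "next" link; on the last page the selector matches nothing
    next_link = response.css(self.next_selector)
    if next_link:
      # response.follow() reads the href of an <a> selector
      yield response.follow(next_link[0], callback=self.parse)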

In another file:

class SupernormalStep(ComicSpider):
  name = "SupernormalStep"
  start_url = "https://supernormalstep.com/archives/8"
  next_selector = "a.cc-next"

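What I want:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

myComics = {
  "SupernormalStep": {
    "start_url": "https://supernormalstep.com/archives/8",
    "next_selector": "a.cc-next"
  }, # ...
}

process = CrawlerProcess(get_project_settings())
for name, attributes in myComics.items():
  # build_process is the piece I'm missing: it should turn (name, attributes) into a spider class
  process.crawl(build_process(name, attributes))
process.start()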

PS: I crawl responsibly.

azrael

2 Answers


The class statement is a declarative wrapper around using type directly. Assuming process.crawl takes a class as an argument,

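process = CrawlerProcess(get_project_settings())
for name, attributes in myComics.items():
    process.crawl(type(name, (ComicSpider,), {"name": name, **attributes}))
process.start()  # blocks until all crawls have finished

type(name, (ComicSpider,), {"name": name, **attributes}) creates a class called name that inherits from ComicSpider and has the class attributes given in the dictionary. The "name" key is added explicitly because the first argument to type only sets the class's __name__, while Scrapy identifies a spider by its name attribute; since ComicSpider sets name = None, leaving it out would make Scrapy reject the spider for lacking a name. See the type() entry in the Python docs for an example.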
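A self-contained sketch of the same idea, including loading the attributes from JSON as the question asks. BaseComic is a hypothetical stand-in for ComicSpider so the example runs without Scrapy, build_spider_class is a hypothetical helper name, and the JSON layout is assumed to mirror the myComics dictionary. The hand-written SupernormalStep class is included only to compare against the generated one.

```python
import json


class BaseComic:
    """Hypothetical stand-in for the question's ComicSpider (no Scrapy required)."""
    name = None
    start_url = None
    next_selector = None


# Assumed layout of the JSON config file, inlined as a string for the example
CONFIG = """
{
  "SupernormalStep": {
    "start_url": "https://supernormalstep.com/archives/8",
    "next_selector": "a.cc-next"
  }
}
"""


def build_spider_class(name, attributes, base=BaseComic):
    """Hypothetical factory: the functional form of a class statement."""
    # The first argument only names the class (__name__); the Scrapy-style
    # `name` attribute has to be placed in the namespace dict explicitly.
    return type(name, (base,), {"name": name, **attributes})


# Hand-written equivalent, for comparison
class SupernormalStep(BaseComic):
    name = "SupernormalStep"
    start_url = "https://supernormalstep.com/archives/8"
    next_selector = "a.cc-next"


my_comics = json.loads(CONFIG)
generated = {n: build_spider_class(n, attrs) for n, attrs in my_comics.items()}

cls = generated["SupernormalStep"]
print(cls.__name__, cls.name, cls.next_selector)  # SupernormalStep SupernormalStep a.cc-next
print(issubclass(cls, BaseComic))                  # True
```

With Scrapy installed, each class in generated would be passed to process.crawl(...) in place of a hand-written subclass.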

chepner

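Look up metaclasses: they are the mechanism Python uses to create classes, including dynamically created ones (type itself is the default metaclass). See the Stack Overflow question "What are metaclasses in Python?".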

For this simple case, the easier method described in chepner's answer is enough.
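For completeness, a sketch of the metaclass route. Because type is the default metaclass, a custom metaclass (here the hypothetical ComicMeta) can be called with the same (name, bases, namespace) arguments as type(...), and it also runs for ordinary class statements that use it. This example defaults the name attribute to the class name and records every class it creates. It is plain Python and untested against Scrapy; presumably a subclass of type is compatible with Scrapy's Spider, but that is an assumption.

```python
class ComicMeta(type):
    """Hypothetical metaclass: records every class it creates and
    defaults the `name` attribute to the class name."""
    registry = {}

    def __new__(mcs, cls_name, bases, namespace):
        namespace = dict(namespace)  # don't mutate the caller's dict
        namespace.setdefault("name", cls_name)
        cls = super().__new__(mcs, cls_name, bases, namespace)
        mcs.registry[cls_name] = cls
        return cls


# A class statement also goes through ComicMeta because of metaclass=
class BaseComic(metaclass=ComicMeta):
    start_url = None
    next_selector = None


# Calling the metaclass directly, the same way one would call type(...)
SupernormalStep = ComicMeta("SupernormalStep", (BaseComic,), {
    "start_url": "https://supernormalstep.com/archives/8",
    "next_selector": "a.cc-next",
})

print(SupernormalStep.name)        # SupernormalStep
print(sorted(ComicMeta.registry))  # ['BaseComic', 'SupernormalStep']
```

One side effect to note: the base class also receives a name, so with Scrapy it might be mistaken for a runnable spider; a real version would probably skip the base class.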

handras