I'm using Scrapy to crawl a set of similar pages (webcomics). Because these pages are very similar, I wrote a class called ComicCrawler which contains all the spider logic and some class variables (start_url, next_selector, etc.). I then override these class variables in concrete classes for each spider.
Manually creating a class for each comic is cumbersome. I now want to specify the attributes in a JSON file and create the classes at runtime (i.e. apply the factory pattern?). How do I best go about that?
Alternatively: is there a way to run a spider without creating a class for it?

Edit: The core problem seems to be that Scrapy uses classes, not instances, for its spiders. Otherwise I'd just make the class variables instance variables and be done with it.
Example:
class ComicSpider(Spider):
    name = None
    start_url = None
    next_selector = None
    # ...
    # this class contains much more logic than shown here

    def start_requests(self):
        # something including / along the lines of...
        yield Request(self.start_url, self.parse)

    def parse(self, response):
        # something including / along the lines of...
        yield Request(response.css(self.next_selector).get(), self.parse)
In another file:
class SupernormalStep(ComicSpider):
    name = "SupernormalStep"
    start_url = "https://supernormalstep.com/archives/8"
    next_selector = "a.cc-next"
what I want:
myComics = {
    "SupernormalStep": {
        "start_url": "https://supernormalstep.com/archives/8",
        "next_selector": "a.cc-next",
    },
    # ...
}

process = CrawlerProcess(get_project_settings())
for name, attributes in myComics.items():
    process.crawl(build_process(name, attributes))
process.start()
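For reference, here is a minimal sketch of what I imagine the hypothetical build_process could do, using the built-in type() to create each subclass at runtime. ComicSpider here is a plain stand-in for the Scrapy-based class above; I haven't tested this against Scrapy itself:

    class ComicSpider:  # stand-in for the Scrapy-based base class above
        name = None
        start_url = None
        next_selector = None

    def build_process(name, attributes):
        # type(name, bases, namespace) creates a new class object at runtime,
        # so the overriding class variables come straight from the JSON dict
        return type(name, (ComicSpider,), {"name": name, **attributes})

    # the same data I would load from the JSON file
    myComics = {
        "SupernormalStep": {
            "start_url": "https://supernormalstep.com/archives/8",
            "next_selector": "a.cc-next",
        },
    }

    spiders = [build_process(name, attrs) for name, attrs in myComics.items()]

Each element of spiders is a class (not an instance), which is what I'd want to hand to process.crawl().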
PS: I crawl responsibly.