
In JavaScript, there are a variety of ways to inherit methods. Below is a hybrid example using a few of the approaches:

var A = {
    name: 'first',
    wiggle: function() { return this.name + " is wiggling" },
    shake: function() { return this.name + " is shaking" }
}

var B = Object.create(A)
B.name = 'second'
B.bop = function() { return this.name + ' is bopping' }


var C = function(name) {
    // declare obj locally so it does not leak into the global scope
    var obj = Object.create(B)
    obj.name = name
    obj.crunk = function() { return this.name + ' is crunking'}

    return obj
}

var final = new C('third')

This gives me the following inheritance hierarchy.

[Image: inheritance hierarchy diagram]

One of the important things to notice is the name property of each object. When a method runs, even one defined far up the prototype chain, the this keyword refers to the object the method was called on, so the nearest name property on the chain is used.

[Image: illustration of the name property lookup]

I've recently moved on to Python but I'm having trouble understanding how subclasses access superclass methods, and likewise how variable scoping / object properties work.

I had created a Spider in Scrapy that (quite successfully) scraped 2000+ pages on a single domain and parsed them into the format I need. Lots of the helpers were just functions within the main parse_response method, which I could use directly on the data. The original spider looked something like this:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from spider_scrape.items import SpiderItems

class ScrapeSpider(CrawlSpider):

    name              =   "myspider"
    allowed_domains   =   ["domain.com.au"]
    start_urls        =   ['https://www.domain.com.au/']
    rules             =   (Rule(SgmlLinkExtractor(allow=()), 
                                                  callback="parse_items", 
                                                  follow=True), )

    def parse_items(self, response):
        ...

The callback function parse_items contains the logic that deals with the response for me. When I generalised everything, I ended up with the following (with the intent to use this on multiple domains):

#Base

class BaseSpider(CrawlSpider):
    """Base set of configuration rules and helper methods"""

    rules = (Rule(LinkExtractor(allow=()),
                                    callback="parse_response",
                                    follow=True),)

    def parse_response(self, response):
        ...

        def clean_urls(string):
            """Remove absolute URLs from hrefs; if the URL is from an external domain, do nothing"""
            for domain in allowed_domains:
                string = string.replace('http://' + domain, '')
                string = string.replace('https://' + domain, '')
            if 'http' not in string:
                string = "/custom/files" + string
            return string


#Specific for each domain I want to crawl
class DomainSpider(BaseSpider):

    name = 'Domain'
    allowed_domains = ['Domain.org.au']
    start_urls      = ['http://www.Domain.org.au/'
                      ,'http://www.Domain.org.au/1']

When I ran this via the Scrapy command line, I had the following error in the console:

[Image: screenshot of the console error]

After some testing, changing the for loop to this made it work: for domain in self.allowed_domains:

All good; that seems mighty similar to the this keyword in JavaScript, since I'm using properties of the object to get the values. There are many more variables/properties that will hold the required XPath expressions for the scrape:

class DomainSpider(BaseSpider):

    name = 'Domain'
    page_title      =      '//title'
    page_content    =      '//div[@class="main-content"]'

Changing the other parts of the Spider to mimic that of the allowed_domains variable, I received this error:

[Image: screenshot of the error]

I tried setting the property differently in a few ways, including using self.page_content and/or an __init__(self) constructor with no success but different errors.

I'm completely lost as to what is happening here. The behaviour I expect is:

  1. When I run the scrapy crawl <spider name> from the terminal, it instantiates the DomainSpider class
  2. Any class constants in that class become available to all methods that it has inherited, similar to JavaScript and its this keyword
  3. Any class constants from its super class(es) are ignored due to context.

If someone could

  • Explain the above to me
  • Point me to something more meaty than LPTHW but not TDD with Python that would be amazing.

Thanks in advance.

Henrik Andersson
Jamie S
    Python is not JavaScript. Have you read through [Section 9](https://docs.python.org/2.7/tutorial/classes.html) of the tutorial? I imagine there are a few videos at pyvideo.org that might shed some light ... http://pyvideo.org/video/1779/pythons-class-development-toolkit, http://pyvideo.org/video/879/the-art-of-subclassing, http://pyvideo.org/video/880/stop-writing-classes, http://pyvideo.org/video/1079/a-deep-dive-into-python-classes – wwii Dec 27 '14 at 07:02
    If you're trying to understand how attributes work in Python, I'd suggest you start with simple examples using "toy" classes instead of using Scrapy as an example. Once you understand the concepts you will have an easier time with whatever Scrapy is doing. Look through [the Python tutorial](https://docs.python.org/2/tutorial/) and perhaps some other questions on here and webpages you can find by googling "Python class and instance variables/attributes", like [this one](http://stackoverflow.com/questions/8959097/what-is-the-difference-between-class-and-instance-variables-in-python). – BrenBarn Dec 27 '14 at 07:03

1 Answer

I'm not familiar with JavaScript, but questions similar to yours always include an answer suggesting that you learn the way to do it in Python rather than trying to force Python to be like your other language. Trying to re-create your JavaScript-esque style in Python, I came up with this:

class A(object):
    def __init__(self):
        self.name = 'first'
    def wiggle(self):
        return self.name + ' is wiggling'
    def shake(self):
        return self.name + ' is shaking'

Create an instance of A, change its name and add a method attribute to the instance

b = A()
b.name = 'second'
b.bop = lambda : b.name + ' is bopping'

A function that returns an instance of A, with the additional attribute crunk. I don't think this is true to your example: thing will not have a bop method, although another statement in the function could add one.

def c(name):
    thing = A()
    thing.name = name
    thing.crunk = lambda : thing.name + ' is crunking'
    return thing

final = c('third')

There isn't any inheritance going on, just instances of A with additional attributes. You get the following result:

>>> 
>>> b.name
'second'
>>> b.bop()
'second is bopping'
>>> b.shake()
'second is shaking'
>>> b.wiggle()
'second is wiggling'
>>> 
>>> final.name
'third'
>>> final.crunk()
'third is crunking'
>>> final.shake()
'third is shaking'
>>> final.wiggle()
'third is wiggling'
>>> final.bop()

Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    final.bop()
AttributeError: 'A' object has no attribute 'bop'
>>> 
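
As an aside, the lambdas above work only because they close over the variables b and thing. A minimal sketch of an alternative, assuming you want a plain function bound to a single instance, uses types.MethodType from the standard library (the free function bop here is an illustration, not part of the question's code):

```python
import types


class A(object):
    def __init__(self):
        self.name = 'first'

    def shake(self):
        return self.name + ' is shaking'


def bop(self):
    # Plain function; self is supplied once it is bound to an instance.
    return self.name + ' is bopping'


b = A()
b.name = 'second'
# Bind bop to this one instance only; other instances of A are unaffected.
b.bop = types.MethodType(bop, b)

print(b.bop())              # second is bopping
print(hasattr(A(), 'bop'))  # False
```

Because the method receives the instance as self, it keeps working if the variable b is later rebound to something else, which the closure-based lambda would not.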

In Python you would do it like this:

Class A with a default argument for the name attribute and two methods that will be bound to an instance of A. name is an instance attribute because it is defined in __init__. Only instances of A will have a name attribute - A.name will raise an AttributeError.

class A(object):
    def __init__(self, name = 'first'):
        self.name = name
    def wiggle(self):
        return self.name + ' is wiggling'
    def shake(self):
        return self.name + ' is shaking'

Foo inherits everything from A and defines an additional attribute bop.

class Foo(A):
    def bop(self):
        return self.name + ' is bopping'

Bar inherits everything from Foo and defines an additional attribute crunk

class Bar(Foo):
    def crunk(self):
        return self.name + ' is crunking'

Baz inherits everything from Bar and overrides wiggle

class Baz(Bar):
    def wiggle(self):
        return 'This Baz instance, ' + self.name + ', is wiggling'

foo = Foo('second')
bar = Bar('third')
baz = Baz('fourth')

Usage:

>>> 
>>> foo.name
'second'
>>> foo.bop()
'second is bopping'
>>> foo.shake()
'second is shaking'
>>> foo.wiggle()
'second is wiggling'
>>> 
>>> bar.name
'third'
>>> bar.bop()
'third is bopping'
>>> bar.shake()
'third is shaking'
>>> bar.wiggle()
'third is wiggling'
>>> bar.crunk()
'third is crunking'
>>> 
>>> baz.wiggle()
'This Baz instance, fourth, is wiggling'
>>>
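
The same lookup applies to class attributes, which is what the spiders in the question rely on. The following is a rough sketch with toy classes, not Scrapy code: Base, Child, Broken, allowed, and the example domain strings are all hypothetical names chosen for illustration. It shows that self.attr searches the instance, then its class, then the parent classes, while a bare name inside a method is never looked up on the class.

```python
class Base(object):
    # Class attributes: shared defaults, found by lookup on the class.
    page_title = '//title'
    allowed = ['base.example']

    def describe(self):
        # self.<attr> checks the instance first, then type(self),
        # then each parent class in method resolution order.
        return '%s from %s' % (self.page_title, ', '.join(self.allowed))


class Child(Base):
    # Overrides one class attribute; page_title is still inherited.
    allowed = ['child.example']


class Broken(Child):
    def describe(self):
        # Bare name: Python searches local, enclosing-function, global and
        # builtin scopes only, never the class body, so this fails.
        return allowed


child = Child()
print(child.describe())   # //title from child.example
print(Base().describe())  # //title from base.example

try:
    Broken().describe()
except NameError as exc:
    print('NameError: %s' % exc)
```

This is why switching to self.allowed_domains fixed the first error: the method is defined on the base class, but self is the subclass instance, so the subclass's value is the one found.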

The classes in these examples have method attributes that are only valid for instances of the class - the methods need to be bound to an instance. I didn't include any examples for class methods or static methods which do not need to be bound to an instance - there are some good answers to What is the difference between @staticmethod and @classmethod in Python?

>>> A.wiggle
<unbound method A.wiggle>
>>> A.wiggle()

Traceback (most recent call last):
  File "<pyshell#41>", line 1, in <module>
    A.wiggle()
TypeError: unbound method wiggle() must be called with A instance as first argument (got nothing instead)
>>> Bar.crunk
<unbound method Bar.crunk>
>>> Bar.crunk()

Traceback (most recent call last):
  File "<pyshell#43>", line 1, in <module>
    Bar.crunk()
TypeError: unbound method crunk() must be called with Bar instance as first argument (got nothing instead)
>>> 
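
For contrast, here is a small sketch of methods that do not need an instance. The attribute kind and the methods describe and shout are made up for this illustration; only the classmethod and staticmethod decorators are standard Python.

```python
class A(object):
    kind = 'wiggler'

    def __init__(self, name='first'):
        self.name = name

    def wiggle(self):
        # Instance method: needs an instance, received as self.
        return self.name + ' is wiggling'

    @classmethod
    def describe(cls):
        # Class method: receives the class, so no instance is needed.
        return cls.__name__ + ' is a ' + cls.kind

    @staticmethod
    def shout(text):
        # Static method: receives neither the instance nor the class.
        return text.upper() + '!'


class Foo(A):
    kind = 'bopper'


print(A.describe())            # A is a wiggler
print(Foo.describe())          # Foo is a bopper
print(A.shout('hey'))          # HEY!
print(Foo('second').wiggle())  # second is wiggling
```

Note that the class method called on Foo sees Foo's own kind, since cls is whichever class the call was made on.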
wwii