
I'm trying to alter Scrapy's stats middleware.

Here's Scrapy's stats.py in full:

from scrapy.exceptions import NotConfigured
from scrapy.utils.request import request_httprepr
from scrapy.utils.response import response_httprepr

class DownloaderStats(object):

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        if not crawler.settings.getbool('DOWNLOADER_STATS'):
            raise NotConfigured
        return cls(crawler.stats)

    def process_request(self, request, spider):
        self.stats.inc_value('downloader/request_count', spider=spider)
        self.stats.inc_value('downloader/request_method_count/%s' % request.method, spider=spider)
        reqlen = len(request_httprepr(request))
        self.stats.inc_value('downloader/request_bytes', reqlen, spider=spider)

    def process_response(self, request, response, spider):
        self.stats.inc_value('downloader/response_count', spider=spider)
        self.stats.inc_value('downloader/response_status_count/%s' % response.status, spider=spider)
        reslen = len(response_httprepr(response))
        self.stats.inc_value('downloader/response_bytes', reslen, spider=spider)
        return response

    def process_exception(self, request, exception, spider):
        ex_class = "%s.%s" % (exception.__class__.__module__, exception.__class__.__name__)
        self.stats.inc_value('downloader/exception_count', spider=spider)
        self.stats.inc_value('downloader/exception_type_count/%s' % ex_class, spider=spider)

In the from_crawler classmethod, what is it, exactly, that's getting passed in?

scharfmn
    `.from_crawler()` is called (at least) in `scrapy/middleware.py` in `.from_settings`, which is called in `scrapy/core/scraper.py` through `itemproc_cls.from_crawler(crawler)`. `.from_crawler()` returns an instance of a middleware class, initialized with a parameter taken from the crawler; in the case of `DownloaderStats` it's a stats object (`cls(crawler.stats)`). It's another way of calling `DownloaderStats(crawler.stats)` (but the spider doesn't always know how to properly initialize all its middleware classes) – paul trmbrth Aug 27 '13 at 18:31
    You're welcome @Charles S. I'll have to re-read your question for the 2nd part – paul trmbrth Aug 27 '13 at 20:42
    Just posted an answer trying to help you understand the class method and how the DownloaderStats are instantiated. As for your second question, what do you mean that you're trying to "capture" all Twisted errors, and what do you mean that you'd rather send the request and the ex_class "back up to an item pipeline" – audiodude Aug 27 '13 at 22:59

1 Answer


First of all, DownloaderStats(object) in the class statement doesn't mean that DownloaderStats is being passed an object. It means that the DownloaderStats class extends (inherits from) the object class.
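To make the inheritance point concrete, here is a minimal sketch using a stripped-down copy of the class. Only the `class` line and `__init__` are taken from the question; the checks at the bottom are added for illustration.

```python
class DownloaderStats(object):
    # "(object)" names the base class; nothing is passed in here.
    def __init__(self, stats):
        self.stats = stats


# The parentheses in the class statement list base classes, not arguments.
base_classes = DownloaderStats.__bases__
inherits_from_object = issubclass(DownloaderStats, object)

# Arguments only come into play when the class is called to make an instance.
instance = DownloaderStats("any stats object")
```

Here `base_classes` is the tuple containing only `object`, and `instance.stats` holds whatever was passed when the class was called.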

In your class method, cls is the class the method was called on, in this case DownloaderStats, and crawler is the crawler object that Scrapy hands to the middleware; the code above reads its settings and stats attributes. So the code cls(crawler.stats) can be thought of as DownloaderStats(crawler.stats), which instantiates an object of the class DownloaderStats. Instantiating an object in Python causes its __init__ method to be called, so the value of crawler.stats is passed as the stats parameter of __init__, which then assigns it to self.stats.
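The flow can be tried without Scrapy installed. The sketch below is an approximation under stated assumptions: `FakeSettings`, `FakeCrawler`, the local `NotConfigured` and the `MyStats` subclass are hypothetical stand-ins written for this example, not Scrapy classes. Only the `DownloaderStats` body mirrors the code in the question.

```python
class NotConfigured(Exception):
    """Hypothetical stand-in for scrapy.exceptions.NotConfigured."""


class FakeSettings:
    """Hypothetical stand-in exposing only the getbool() call used below."""

    def __init__(self, values):
        self._values = values

    def getbool(self, name, default=False):
        return bool(self._values.get(name, default))


class FakeCrawler:
    """Hypothetical stand-in with the two attributes from_crawler reads."""

    def __init__(self, settings, stats):
        self.settings = settings
        self.stats = stats


class DownloaderStats(object):

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # cls is whichever class from_crawler was called on.
        if not crawler.settings.getbool('DOWNLOADER_STATS'):
            raise NotConfigured
        return cls(crawler.stats)  # same as DownloaderStats(crawler.stats) here


class MyStats(DownloaderStats):
    """Hypothetical subclass, e.g. an altered version of the middleware."""


stats = {}  # a plain dict plays the role of the stats collector
crawler = FakeCrawler(FakeSettings({'DOWNLOADER_STATS': True}), stats)

middleware = DownloaderStats.from_crawler(crawler)
custom = MyStats.from_crawler(crawler)  # cls is MyStats in this call

# With the setting switched off, from_crawler refuses to build the object.
disabled = FakeCrawler(FakeSettings({'DOWNLOADER_STATS': False}), stats)
try:
    DownloaderStats.from_crawler(disabled)
    was_rejected = False
except NotConfigured:
    was_rejected = True
```

Because the method uses cls rather than naming DownloaderStats directly, a subclass inherits from_crawler unchanged and still gets an instance of itself back.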

audiodude