
How can I collect stats from within a spider callback?

Example

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        stats.set_value('foo', 'bar')

Not sure what to import or how to make stats available in general.

mattes

5 Answers

17

Check out the stats page from the Scrapy documentation. The documentation states that the Stats Collector is available, but it may be necessary to add `from scrapy.stats import stats` to your spider code to be able to do stuff with it.

EDIT: At the risk of blowing my own trumpet, if you were after a concrete example, I posted an answer about how to collect failed URLs.

EDIT2: After a lot of googling, apparently no imports are necessary. Just use `self.crawler.stats.set_value()`!
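
A minimal sketch of what that looks like in a callback (the stat names 'foo' and 'pages_crawled' are just illustrative):

from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # self.crawler is attached to every spider once the crawl starts,
        # so the stats collector needs no extra imports
        self.crawler.stats.set_value('foo', 'bar')
        self.crawler.stats.inc_value('pages_crawled')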

Talvalin
  • hmm. it returns `ImportError: cannot import name crawler`. `File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/stats.py", line 1, in from scrapy.project import crawler` – mattes Apr 10 '14 at 02:02
  • That's odd. I take it that your basic spider works without error? – Talvalin Apr 10 '14 at 07:25
  • yep. it works as long as I don't do anything with `stats`. Here is an example of what my spider looks like: https://gist.github.com/mattes/10367042 – mattes Apr 10 '14 at 10:44
  • I've edited my answer above. You can just use `self.crawler.stats.set_value()` in the `parse` method. – Talvalin Apr 10 '14 at 11:12
  • How do you reference the stats that were collected in a crawl? – michaelAdam Jul 16 '15 at 20:45
3

With Scrapy 0.24, I use stats in the following way:

from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import Selector

class TopSearchesSpider(CrawlSpider):
    name = "topSearches"
    allowed_domains = ["...domain..."]

    start_urls = (
        'http://...domain...',
    )

    def __init__(self, stats):
        super(TopSearchesSpider, self).__init__()
        self.stats = stats  # keep a reference to the crawler's stats collector

    @classmethod
    def from_crawler(cls, crawler):
        # the crawler (and its stats collector) is only available here
        return cls(crawler.stats)

    def parse_start_url(self, response):
        sel = Selector(response)
        url = response.url

        self.stats.inc_value('pages_crawled')
        ...

The super() call invokes the CrawlSpider constructor, so its own initialization code still runs.

Franzi
2

Add this inside your spider class:

def my_parse(self, response): 
    print self.crawler.stats.get_stats()
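
This returns the stats as a plain dict of Scrapy's built-in counters; a truncated sketch of what the output typically looks like (the values here are made up):

{'downloader/request_count': 18,
 'downloader/response_count': 18,
 'item_scraped_count': 16,
 'response_received_count': 18}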

Aminah Nuraini
1

If you want to use the stats in other components, outside the spider, you can:

spider.crawler.stats.get_stats()
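
For example, a minimal sketch of reading stats from an item pipeline; the pipeline class and log message are hypothetical, but any component that receives the spider can reach the stats collector this way:

class StatsLoggingPipeline(object):
    # hypothetical pipeline: reach the stats collector through spider.crawler
    def process_item(self, item, spider):
        count = spider.crawler.stats.get_value('item_scraped_count', 0)
        spider.log("items scraped so far: %s" % count)
        return item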

0

If you want to get the Scrapy stats after crawling as a Python object, this might help:

from pydispatch import dispatcher  # on older Scrapy: from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from scrapy.crawler import CrawlerProcess

def spider_results(spider):
    results = []
    stats = []

    def crawler_results(signal, sender, item, response, spider):
        results.append(item)

    def crawler_stats(*args, **kwargs):
        # the sender of spider_closed exposes the stats collector
        stats.append(kwargs['sender'].stats.get_stats())

    dispatcher.connect(crawler_results, signal=signals.item_scraped)

    dispatcher.connect(crawler_stats, signal=signals.spider_closed)

    process = CrawlerProcess()
    process.crawl(spider)  # put our own spider class here
    process.start()  # the script will block here until the crawling is finished
    return results, stats
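
A hypothetical call, assuming a spider class named MySpider is defined in the same script:

results, stats = spider_results(MySpider)
print(len(results))  # number of scraped items
print(stats[0])      # stats dict captured at spider_closed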

Hope it helps!

sid10on10