
I am trying to get the job id of a Scrapy 2.1.x job in a pipeline's close_spider method:

import os

class mysql_pipeline(object):
    def close_spider(self, spider):
        print(os.environ['SCRAPY_JOB'])

Unfortunately, this results in a KeyError:

 ERROR: Scraper close failure
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/andy/spider2/crawler/pipelines.py", line 137, in close_spider
    os.environ['SCRAPY_JOB'],
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'SCRAPY_JOB'

How can I pull the job id within the method?


1 Answer


In the spider constructor (inside __init__), add this line:

self.jobId = kwargs.get('_job')
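For context, here is a minimal sketch of such a spider (the class and spider name are placeholders, not from the question). When a job is scheduled through Scrapyd, the job id is passed to the spider as the _job argument:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Scrapyd passes the job id as the '_job' spider argument;
        # kwargs.get() returns None when running outside Scrapyd.
        self.jobId = kwargs.get('_job')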

Then, in the parse callback, put the job id on the item:

def parse(self, response):
    data = {}
    ...  # populate data with the scraped fields
    data['jobId'] = self.jobId
    yield data

In the pipeline, add this:

def process_item(self, item, spider):
    self.jobId = item['jobId']
    ...
    return item

def close_spider(self, spider):
    print(self.jobId)
    ...
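As a side note on the original traceback: the SCRAPY_JOB environment variable is set by Scrapyd for the crawl process, so it is absent when the spider is run directly with scrapy crawl. If you do read it, using a fallback avoids the KeyError (a minimal sketch; the 'unknown' default is arbitrary):

import os

def close_spider(self, spider):
    # os.environ.get() returns the default instead of raising KeyError
    # when SCRAPY_JOB is not set (e.g. outside Scrapyd).
    print(os.environ.get('SCRAPY_JOB', 'unknown'))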