
I am trying to get the job id of a Scrapy 2.1.x job in a pipeline's close_spider method:

import os

class mysql_pipeline(object):
    def close_spider(self, spider):
        print(os.environ['SCRAPY_JOB'])

Unfortunately, this results in a KeyError:

 ERROR: Scraper close failure
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/andy/spider2/crawler/pipelines.py", line 137, in close_spider
    os.environ['SCRAPY_JOB'],
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'SCRAPY_JOB'

How can I pull the job id within the method?


1 Answer


In the spider constructor (inside __init__), add this line:

self.jobId = kwargs.get('_job')
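For context, here is a minimal sketch of such a spider (the class and spider name are placeholders, not from the question). When a job is scheduled through Scrapyd, the job id is passed to the spider as the _job argument:

import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Scrapyd passes the job id as the '_job' spider argument;
        # kwargs.get() returns None when running outside Scrapyd.
        self.jobId = kwargs.get('_job')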

Then, in the parse callback, put the job id on the item:

def parse(self, response):
    data = {}
    ...  # populate data with the scraped fields
    data['jobId'] = self.jobId
    yield data

In the pipeline, add this:

def process_item(self, item, spider):
    self.jobId = item['jobId']
    ...
    return item

def close_spider(self, spider):
    print(self.jobId)
    ...
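As a side note on the original traceback: the SCRAPY_JOB environment variable is set by Scrapyd for the crawl process, so it is absent when the spider is run directly with scrapy crawl. If you do read it, using a fallback avoids the KeyError (a minimal sketch; the 'unknown' default is arbitrary):

import os

def close_spider(self, spider):
    # os.environ.get() returns the default instead of raising KeyError
    # when SCRAPY_JOB is not set (e.g. outside Scrapyd).
    print(os.environ.get('SCRAPY_JOB', 'unknown'))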