11

Framework Scrapy - Scrapyd server.

I'm having trouble getting the jobid value inside the spider.

After posting data to http://localhost:6800/schedule.json, the response is:

status = ok
jobid = bc2096406b3011e1a2d0005056c00008

But I need to use this jobid inside the running spider, for example to open the {jobid}.log file or for other dynamic purposes.

class SomeSpider(BaseSpider):
    name = "some"
    start_urls = ["http://www.example.com/"]
    def parse(self, response):
        items = []
        for val in values:
            item = SomeItem()
            item['jobid'] = self.jobid # ???!
            items.append(item)
        return items

But I only see this jobid after the task has finished :( Thanks!

lenriq

3 Answers

10

You can get it from the SCRAPY_JOB environment variable:

os.environ['SCRAPY_JOB']
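
A minimal sketch of how this lookup might be wrapped so the spider also runs outside Scrapyd, where SCRAPY_JOB is presumably not set. The helper name get_job_id and its default parameter are hypothetical, introduced only for illustration:

```python
import os


def get_job_id(default=None):
    """Return the job id from SCRAPY_JOB, or `default` if the variable is unset.

    `get_job_id` is a hypothetical helper name, not part of Scrapy or Scrapyd.
    """
    # .get() avoids a KeyError when the spider is started with a plain
    # `scrapy crawl`, where SCRAPY_JOB is assumed to be absent.
    return os.environ.get("SCRAPY_JOB", default)
```

Inside parse, item['jobid'] = get_job_id() would then fill the field without touching the spider's constructor.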
alecxe
Hassan Raza
6

I guess there is an easier way, but you can extract the job id from the command-line arguments. IIRC, scrapyd launches a spider and passes it the jobid as a parameter. Just inspect sys.argv wherever you need the jobid.
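
A rough sketch of that idea, assuming the job id appears in the process arguments as a spider argument of the form _job=<id> (this form is an assumption, so check the actual command line on your Scrapyd version). The function name job_id_from_argv is hypothetical:

```python
import sys


def job_id_from_argv(argv=None):
    """Return the value of a `_job=<id>` argument, or None if none is present.

    Assumes the launcher passes the job id as `_job=<id>`; `job_id_from_argv`
    is a hypothetical helper, not part of Scrapy or Scrapyd.
    """
    argv = sys.argv if argv is None else argv
    prefix = "_job="
    for arg in argv:
        if arg.startswith(prefix):
            # Everything after the prefix is taken to be the job id.
            return arg[len(prefix):]
    return None
```

Calling job_id_from_argv() with no arguments inspects the current process's sys.argv; passing a list explicitly makes the function easy to check in isolation.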

warvariuc
1

In spider.py, read the _job keyword argument in the spider's constructor:

class SomeSpider(BaseSpider):
    name = "some"
    start_urls = ["http://www.example.com/"]

    def __init__(self, *args, **kwargs):
        super(SomeSpider, self).__init__(*args, **kwargs)
        # The job id arrives as the '_job' keyword argument (None if absent).
        self.jobid = kwargs.get('_job')

    def parse(self, response):
        items = []
        for val in values:
            item = SomeItem()
            item['jobid'] = self.jobid  # set in __init__ above
            items.append(item)
        return items
Sadia